4.8 grep and regular expressions
Regular expressions are used in many UNIX tools. The string searching
and substitution in grep, ex, and vi use regular expressions.
The shell, in its file name expansion, uses a simplified regular
expression format. awk and egrep use extended regular expressions.
grep is a program used to search a file, or a group of files, for
lines that match a specific regular expression. It does wonders in
finding lost functions and subroutines in a group of program sources,
or in checking whether you have removed all references to an obsolete
function or variable. The -n option causes the line number of each
matching line to be printed.
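As a minimal sketch of the -n option described above, the script below builds a small stand-in source file (sample.c is a made-up file for illustration, not the lab's cman6.c) and searches it for the string draw.

```shell
#!/bin/bash
# Build a tiny stand-in source file (hypothetical; not the lab's cman6.c).
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.c" <<'EOF'
#include <stdio.h>
void draw(void);
int main(void)
{
    draw();
    return 0;
}
EOF

# -n prefixes each matching line with its line number and a colon.
numbered=$(grep -n 'draw' "$tmpdir/sample.c")
echo "$numbered"
# prints:
# 2:void draw(void);
# 5:    draw();

rm -rf "$tmpdir"
```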
wc is a program used to count lines, words, and characters, usually
all three. Its options are -l, -w, and -c, for lines, words, and
characters. The default is -lwc.
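The three wc options can be tried on a small file. This is only a sketch; words.txt is a made-up file name. Strictly speaking, -c counts bytes, which for plain ASCII text is the same as the number of characters.

```shell
#!/bin/bash
# Create a two-line sample file (hypothetical name).
tmpdir=$(mktemp -d)
printf 'one two\nthree\n' > "$tmpdir/words.txt"

# Reading from standard input keeps the file name out of wc's output.
lines=$(wc -l < "$tmpdir/words.txt")   # 2 lines
words=$(wc -w < "$tmpdir/words.txt")   # 3 words
chars=$(wc -c < "$tmpdir/words.txt")   # 14 bytes, newlines included

echo "lines: $lines  words: $words  characters: $chars"

# With no options wc reports all three counts, followed by the file name.
wc "$tmpdir/words.txt"

rm -rf "$tmpdir"
```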
In this lab you will use the program grep to search an old
"C" program of mine named cman6.c. It is a Mandelbrot set
program written for the Borland "C" compiler and some graphics
library additions. The file is available at:
http://uml.lt.tucson.az.us/hl2.2007-fall/files/cman6.c
Pipes use the "|" character to direct the output
of one command into the input of another command. For example:
grep 'define' ~cis137/cman6.c | wc -l
will extract all lines from ~cis137/cman6.c that contain the string
define and send them on to wc. wc will then count the number of lines.
grep also has a -c option that will count the lines that match.
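The pipeline and the -c option should give the same count. Here is a small sketch on a made-up file (sample.c is an assumption for illustration; substitute ~cis137/cman6.c on the class machine).

```shell
#!/bin/bash
# Stand-in source file with two lines containing "define" (hypothetical).
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.c" <<'EOF'
#define WIDTH 640
#define HEIGHT 480
int main(void) { return 0; }
EOF

# grep writes the matching lines into the pipe; wc -l counts them.
piped=$(grep 'define' "$tmpdir/sample.c" | wc -l)

# grep -c counts the matching lines itself, so no pipe is needed.
counted=$(grep -c 'define' "$tmpdir/sample.c")

echo "pipeline: $piped  grep -c: $counted"

rm -rf "$tmpdir"
```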
Another interesting option to grep is -v. This will cause grep
to invert the condition: a line that matches the regular expression
will NOT be output, and all the lines that do not match will be
output instead.
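A short sketch of the inverted match, using a made-up three-line file (names.txt is an assumption for illustration):

```shell
#!/bin/bash
# Three-line sample file (hypothetical name).
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$tmpdir/names.txt"

# Without -v only the line containing "et" is printed.
matched=$(grep 'et' "$tmpdir/names.txt")

# With -v every line that does NOT contain "et" is printed.
inverted=$(grep -v 'et' "$tmpdir/names.txt")

echo "matched:"
echo "$matched"
echo "inverted:"
echo "$inverted"

rm -rf "$tmpdir"
```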
The shell, when it is processing a command line, first looks for pipe and redirection symbols. If it finds any, it sets up the pipes and redirections and effectively removes the symbols from the command line, so the individual commands are unaware that they ever existed. Then it substitutes variables and expands file names that contain wild cards. Finally it executes the individual commands.
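One way to see this is to print exactly what a command receives. In the sketch below, show_args is a hypothetical helper written for this illustration. It is a shell function rather than a separate program, but it is handed its arguments the same way.

```shell
#!/bin/bash
# show_args (hypothetical helper) reports the arguments it was given.
show_args() {
    echo "count=$#"
    for arg in "$@"; do
        echo "[$arg]"
    done
}

tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch a.txt b.txt

# The redirection is handled by the shell; show_args sees only "one" and "two".
show_args one two > out.log
redirected=$(cat out.log)

# The wild card is replaced by the matching file names before show_args runs.
expanded=$(show_args *.txt)

# Single quotes stop the expansion, so the pattern arrives unchanged.
quoted=$(show_args '*.txt')

echo "$redirected"
echo "$expanded"
echo "$quoted"

cd / && rm -rf "$tmpdir"
```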
Write a shell script called greplab that searches ~cis137/cman6.c for the various items below.
Use the echo command to print out your name and TABER CIS137.
For each item also provide a short description of what you are printing out.
Start your shell script with #!/bin/bash to use the bash shell.
Use grep to find all lines that have the substring
"closegraph" within them. Print out line numbers for the lines that
contain the substring along with the complete content of the line.
Look at the grep manual pages for the option -n.
Look at the grep manual pages for the option -c.
">". Be careful: ">" is a shell redirection character.
Make sure that the character makes it to grep by
quoting the regular expression.
When running grep it is best to always protect the
regular expression from being interpreted by the shell
by placing it within a pair of single quotes. Also remember to
escape all special characters that need to retain their normal
meaning within the regular expression.
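The two precautions can be sketched together. The file below is made up for illustration: the single quotes keep the shell away from ">" and "*", and the backslash tells grep to treat "*" as an ordinary character.

```shell
#!/bin/bash
# Stand-in source file (hypothetical; not the lab's cman6.c).
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.c" <<'EOF'
if (a > b) x = 1;
total = p * q;
y = 2;
EOF

# Quoted, ">" reaches grep as a pattern instead of being treated as a redirection.
greater=$(grep -c '>' "$tmpdir/sample.c")

# The quotes protect the pattern from the shell; the backslash makes "*" literal for grep.
star=$(grep -c '\*' "$tmpdir/sample.c")

echo "lines with '>': $greater  lines with '*': $star"

rm -rf "$tmpdir"
```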
Turn in a copy of your shell script and its results. Make a printout of your output and shell script, and mark it with:
your name TABER CIS137 Lab 4.8: grep & regular expressions
Please turn in your lab to Louis Taber or to a Pima Community College employee in room A-115 of the Santa Rita Building. Ask them to place it in the dark blue folder in Louis Taber's mailbox.
See the sh(1) & csh(1) manual pages for a complete description.
*    Matches any string including a null string.
?    Matches any single character.
[ ]  Matches any enclosed character.
     A range of characters can be specified with a "-". [a-d] == [abcd]
     If the first character following a "[" is a "^" then any
     character NOT enclosed is matched. bash also lets you use ! to
     negate the list.
"/" must be matched explicitly.
sh & bash).
*" & "?" to look for "*" & "?".
"{i1,i2,...}" expands list - bash, csh, and tcsh.
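The file name patterns above can be tried safely in a scratch directory. The file names in this sketch are made up for illustration. echo simply prints whatever the shell expanded each pattern to. The sketch uses the ! form of negation, and the last line needs bash because of the braces.

```shell
#!/bin/bash
# Scratch directory with a few made-up file names.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
touch a1.c b2.c c3.h d4.c 'odd*name'

star=$(echo *.c)           # every name ending in .c
single=$(echo ?3.h)        # exactly one character, then 3.h
range=$(echo [a-b]*)       # names starting with a or b
negated=$(echo [!a-b]*)    # names NOT starting with a or b
escaped=$(echo *\**)       # backslash makes the middle * literal
braces=$(echo file.{c,h})  # bash: expands the list even if no such files exist

echo "star:    $star"
echo "single:  $single"
echo "range:   $range"
echo "negated: $negated"
echo "escaped: $escaped"
echo "braces:  $braces"

cd / && rm -rf "$tmpdir"
```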
See the ed(1) manual pages for a complete description.
.    Is a one character regular expression that matches any character.
*    Matches 0 or more of the preceding one character
     regular expression.
[ ]  Matches any enclosed character. A range can be specified
     with a "-". [a-d] == [abcd]. If the first character
     following a "[" is a "^" then any character not
     enclosed is matched.
^    At the beginning of the regular expression, forces
     the regular expression to match at the beginning of a line.
$    At the end of the regular expression, forces
     the regular expression to match at the end of a line.
* ^ $ [ ] .
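Each of the regular expression elements above can be checked with grep -c on a small word list. The file and its contents are made up for this sketch. The comment after each command lists the lines that match.

```shell
#!/bin/bash
# Five-line word list (hypothetical file).
tmpdir=$(mktemp -d)
f="$tmpdir/lines.txt"
cat > "$f" <<'EOF'
cat
cart
coat
scat
ct
EOF

dot=$(grep -c 'c.t' "$f")       # c, any one character, t: cat scat
star=$(grep -c 'ca*t' "$f")     # c, zero or more a, t: cat scat ct
class=$(grep -c 'c[ao]' "$f")   # c followed by a or o: cat cart coat scat
negated=$(grep -c 'c[^a]' "$f") # c followed by anything but a: coat ct
begin=$(grep -c '^c' "$f")      # c at the start of the line: cat cart coat ct
end=$(grep -c 'at$' "$f")       # at at the end of the line: cat coat scat

echo "dot=$dot star=$star class=$class negated=$negated begin=$begin end=$end"

rm -rf "$tmpdir"
```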
See the egrep(1) manual pages for a complete description.
egrep can run up to 10 times faster than grep. Its memory usage
is less predictable. awk also uses extended regular expressions.
+    Matches 1 or more of the preceding regular expression.
?    Matches 0 or 1 of the preceding regular expression.
|    Between two regular expressions, | will match
     if either expression matches.
( )  Expressions may be enclosed in parentheses for grouping.
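The extended operators can be sketched the same way. This example uses grep -E, the POSIX spelling of egrep, on a made-up word list. That spelling is a choice made for this sketch; egrep should behave the same where it is installed.

```shell
#!/bin/bash
# Five-line word list (hypothetical file).
tmpdir=$(mktemp -d)
f="$tmpdir/words.txt"
cat > "$f" <<'EOF'
ct
cat
caat
dog
catdog
EOF

plus=$(grep -E -c 'ca+t' "$f")        # one or more a: cat caat catdog
optional=$(grep -E -c 'ca?t' "$f")    # zero or one a: ct cat catdog
either=$(grep -E -c '^(cat|dog)$' "$f") # whole line is cat or dog: cat dog

echo "plus=$plus optional=$optional either=$either"

rm -rf "$tmpdir"
```

The parentheses in the last pattern group the alternation so that ^ and $ apply to both alternatives.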
Instructor: Louis Taber, louis.taber.at.pima at gmail dot com (520) 206-6850