
4.8 grep and regular expressions

Regular expressions are used in many UNIX tools. The search and substitution commands in grep, ex, and vi use regular expressions. The shell uses a simpler, related pattern syntax (wild cards) for file name expansion. awk and egrep use extended regular expressions.

grep searches a file, or a group of files, for lines that match a regular expression. It works wonders for finding lost functions and subroutines in a group of program sources, or for checking that you have removed all references to an obsolete function or variable. The -n option prints the line number in front of each matching line.
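
As a rough sketch of the -n option, the commands below build a small throwaway "C" file and search it. The file contents and the name old_func are made up for illustration (this is not cman6.c), and the commands are written for a Bourne-style shell, so a tcsh script would need its own variable syntax.

```shell
# Build a small throwaway C file (hypothetical content, not cman6.c).
sample=$(mktemp)
cat > "$sample" <<'EOF'
#include <stdio.h>
int old_func(void);
int main(void)
{
    return old_func();
}
EOF

# -n prefixes each matching line with its line number and a colon.
numbered=$(grep -n 'old_func' "$sample")
echo "$numbered"
# prints:
# 2:int old_func(void);
# 5:    return old_func();

rm -f "$sample"
```

If the second search came back empty after a cleanup, that would suggest every reference to old_func is gone.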

wc counts lines, words, and characters. Its options are -l, -w, and -c, for lines, words, and characters respectively. With no options it reports all three, as if -lwc had been given.
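
A minimal sketch of the three wc options, run on a two-line temporary file invented for this example. Reading from standard input keeps the file name out of the output, and tr strips the padding that some versions of wc put in front of the number.

```shell
# Two lines, three words, fourteen characters (newlines included).
sample=$(mktemp)
printf 'one two\nthree\n' > "$sample"

lines=$(wc -l < "$sample" | tr -d ' ')
words=$(wc -w < "$sample" | tr -d ' ')
chars=$(wc -c < "$sample" | tr -d ' ')

echo "lines=$lines words=$words chars=$chars"
# prints: lines=2 words=3 chars=14

rm -f "$sample"
```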

In this lab you will use grep to search an old "C" program of mine named cman6.c. It is a Mandelbrot set program written for the Borland "C" compiler with some graphics library additions. The file cman6.c is available by anonymous ftp at ftp://lt.tucson.az.us/pub/ltaber/cman6.c.

A pipe, written with the "|" character, directs the output of one command into the input of another command. For example:

grep 'define' ~csc137/cman6.c | wc -l
This extracts all lines from ~csc137/cman6.c that contain the string define and sends them on to wc, which then counts them. grep also has a -c option that counts the matching lines itself.
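
The two ways of counting can be compared on a small sample. The file below is made up for illustration; note that grep matches the string anywhere in a line, so the comment containing "defined" counts too.

```shell
# Hypothetical sample: two #define lines plus a comment containing "defined".
sample=$(mktemp)
cat > "$sample" <<'EOF'
#define WIDTH 640
#define HEIGHT 480
int x; /* not defined here */
int y;
EOF

# Count by piping the matching lines into wc ...
via_pipe=$(grep 'define' "$sample" | wc -l | tr -d ' ')
# ... or let grep count them itself.
via_flag=$(grep -c 'define' "$sample")

echo "pipe: $via_pipe  -c: $via_flag"
# prints: pipe: 3  -c: 3

rm -f "$sample"
```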

Another interesting option to grep is -v, which inverts the condition: lines that match the regular expression are NOT output, and all the lines that do not match are output instead.
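
A short sketch of -v on a three-line file invented for this example. It also shows that -v can be combined with -c to count the lines that do not match.

```shell
sample=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$sample"

# Only "beta" matches, so -v outputs the other two lines.
kept=$(grep -v 'beta' "$sample")
echo "$kept"
# prints:
# alpha
# gamma

# -v with -c counts the lines that do NOT match.
not_matching=$(grep -v -c 'beta' "$sample")
echo "$not_matching"
# prints: 2

rm -f "$sample"
```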

When the shell processes a command line, it first looks for pipe and redirection symbols. If it finds any, it sets up the pipes and redirections and removes the symbols from the command line, so the individual commands are unaware that they ever existed. It then substitutes variables and expands file names that contain wild cards. Finally, it executes the individual commands.
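
The visible effect of this processing can be sketched with a small helper that reports the arguments it receives. The helper show_args and the file names are hypothetical, and the sketch uses a Bourne-style shell function (tcsh has no functions, and shells differ in the exact internal order of these steps), so treat it as an illustration of what a command ends up seeing rather than a description of tcsh internals.

```shell
# Hypothetical helper: report how many arguments arrived and what they were.
show_args() {
    echo "argc=$#"
    for a in "$@"; do
        echo "arg: $a"
    done
}

workdir=$(mktemp -d)
touch "$workdir/a.c" "$workdir/b.c"
out="$workdir/out.txt"

# The redirection is handled by the shell; show_args never sees "> file".
show_args one two > "$out"
seen=$(cat "$out")
echo "$seen"
# prints:
# argc=2
# arg: one
# arg: two

# The wild card is expanded by the shell before the command runs.
expanded=$(cd "$workdir" && show_args *.c | head -n 1)
# Single quotes stop the expansion, so the pattern arrives as one argument.
literal=$(cd "$workdir" && show_args '*.c' | head -n 1)
echo "$expanded / $literal"
# prints: argc=2 / argc=1

rm -rf "$workdir"
```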

Write a shell script called greplab that searches ~csc137/cman6.c for the various items below.

Use the echo command to print out your name and TABER CIS137. For each item also provide a short description of what you are printing out.

Start your shell script with #!/bin/tcsh to use the tcsh shell.

  1. The line number and the line of all lines that have the string "closegraph".
  2. The number of lines that have the string "include".
  3. The number of lines with a period ".". (Be careful: "." is a meta character.)
  4. The number of lines with a greater than symbol ">". (Be careful: ">" is a shell redirection character.)
  5. The line number and the line where the string "main(" is. This function is where a "C" program starts running.
  6. The number of lines that have the string "colormap".
  7. The number of lines that have the string "/*". (Be careful: "*" is a meta character.)
Use single quotes to keep the shell from interpreting special characters in your patterns, and use a backslash to escape characters that are special in regular expressions.
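
The difference that quoting and escaping make can be sketched on a small made-up file (again, not cman6.c). The single quotes protect the pattern from the shell, and the backslash protects the character from grep.

```shell
# Hypothetical four-line sample.
sample=$(mktemp)
cat > "$sample" <<'EOF'
a = b + c;
if (x > y) return 1.5;
/* comment */
int *p;
EOF

# An unescaped "." matches any character, so every non-empty line counts.
any_char=$(grep -c '.' "$sample")
# "\." matches only a real period.
literal_dot=$(grep -c '\.' "$sample")
# Quoting ">" keeps the shell from treating it as a redirection.
greater=$(grep -c '>' "$sample")
# An unescaped "*" means "zero or more of the previous character".
slash_star_loose=$(grep -c '/*' "$sample")
# "\*" matches a real asterisk.
slash_star=$(grep -c '/\*' "$sample")

echo "$any_char $literal_dot $greater $slash_star_loose $slash_star"
# prints: 4 1 1 4 1

rm -f "$sample"
```

Without the quotes, a command such as grep -c > file would create an empty file instead of searching for the character.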

Turn in a copy of your shell script and its results. Make a printout of your output and shell script, and mark it with:

your name
TABER CIS137
Lab 4.8: grep & regular expressions
Place the lab in the instructor hand-in box in BUS R6E, the "terminal room".

4.8.1 sh and csh rules

See the sh(1) & csh(1) manual pages for a complete description.

4.8.2 regular expression rules

See the ed(1) manual pages for a complete description.

4.8.3 extended regular expression rules

See the egrep(1) manual pages for a complete description. egrep can run up to 10 times faster than grep, but its memory usage is less predictable. awk also uses extended regular expressions.
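
A small sketch of what the extended syntax adds. It assumes a POSIX grep and uses grep -E, the POSIX spelling of egrep, because some current systems print a warning for the egrep name. The word list is made up for illustration.

```shell
sample=$(mktemp)
printf 'color\ncolour\ncolormap\npalette\n' > "$sample"

# "?" (optional) and "(...)" grouping are extended regular expression features.
optional=$(grep -E -c '^colou?r(map)?$' "$sample")
# "|" means "or" in an extended regular expression ...
either=$(grep -E -c 'colormap|palette' "$sample")
# ... but is an ordinary character in a basic one, so nothing matches here.
basic=$(grep -c 'colormap|palette' "$sample")

echo "$optional $either $basic"
# prints: 3 2 0

rm -f "$sample"
```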
Instructor: ltaber@pima.edu
