
5.14 awk -- A Programming Language

awk (named for the initials of Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan) is an interpreted programming language for processing text files. Of the six possible orderings of the three initials, awk seems the most appropriate: at times the language seems awkward (to say the least). It is also very powerful for text processing. It has many of the same features as C.

The AWK language is a good text-processing language. It has more features and capabilities than sed, but fewer than Perl.

The book The AWK Programming Language, by Aho, Kernighan, and Weinberger, is a good awk reference. It was the source for parts of this lab.

Also see the UNIX System V Release 3 Programmer's Guide, Chapter 4, AWK.

The program can be placed on the command line or in a file. The program can read and write files named within the program. You can also pass arguments to the program from the command line.

Invoke awk in one of two ways: the first places the program on the command line; the second places the awk program in a file.

awk 'program text' [file...] 
awk -f program-file [file...] 

An example of a two-statement awk program on the command line follows. The program prints every line that contains "Arizona" or "72"; a line that contains both is printed twice.

username@gort ~ $ awk '/Arizona/;/72/' /home/cis137/data 
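
To try both invocation styles without access to /home/cis137/data, the sketch below builds a small sample file from rows in the data listing of exercise 5. The scratch directory and the file names data.sample and prog.awk are made up for illustration, and the real course data file may differ.

```shell
# Work in a scratch directory; all file names here are made up for illustration.
dir=$(mktemp -d)

# Sample rows copied from the data listing in exercise 5.
cat > "$dir/data.sample" <<'EOF'
Arizona Phoenix 6285295 Tucson 32
Oregon Salem 12345234 Portland 45
Washington Olympia 6549872 Seattle 36
EOF

# Way 1: program text on the command line.
cmdline_out=$(awk '/Arizona/;/72/' "$dir/data.sample")

# Way 2: the same two statements stored in a file and run with -f.
printf '/Arizona/\n/72/\n' > "$dir/prog.awk"
file_out=$(awk -f "$dir/prog.awk" "$dir/data.sample")

echo "$cmdline_out"   # the Arizona and Washington lines
```

Both forms run the same program, so they should print the same lines.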

Many "useful" awk programs are only 1 to 4 lines long! It may be one of the easiest ways to rearrange the order of fields in a file or do a summary report. Some examples of short awk programs:

  1. Print the number of input lines:
    END { print NR }
    
    This could have been done with wc
    wc -l
    
  2. Print the 40th line:
    NR == 40
    
  3. Print the lines 40-50:
    NR >= 40 && NR <= 50
    
  4. Print the last field of every line:
    { print $NF }
    
    NF is the number of fields. $NF is the last field in the record.
  5. Print the last field of the last line:
            { field = $NF }
    END     { print field  }
    
    field is a variable.
  6. Print lines with more than 4 fields:
    NF > 4
    
  7. Print the word count for the file:
           { words = words + NF }
    END    { print words        }
    
    This could have been done with wc
    wc -w
    
  8. Print lines with "Alaska":
    /Alaska/
    
  9. Count the lines with "Alaska":
    /Alaska/    { count++      }
    END         { print count  }
    
  10. Find the largest third field:
    BEGIN       { max = -1000000  }
    $3 > max    { max = $3        }
    END         { print max  }
    
  11. Find the largest third field and save the line, too:
    BEGIN       { max = -1000000           }
    $3 > max    { max = $3; savedline = $0 }
    END         { print savedline          }
    
  12. Print every line that has fewer than 4 fields:
    NF < 4  
    
  13. Print every line shorter than 50 characters:
    length < 50  
    
  14. Print the fields in reverse order:
    { for ( j=NF; j>0; j-- ) printf("%s ", $j) 
      printf("\n")
    }
    
  15. Total up all of the fields in a file:
        { for ( j=1; j<=NF; j++ ) total += $j }
    END { print( total ) }
    
  16. Renumber a file:
           { $1 = "" ; print NR, $0 }  
    
    The assignment $1 = "" blanks the old line number (the field separator that followed it remains). To renumber by tens:
           { $1 = "" ; print 10*NR, $0 }  
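
Examples 10 and 11 above start max at -1000000, which gives the wrong answer if every third field is smaller than that. One possible alternative, sketched here on made-up input, is to let the first record supply the starting maximum.

```shell
# Made-up input in which every third field is below -1000000.
out=$(printf 'a b -2000000\nc d -3000000\n' |
  awk '
    NR == 1 || $3 > max { max = $3 }   # first record always sets max
    END                 { print max }
  ')
echo "$out"   # -2000000
```

With the sentinel version from example 10, the same input would print -1000000, a value that never appears in the file.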
    

The format of the awk program is:

 
pattern  { action } 
pattern  { action } 
pattern  { action } 

This can be repeated many times in an awk program.

awk uses extended regular expressions, the same syntax that egrep uses.

Two special patterns exist: BEGIN and END.

The action for BEGIN is processed prior to the first line of the input. This can be used for printing headers and setting field and record separators. The action for END is processed at the end of the input. This can be used for report summaries.

In a given record, the input fields are referred to as $1, $2, $3 ...

The entire record is referred to as $0.
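
The pieces described above (pattern-action statements, BEGIN, END, and the field names $1, $2, and $0) can be seen together in a small sketch. The two input lines are made up for illustration.

```shell
out=$(printf 'Maine 23\nTexas 26\n' |
  awk '
    BEGIN   { print "State Value" }        # runs before any input is read
    $2 > 24 { print "big:", $0 }           # pattern selects lines; $0 is the whole record
            { print $1, $2; sum += $2 }    # no pattern: runs for every line
    END     { print "Total", sum }         # runs after the last line
  ')
echo "$out"
```

Each input line is tested against every pattern in order, so the Texas line is printed twice: once by the $2 > 24 statement and once by the statement with no pattern.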

 
awk - Built-in variables

Variable   Meaning                                          Default
ARGC       Number of command line arguments                 -
ARGV       Array of command line arguments                  -
FILENAME   Name of current input file                       -
FNR        Record number of current file                    -
FS         Input field separator                            " "
NF         Number of fields in current record               -
NR         Number of records read                           -
OFMT       Output format for numbers                        "%.6g"
OFS        Output field separator                           " "
ORS        Output record separator                          "\n"
RLENGTH    Length of string matched with function match     -
RS         Input record separator                           "\n"
RSTART     Start of string matched with function match      -
SUBSEP     Subscript separator                              "\034"
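
A minimal sketch of a few of these variables in use, on made-up colon-separated input: FS and OFS are set in BEGIN, and NR and NF are read for each record.

```shell
out=$(printf 'root:x:0\ndaemon:x:1:extra\n' |
  awk '
    BEGIN { FS = ":"; OFS = "-" }   # split input on colons, join output with dashes
          { print NR, NF, $1 }      # record number, field count, first field
  ')
echo "$out"
```

The commas in the print statement are what cause OFS to be inserted between the printed values.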

awk - Arithmetic Functions

atan2(y,x)   arctangent of y/x in the range of -pi to pi
cos(x)       cosine of x (x in radians)
exp(x)       e^x (e raised to the power x)
int(x)       integer part of x
log(x)       logarithm base e of x
rand()       random number from 0 to 1 ( 0 <= rand() < 1 )
sin(x)       sine of x (x in radians)
sqrt(x)      square root of x
srand(x)     x is seed for rand()
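
A short sketch of several arithmetic functions, run entirely from a BEGIN action so no input file is needed. rand() is left out because its output is not predictable.

```shell
out=$(awk 'BEGIN {
  pi = atan2(0, -1)    # the angle of the point (-1, 0) is pi
  # int() truncates toward zero; exp() undoes log()
  printf "%d %d %d %.4f %.4f\n", int(3.9), int(-3.9), sqrt(16), pi, exp(log(10))
}')
echo "$out"   # 3 -3 4 3.1416 10.0000
```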

awk - String Functions

gsub(r,s)              Globally substitute s for r in $0; returns the number of substitutions
gsub(r,s,t)            Globally substitute s for r in t; returns the number of substitutions
index(s,t)             Return the position of the first occurrence of t in s, or 0 if t does not occur
length(s)              Return the length of s
match(s,re)            Test s for regular expression re; returns the index of the match or 0; sets RSTART and RLENGTH
split(s,a)             Split s into array a on FS; returns the number of fields
split(s,a,fs)          Split s into array a on fs; returns the number of fields
sprintf(format,list)   Return list formatted according to format
sub(r,s)               Substitute s for the leftmost longest substring of $0 matched by r; returns the number of substitutions
sub(r,s,t)             Substitute s for the leftmost longest substring of t matched by r; returns the number of substitutions
substr(s,p)            Return the substring of s starting at position p through the end
substr(s,p,n)          Return the substring of s of length n starting at position p
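
The sketch below exercises most of the string functions on a made-up string. Character positions in awk start at 1, which is why index and match report 7 for the "w".

```shell
out=$(awk 'BEGIN {
  s = "hello world"
  print length(s)                    # 11
  print index(s, "wor")              # 7
  print substr(s, 7)                 # world
  print substr(s, 1, 5)              # hello
  n = gsub(/o/, "0", s)              # replaces every o in s
  print n, s                         # 2 hell0 w0rld
  if (match(s, /w[0-9]+/)) print RSTART, RLENGTH   # 7 2
  n = split("a:b:c", parts, ":")     # parts[1]..parts[3]
  print n, parts[3]                  # 3 c
  print sprintf("%05.1f", 3.14159)   # 003.1
}')
echo "$out"
```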

awk - Operators

Arithmetic Operators
+     Addition
-     Subtraction
*     Multiplication
/     Division
%     Remainder
^     Exponentiation

Assignment Operators
=     Assignment
+=    Addition and assignment
-=    Subtraction and assignment
*=    Multiplication and assignment
/=    Division and assignment
%=    Remainder and assignment
^=    Exponentiation and assignment

Increment & Decrement Operators
++    Increment (prefix & postfix)
--    Decrement (prefix & postfix)

awk - Operators

Relational Operators
<     Less than
<=    Less than or equal
==    Equal
!=    Not equal
>=    Greater than or equal
>     Greater than
~     Matches: the string contains a match for the regular expression
!~    Does not match: the string contains no match for the regular expression

Logical Operators
||    Logical OR
&&    Logical AND
!     Logical NOT
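
A small sketch combining relational, matching, logical, and arithmetic operators. The three input lines and the variable names hits and others are made up for illustration.

```shell
out=$(printf 'Arizona 32\nOregon 45\nAlaska 7\n' |
  awk '
    $1 ~ /^A/ && $2 > 10 { hits++ }     # name starts with A AND value above 10
    $1 !~ /^A/           { others++ }   # name does not start with A
    END { print hits, others, 7 % 3, 2 ^ 5 }
  ')
echo "$out"   # 1 1 1 32
```

Only the Arizona line satisfies both halves of the && test, and only the Oregon line satisfies the !~ test.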

awk - Flow of control
if (expression) statement
if (expression) statement1 else statement2
while (expression) statement
for (expression1; expression2; expression3) statement
do statement while (expression)
break
continue
next
exit
exit expression
return
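
Several of these statements can be sketched in one short program. The input is made up: the first field is a number and the second says whether to skip the line.

```shell
out=$(printf '3 skip\n4 keep\n5 keep\n' |
  awk '
    $2 == "skip" { next }    # next: abandon this record and read the following one
    {
      if ($1 % 2 == 0)
        kind = "even"
      else
        kind = "odd"
      fact = 1
      for (i = 2; i <= $1; i++)   # factorial of the first field
        fact *= i
      print $1, kind, fact
    }
  ')
echo "$out"
```

The first line never reaches the second statement because next stops all further processing of that record.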

Like C, awk has several escape sequences that can be used in strings.

awk - escape sequences

Escape Sequence   Meaning
\b                backspace
\f                form feed
\n                new line -- ASCII lf
\r                carriage return -- ASCII cr
\t                horizontal tab -- ASCII tab
\nnn              character with octal value nnn
\\                ASCII backslash
\c                any other character c, taken literally (for example, \" for a double quote)
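
A minimal sketch of escape sequences inside an awk string, using printf so that no extra newline is added. The octal escape \101 is the ASCII code for "A".

```shell
# The shell passes the backslashes through untouched because of the single quotes.
out=$(awk 'BEGIN { printf "name\tvalue\n\101\\\n" }')
echo "$out"
```

This should print "name" and "value" separated by a tab, followed by a second line that contains an A and a single backslash.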

your name:





  1. What awk command can you use to print out all lines in args.c that have printf?




  2. What awk command will print out all lines in args.c that don't have a plus sign?




  3. What awk command will print out all lines in args.c that are longer than 20 characters?




  4. What awk command will print out lines in args.c that don't have a "(" and are longer than 10 characters?




  5. Using the input file
    http://lt.tucson.az.us/hl2.2008-fallfiles/data
    which contains:
    Arizona Phoenix 6285295 Tucson 32
    California Sacramento 38593635 Eureka 12
    Oregon Salem 12345234 Portland 45
    Washington Olympia 6549872 Seattle 36
    Illinois Springfield 6759346 Chicago 14
    Maine Augusta 456923 Lewiston 23
    Texas Austin 23967433 Houston 26
    
    
    Write an awk program that reverses the order of the fields. Note that this file always has 5 fields. This does not need to be as complicated as the example in the AWK book.




  6. Write an awk program that prints the average of the third field in the data file.




  7. Write an awk program that removes the first field and prints only those lines where the third field is greater than 15 million. (Again, use the data file.)




Turn in this page.
Instructor: Louis Taber, louis.taber.at.pima at gmail dot com (520) 206-6850
