4.14 awk -- A Programming Language
awk (from the initials of Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan) is an interpreted programming language for processing text files. Of the six possible orderings of the three initials, awk seems the most appropriate.
At times the language seems awkward (to say the least), but it is also very powerful for text processing and shares many features with C.
It has more features and capabilities than sed, but fewer than Perl.
The book The AWK Programming Language is a good awk reference. It was the source for parts of this lab.
Also look at: UNIX System V Release 3 Programmer's Guide, Chapter 4, AWK.
The program can be placed on the command line or in a file. The program can read and write files named within the program. You can also pass arguments to the program from the command line.
Invoke awk in one of two ways: the first places the program on the command line; the second reads the awk program from a file.
awk 'program text' [file...]
awk -f program-file [file...]
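The two invocation forms can be compared side by side. The following is a minimal sketch, assuming a POSIX awk and a throwaway directory; the file names sample.txt and prog.awk and the three input lines are made up for illustration.

```shell
# Hypothetical sample input in a temporary directory.
tmpdir=$(mktemp -d)
printf 'alpha 1\nbeta 2\ngamma 3\n' > "$tmpdir/sample.txt"

# Way 1: the program text is given on the command line.
inline=$(awk '$2 >= 2 { print $1 }' "$tmpdir/sample.txt")

# Way 2: the same program is stored in a file and loaded with -f.
printf '%s\n' '$2 >= 2 { print $1 }' > "$tmpdir/prog.awk"
fromfile=$(awk -f "$tmpdir/prog.awk" "$tmpdir/sample.txt")

echo "$inline"
echo "$fromfile"
rm -r "$tmpdir"
```

Both forms should print the same two names, beta and gamma. The single quotes keep the shell from expanding $1 and $2 before awk sees them.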
An example of a two-statement awk program on the command line follows. The program prints all lines that contain either "Arizona" or "72".
username@gort ~ $ awk '/Arizona/;/72/' /home/cis137/data
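One detail worth checking: because the two patterns are separate statements, a line that matches both is printed once per statement. The sketch below uses three made-up input lines (not the contents of /home/cis137/data) and compares the two-statement form with a single pattern that combines the tests with ||.

```shell
# Made-up input: the first line matches both patterns.
data='Arizona Phoenix 72
Oregon Salem 45
Texas Austin 72'

# Two statements: each matching statement prints the line.
twice=$(printf '%s\n' "$data" | awk '/Arizona/;/72/')

# One statement with a combined pattern: each line is printed at most once.
once=$(printf '%s\n' "$data" | awk '/Arizona/ || /72/')

echo "$twice"
echo "---"
echo "$once"
```

With this input the first form prints the Arizona line twice, while the second prints it once.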
Many useful awk programs are only 1 to 4 lines long! awk may be one of the easiest ways to rearrange the order of fields in a file or to produce a summary report. Some examples of short awk programs:
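END { print NR }
Prints the number of input lines. This could have been done with wc -l.

NR == 40
Prints line 40.

NR >= 40 && NR <= 50
Prints lines 40 through 50.

{ print $NF }
Prints the last field of every line. NF is the number of fields. $NF is the last field in the record.

{ field = $NF } END { print field }
Prints the last field of the last line. field is a variable.

NF > 4
Prints every line with more than 4 fields.

{ words = words + NF } END { print words }
Prints the total number of fields in the input. This could have been done with wc -w.

/Alaska/
Prints every line that contains "Alaska".

/Alaska/ { count++ } END { print count }
Prints the number of lines that contain "Alaska".

BEGIN { max = -1000000 } $3 > max { max = $3 } END { print max }
Prints the largest value of the third field (assuming no value is below -1000000).

BEGIN { max = -1000000 } $3 > max { max = $3; savedline = $0 } END { print savedline }
Prints the line that has the largest third field.

NF < 4
Prints every line with fewer than 4 fields.

length < 50
Prints every line shorter than 50 characters.

{ for ( j=NF; j>0; j-- ) printf("%s ", $j); printf("\n") }
Prints the fields of every line in reverse order.

{ for ( j=1; j<=NF; j++ ) total += $j } END { print( total ) }
Prints the sum of all fields in all lines.

{ $1 = "" ; print NR, $0 }
Renumbers the lines of a file whose first field is a line number. The $1 = "" removes the old line number.

To renumber by tens:
{ $1 = "" ; print 10*NR, $0 }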
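To see a few of these short programs in action, the sketch below runs three of them on three rows taken from the state data used in the questions at the end of this lab. It assumes a POSIX awk; the shell variable names are only for illustration.

```shell
# Three rows of the lab's state data (5 fields per line).
data='Arizona Phoenix 6285295 Tucson 32
California Sacramento 38593635 Eureka 12
Oregon Salem 12345234 Portland 45'

# Count the input lines.
count=$(printf '%s\n' "$data" | awk 'END { print NR }')

# Find the largest value in the third field.
max=$(printf '%s\n' "$data" |
      awk 'BEGIN { max = -1000000 } $3 > max { max = $3 } END { print max }')

# Reverse the fields of the first line only (note the trailing space).
reversed=$(printf '%s\n' "$data" |
           awk 'NR == 1 { for (j = NF; j > 0; j--) printf("%s ", $j); printf("\n") }')

echo "$count"
echo "$max"
echo "$reversed"
```

The reversing program needs a semicolon (or a newline) between the for loop and the final printf; without it awk reports a syntax error.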
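The format of an awk program is:
pattern { action } pattern { action } pattern { action }
The pattern { action } pair can be repeated many times in an awk program. If the pattern is omitted, the action is applied to every line; if the action is omitted, every matching line is printed.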
awk uses egrep-style extended regular expressions.
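Extended regular expressions add operators such as ?, + and | to the basic set. As a small sketch (the word list is made up for illustration), the pattern below uses an optional character, grouping and alternation:

```shell
# u? makes the "u" optional; (a|e) accepts either vowel; | joins two alternatives.
out=$(printf 'color\ncolour\ncolr\ngray\ngrey\n' |
      awk '/^colou?r$|^gr(a|e)y$/')
echo "$out"
```

Every input word except colr should be printed.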
Two special patterns exist: BEGIN and END.
The action for BEGIN is processed before the first line of the input. This can be used for printing headers and setting field and record separators.
The action for END is processed at the end of the input. This can be used for report summaries.
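A small report shows where each kind of action runs. This is only a sketch with made-up input; the header text and the variable name sum are illustrative.

```shell
report=$(printf 'Arizona 32\nOregon 45\nMaine 23\n' | awk '
BEGIN { print "State Value" }      # runs once, before the first input line
      { sum += $2; print $1, $2 }  # runs for every input line
END   { print "Total", sum }       # runs once, after the last input line
')
echo "$report"
```

The output should be the header, the three input lines, and a final line reading Total 100.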
In a given record, the input fields are referred to as $1, $2, $3 ...
The entire record is referred to as $0.
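The following sketch, using a made-up one-line input, shows a single field, the whole record, and what happens to $0 when a field is assigned:

```shell
out=$(echo 'alpha beta gamma' |
      awk '{ print $2; print $0; $2 = "X"; print $0 }')
echo "$out"
```

Assigning to $2 rebuilds $0, so the last line printed should be alpha X gamma.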
Variable | Meaning | Default |
ARGC | Number of command line arguments | - |
ARGV | Array of command line arguments | - |
FILENAME | Name of current input file | - |
FNR | Record number of current file | - |
FS | Input field separator | " " |
NF | Number of fields in current record | - |
NR | Number of records read | - |
OFMT | Output format for numbers | "%.6g" |
OFS | Output field separator | " " |
ORS | Output record separator | "\n" |
RLENGTH | Length of string matched with function match | - |
RS | Input record separator | "\n" |
RSTART | Start of string matched with function match | - |
SUBSEP | Subscript separator | "\034" |
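Several of these variables can be seen together in one short program. The sketch below assumes colon-separated input (two made-up lines) and sets FS and OFS in a BEGIN action:

```shell
out=$(printf 'root:x:0\ndaemon:x:1:extra\n' | awk '
BEGIN { FS = ":"; OFS = "-" }   # split input on colons, join output with dashes
{ print NR, NF, $1 }            # record number, field count, first field
')
echo "$out"
```

The commas in print are replaced by OFS, so the two output lines should be 1-3-root and 2-4-daemon.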
atan2( y, x) | arctangent of y/x in the range of -pi to pi |
cos( x) | cosine of x (x in radians) |
exp( x) | e^x (e raised to the power x) |
int( x) | integer part of x |
log( x) | logarithm base e of x |
rand() | random number from 0 to 1 ( 0 <= rand() < 1 ) |
sin( x) | sine of x (x in radians) |
sqrt( x) | square root of x |
srand( x) | x is seed for rand() |
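A few of the arithmetic functions can be tried from a BEGIN action, which needs no input file. This is a sketch; rand() and srand() are left out because their results vary from run to run.

```shell
out=$(awk 'BEGIN {
    print int(7.9), int(-3.7)        # int() truncates toward zero
    print sqrt(16), exp(0), log(1)
    print atan2(0, -1)               # pi, printed using the default OFMT
}')
echo "$out"
```

With the default OFMT of "%.6g", the last line should show pi as 3.14159.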
gsub( r, s) | Globally substitute s for r in $0; returns number of substitutions |
gsub( r, s, t) | Globally substitute s for r in t; returns number of substitutions |
index( s, t) | Return first position of t in s, or 0 if t does not occur in s |
length( s) | Return length of s |
match( s, re) | Test s for regular expression re; returns index of the match or 0; sets RSTART and RLENGTH |
split( s, a) | Split s into array a on FS; returns number of fields |
split( s, a, fs) | Split s into array a on fs; returns number of fields |
sprintf( format, list) | Return list formatted by format |
sub( r, s) | Substitute s for the leftmost longest substring of $0 matched by r; returns number of substitutions |
sub( r, s, t) | Substitute s for the leftmost longest substring of t matched by r; returns number of substitutions |
substr( s, p) | Return substring of s starting at p to end |
substr( s, p, n) | Return substring of s starting at p of length n |
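The string functions are easiest to learn by trying them on a known string. The sketch below runs entirely in a BEGIN action; the string "hello world" and the variable names are made up for illustration.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    print length(s)                    # number of characters
    print index(s, "world")            # position of the first match, counted from 1
    print substr(s, 7, 3)              # 3 characters starting at position 7
    n = gsub(/o/, "0", s)              # replace every "o" in s, count the changes
    print n, s
    if (match(s, /w[0-9a-z]+/))        # sets RSTART and RLENGTH on success
        print RSTART, RLENGTH
    k = split("a:b:c", parts, ":")     # fill the array parts, return the count
    print k, parts[3]
    print sprintf("%.2f", 3.14159)     # format without printing directly
}')
echo "$out"
```

Note that positions in awk strings start at 1, not 0.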
Arithmetic Operators | |
+ | Addition |
- | Subtraction |
* | Multiplication |
/ | Division |
% | Remainder |
^ | Exponentiation |
Assignment Operators | |
= | Assignment |
+= | Addition and assignment |
-= | Subtraction and assignment |
*= | Multiplication and assignment |
/= | Division and assignment |
%= | Remainder and assignment |
^= | Exponentiation and assignment |
Increment & Decrement Operators | |
++ | increment (prefix & postfix) |
-- | decrement (prefix & postfix) |
Relational Operators | |
< | Less than |
<= | Less than or equal |
== | Equal |
!= | Not equal |
>= | Greater than or equal |
> | Greater than |
~ | Does the string contain the re |
!~ | Does the string not contain the re |
Logical Operators | |
|| | Logical OR |
&& | Logical AND |
! | Logical NOT |
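Operators are often combined in a pattern. As a sketch with made-up input, the program below selects lines whose first field starts with "A" and whose second field is greater than 10, then prints a count, a total, a remainder and a power:

```shell
out=$(printf 'Arizona 32\nAlaska 5\nOregon 45\n' | awk '
$1 ~ /^A/ && $2 > 10 { n++; total += $2 }   # match operator, AND, increment, +=
END { print n, total, 7 % 3, 2 ^ 3 }
')
echo "$out"
```

Only the Arizona line passes both tests, so the output should be 1 32 1 8.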
if( expression) statement |
if( expression) statement1 else statement2 |
while( expression) statement |
for( expression1; expression2; expression3) statement |
do statement while( expression ) |
break |
continue |
next |
exit |
exit expression |
return |
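Several of these statements can be combined in one short program. The sketch below uses made-up input; the word "stop" as an end marker and the variable name stars are only for illustration.

```shell
out=$(printf '3\n-1\n4\nstop\n9\n' | awk '
$1 == "stop" { exit }          # stop reading input entirely
$1 < 0       { next }          # skip this record, go on to the next
{
    stars = ""
    for (i = 1; i <= $1; i++)  # build a bar of $1 stars
        stars = stars "*"
    if ($1 > 3)
        print "big", stars
    else
        print "small", stars
}
')
echo "$out"
```

The negative line is skipped by next, and the 9 after "stop" is never read because of exit.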
Escape Sequence | Meaning |
\b | backspace |
\f | form feed |
\n | new line -- ASCII lf |
\r | carriage return -- ASCII cr |
\t | Horizontal tab -- ASCII tab |
\nnn | ASCII octal value |
\\ | ASCII backslash |
\c | c literally, for any other character c |
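Escape sequences are used inside awk string constants, most often in printf formats. A small sketch with made-up strings:

```shell
# \t is a tab, \\ is one backslash, \101 is octal for the letter A.
out=$(awk 'BEGIN { printf("name\tvalue\n"); printf("a\\b\t\101\n") }')
echo "$out"
```

The second output line should contain a, a backslash, b, a tab, and A.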
What awk command can you use to print out all lines in args.c that have printf?
What awk command will print out all lines in args.c that don't have a plus sign?
What awk command will print out all lines in args.c that are longer than 20 characters?
What awk command will print out lines in args.c that don't have a "(" and are longer than 10 characters?

Arizona Phoenix 6285295 Tucson 32
California Sacramento 38593635 Eureka 12
Oregon Salem 12345234 Portland 45
Washington Olympia 6549872 Seattle 36
Illinois Springfield 6759346 Chicago 14
Maine Augusta 456923 Lewiston 23
Texas Austin 23967433 Houston 26

Write an awk program that reverses the order of the fields. Note that this file always has 5 fields. This does not need to be as complicated as the example in the AWK book.
Write an awk program that prints the average of the third field in the data file.
Write an awk program that removes the first field and prints only those lines where the third field is greater than 15 million. (Again, use the data.)