awk (from the initials of Alfred V. Aho, Peter J. Weinberger, and
Brian W. Kernighan) is an interpreted programming language for
processing text files. Of all 6 possible orderings of the three
initials, awk seems most appropriate.
At times the language seems awkward (to say the least). It
is also very powerful for text processing. It has many of the
same features as "C".
The AWK language has more features and capabilities than sed, but fewer than Perl.
The book The AWK Programming Language is a good awk reference. It was the source for parts of this lab.
Also look at: UNIX System V Release 3 Programmer's Guide, Chapter 4, AWK.
The program can be placed on the command line or in a file. The program can read and write files named within the program. You can also pass arguments to the program from the command line.
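As a minimal sketch of the ways of running awk just described, the commands below use a made-up input file sample.txt and a made-up program file count.awk; both names are assumptions for illustration only. The -v option shown here is one way of passing a value from the command line.

```sh
# Create a small, hypothetical input file.
printf 'one two\nthree four five\n' > sample.txt

# 1. Program given on the command line.
awk '{ print NF }' sample.txt

# 2. The same program stored in a file and run with -f.
echo '{ print NF }' > count.awk
awk -f count.awk sample.txt

# 3. A value passed from the command line with -v.
awk -v min=3 'NF >= min' sample.txt
```

The first two commands print the field count of each line; the third prints only the lines that have at least min fields.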
Many "useful" awk programs are only 1 to 4 lines long! It
may be one of the easiest ways to rearrange the order of fields
in a file or do a summary report. Some examples of short awk
programs:
Print the total number of input lines (this could have been done with wc -l):
    END { print NR }

Print line 40:
    NR == 40

Print lines 40 through 50:
    NR >= 40 && NR <= 50
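
Print the last field of every line (NF is the number of fields; $NF is the last field in the record):
    { print $NF }

Print the last field of the last line (field is a variable):
    { field = $NF } END { print field }

Print every line with more than 4 fields:
    NF > 4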
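
Print the total number of fields in all lines (this could have been done with wc -w):
    { words = words + NF } END { print words }

Print every line that contains Alaska:
    /Alaska/

Count the lines that contain Alaska:
    /Alaska/ { count++ } END { print count }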
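
Print the largest third field:
    BEGIN { max = -1000000 } $3 > max { max = $3 } END { print max }

Print the line with the largest third field:
    BEGIN { max = -1000000 } $3 > max { max = $3; savedline = $0 } END { print savedline }

Print every line with fewer than 4 fields:
    NF < 4

Print every line shorter than 50 characters:
    length < 50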

Print the fields of every line in reverse order:
    { for ( j=NF; j>0; j-- ) printf("%s ", $j); printf("\n") }

Print the sum of all fields in all lines:
    { for ( j=1; j<=NF; j++ ) total += $j } END { print( total ) }

Renumber lines whose first field is a line number (the $1 = "" removes the old line number):
    { $1 = "" ; print NR, $0 }

To renumber by tens:
    { $1 = "" ; print 10*NR, $0 }

The format of the awk program is:
    pattern { action }
    pattern { action }
    pattern { action }
This can be repeated many times in an awk program.
awk uses egrep-style extended regular expressions.
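To make the pattern { action } layout concrete, here is a small sketch with two pattern-action pairs. The first pattern is an extended regular expression that uses grouping and alternation; the second is an ordinary expression. The input lines are invented for illustration.

```sh
printf 'Alaska 1\nAlabama 2\nTexas 3\n' |
awk '
    /^Ala(ska|bama) / { print "A-state:", $1 }   # alternation inside a group
    $2 >= 3           { print "big:", $1 }       # an expression used as the pattern
'
```

Every record is tested against each pattern in turn, so a record can trigger more than one action, or none.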
Two special patterns exist: BEGIN and END.
The action for BEGIN is processed prior to the first line of the input. This
can be used for printing headers and setting field and record separators.
The action for END is processed at the end of the input. This can
be used for report summaries.
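A minimal sketch of BEGIN and END working together: BEGIN sets the field separator and prints a header, the unlabeled action runs on every record, and END prints a summary. The colon-separated input is made up for illustration.

```sh
printf 'apple:3\npear:5\n' |
awk '
    BEGIN { FS = ":"; print "item count" }   # runs before the first record
          { print $1, $2; total += $2 }      # runs for every record
    END   { print "total", total }           # runs after the last record
'
```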
In a given record, the input fields are referred to as $1, $2, $3 ...
The entire record is referred to as $0.
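For instance, with one invented input record, the following sketch prints the second field, then the whole record, then the third field used as a number:

```sh
echo 'Maine Augusta 456923' | awk '{ print $2; print $0; print $3 + 1 }'
```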
Built-in variables

Variable   Meaning                                        Default
ARGC       Number of command line arguments               -
ARGV       Array of command line arguments                -
FILENAME   Name of current input file                     -
FNR        Record number of current file                  -
FS         Input field separator                          " "
NF         Number of fields in current record             -
NR         Number of records read                         -
OFMT       Output format for numbers                      "%.6g"
OFS        Output field separator                         " "
ORS        Output record separator                        "\n"
RLENGTH    Length of string matched with function match   -
RS         Input record separator                         "\n"
RSTART     Start of string matched with function match    -
SUBSEP     Subscript separator                            "\034"
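As a short sketch of several of these variables in use, the command below reads comma-separated input (invented for illustration), sets FS and OFS in BEGIN, and prints the record number, the field count, and the first and last fields of each record.

```sh
printf 'a,b,c\nd,e\n' |
awk 'BEGIN { FS = ","; OFS = "-" } { print NR, NF, $1, $NF }'
```

Because the print arguments are separated by commas, awk joins them with OFS, so the two output lines are 1-3-a-c and 2-2-d-e.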
Arithmetic Functions

atan2(y, x)   arctangent of y/x in the range of -pi to pi
cos(x)        cosine of x (x in radians)
exp(x)        e^x (e raised to the power x)
int(x)        integer part of x
log(x)        logarithm base e of x
rand()        random number from 0 to 1 ( 0 <= rand() < 1 )
sin(x)        sine of x (x in radians)
sqrt(x)       square root of x
srand(x)      x is seed for rand()
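The sketch below tries a few of these functions from a BEGIN action, so no input file is needed. The seed value 1 is an arbitrary choice for illustration; the random number itself is not printed because it differs between awk implementations.

```sh
awk 'BEGIN {
    print int(7.9)              # 7
    print sqrt(16)              # 4
    print exp(0)                # 1
    print atan2(0, -1)          # pi, printed using OFMT: 3.14159
    srand(1); r = rand()
    ok = (r >= 0 && r < 1)
    print ok                    # 1, since rand() is always in [0, 1)
}'
```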
String Functions

gsub(r, s)              Globally substitute s for r in $0; returns number of substitutions
gsub(r, s, t)           Globally substitute s for r in t; returns number of substitutions
index(s, t)             Return first position of t in s, or 0 if t does not occur in s
length(s)               Return length of s
match(s, re)            Test s for regular expression re; returns index or 0; sets RSTART and RLENGTH
split(s, a)             Split s into array a on FS; returns number of fields
split(s, a, fs)         Split s into array a on fs; returns number of fields
sprintf(format, list)   Return list formatted by format
sub(r, s)               Substitute s for the leftmost longest substring of $0 matched by r; returns number of substitutions
sub(r, s, t)            Substitute s for the leftmost longest substring of t matched by r; returns number of substitutions
substr(s, p)            Return substring of s starting at p to end
substr(s, p, n)         Return substring of s starting at p of length n
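A small sketch exercising several of the string functions on one made-up string; the variable names s, n, and parts are chosen only for this example. Note that string positions start at 1.

```sh
awk 'BEGIN {
    s = "hello world"
    print length(s)                              # 11
    print index(s, "wor")                        # 7
    print substr(s, 7, 3)                        # wor
    if (match(s, /o+/)) print RSTART, RLENGTH    # 5 1
    n = gsub(/o/, "0", s)
    print n, s                                   # 2 hell0 w0rld
    n = split("a:b:c", parts, ":")
    print n, parts[2]                            # 3 b
    print sprintf("%05.1f", 3.14159)             # 003.1
}'
```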
Operators

Arithmetic Operators
+     Addition
-     Subtraction
*     Multiplication
/     Division
%     Remainder
^     Exponentiation

Assignment Operators
=     Assignment
+=    Addition and assignment
-=    Subtraction and assignment
*=    Multiplication and assignment
/=    Division and assignment
%=    Remainder and assignment
^=    Exponentiation and assignment

Increment & Decrement Operators
++    Increment (prefix & postfix)
--    Decrement (prefix & postfix)
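As a quick sketch of the arithmetic, assignment, and increment operators, the following BEGIN action computes a few values; the variable names x, i, and j are arbitrary.

```sh
awk 'BEGIN {
    print 7 % 3, 2 ^ 10      # 1 1024
    x = 5; x += 2; x *= 3
    print x                  # 21
    i = 1; j = i++           # postfix: j gets the old value of i
    print i, j               # 2 1
}'
```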

Relational Operators
<     Less than
<=    Less than or equal
==    Equal
!=    Not equal
>=    Greater than or equal
>     Greater than
~     Does the string contain the re
!~    Does the string not contain the re

Logical Operators
||    Logical OR
&&    Logical AND
!     Logical NOT
Flow of control

if (expression) statement
if (expression) statement1 else statement2
while (expression) statement
for (expression1; expression2; expression3) statement
do statement while (expression)
break
continue
next
exit
exit expression
return
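The sketch below combines a few of these statements: next skips a record, for loops over a counter, and if with continue skips part of the loop body. The input numbers are invented for illustration.

```sh
printf '3\n-1\n4\n' |
awk '
    $1 < 0 { next }                      # skip negative records entirely
    {
        for (i = 1; i <= $1; i++) {
            if (i % 2 == 0) continue     # skip even counters
            sum += i
        }
    }
    END { print sum }                    # 8: (1 + 3) from each remaining record
'
```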
Awk has several escape sequences like "C". These can be used in strings.
AWK escape sequences

Escape Sequence   Meaning
\b                backspace
\f                form feed
\n                new line -- ASCII lf
\r                carriage return -- ASCII cr
\t                horizontal tab -- ASCII tab
\nnn              ASCII octal value
\\                ASCII backslash
\c                any other character c, taken literally (for example \" for a double quote)
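A one-line sketch of escape sequences inside an awk string: tabs between columns, an escaped double quote, and the octal escape \101, which is the letter A in ASCII.

```sh
awk 'BEGIN { printf "name\tqty\n\"x\"\t\101\n" }'
```

This prints two tab-separated lines: name and qty on the first, "x" and A on the second.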
Your name:

What awk command can you use to print out all lines in args.c that have printf?

What awk command will print out all lines in args.c that don't have a plus sign?

What awk command will print out all lines in args.c that are longer than 20 characters?

What awk command will print out lines in args.c that don't have a "(" and are longer than 10 characters?

The data file:
    Arizona Phoenix 6285295 Tucson 32
    California Sacramento 38593635 Eureka 12
    Oregon Salem 12345234 Portland 45
    Washington Olympia 6549872 Seattle 36
    Illinois Springfield 6759346 Chicago 14
    Maine Augusta 456923 Lewiston 23
    Texas Austin 23967433 Houston 26

Write an awk program that reverses the order of the fields. Note that this file always has 5 fields. This does not need to be as complicated as the example in the AWK book.

Write an awk program that prints the average of the third field in the data file.

Write an awk program that removes the first field and prints only those lines where the third field is greater than 15 million. (Again, use the data.)