awk scripts consist of patterns and procedures:
pattern { procedure }
Both are optional. If pattern is missing, { procedure } is applied to all lines; if { procedure } is missing, the matched line is printed.
A pattern can be any of the following:
/regular expression/
relational expression
pattern-matching expression
BEGIN
END
Expressions can be composed of quoted strings, numbers, operators, functions, defined variables, or any of the predefined variables described later in the section "Built-in Variables."
Regular expressions use the extended set of metacharacters and are described in Chapter 6, Pattern Matching.
^ and $ refer to the beginning and end of a string (such as a field), respectively, rather than the beginning and end of a line. In particular, these metacharacters will not match at a newline embedded in the middle of a string.
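Here is a small sketch of this anchoring behavior. It assumes a POSIX-compatible awk on the PATH, and the shell function names and sample input are hypothetical. The first function applies ^ to the second field rather than to the line. The second shows that ^ does not match just after a newline embedded in a string.

```sh
#!/bin/sh
# Hypothetical helper: print lines whose SECOND FIELD begins with "b".
# ^ anchors to the start of $2, not to the start of the line.
starts_with_b() {
    awk '$2 ~ /^b/'
}

# Hypothetical helper: "a\nb" contains "b" right after a newline,
# but ^ only matches at the start of the whole string.
embedded_newline() {
    awk 'BEGIN { s = "a\nb"; if (s ~ /^b/) print "match"; else print "no match" }'
}

printf 'x banana\nbx apple\n' | starts_with_b   # prints: x banana
embedded_newline                                # prints: no match
```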
Relational expressions use the relational operators listed in the section "Operators" later in this chapter. For example, $2 > $1 selects lines for which the second field is greater than the first. Comparisons can be either string or numeric: if both fields look like numbers, awk compares them numerically; otherwise, it compares them as strings. Because the choice depends on the data in $1 and $2, it can change from one record to the next.
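The sketch below runs the $2 > $1 example over three records. It also assumes a POSIX awk and invented sample data. The first pair looks numeric and is compared as numbers, and the other two pairs are compared as strings.

```sh
#!/bin/sh
# Print records whose second field is greater than the first.
second_gt_first() {
    awk '$2 > $1'
}

# "9 10":       both fields look numeric, so 10 > 9 is true
#               (as strings, "10" > "9" would be false).
# "pear apple": string comparison, "apple" > "pear" is false.
# "apple pear": string comparison, "pear" > "apple" is true.
printf '9 10\npear apple\napple pear\n' | second_gt_first
# prints:
# 9 10
# apple pear
```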
Pattern-matching expressions use the operators ~ (match) and !~ (don't match). See the section "Operators" later in this chapter.
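This is a minimal sketch of both operators, assuming a POSIX awk. The helper names and the two-column input are made up for illustration.

```sh
#!/bin/sh
# ~  : true when the left operand matches the regular expression
# !~ : true when it does NOT match
numeric_ids() {
    awk '$1 ~ /^[0-9]+$/ { print $2 }'
}
non_numeric_ids() {
    awk '$1 !~ /^[0-9]+$/ { print $2 }'
}

printf '42 alice\nx7 bob\n' | numeric_ids       # prints: alice
printf '42 alice\nx7 bob\n' | non_numeric_ids   # prints: bob
```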
The BEGIN pattern lets you specify procedures that take place before the first input line is processed. (Generally, you set global variables here.)
The END pattern lets you specify procedures that take place after the last input record is read.
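Here is a short sketch of BEGIN and END working together, assuming a POSIX awk. The variable name count is arbitrary. Because count is initialized in BEGIN, the END procedure prints 0 even for empty input.

```sh
#!/bin/sh
# BEGIN runs before any input is read; END runs after the last record.
count_lines() {
    awk 'BEGIN { count = 0 }
         { count++ }
         END { print "lines:", count }'
}

printf 'a\nb\nc\n' | count_lines   # prints: lines: 3
printf '' | count_lines             # prints: lines: 0
```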
In nawk, BEGIN and END patterns may appear multiple times. The procedures are merged as if there had been one large procedure.
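This sketch shows the merging behavior. It assumes nawk or another POSIX awk, which also allows repeated BEGIN and END patterns. The blocks of each kind run in the order they appear, so a variable set in the first BEGIN is visible in the second.

```sh
#!/bin/sh
# Two BEGIN blocks and two END blocks behave like one BEGIN and one END.
merged_blocks() {
    awk 'BEGIN { msg = "start" }
         BEGIN { print msg }
         END   { print "first END" }
         END   { print "second END" }'
}

printf 'x\n' | merged_blocks
# prints:
# start
# first END
# second END
```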
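Except for BEGIN and END, patterns can be combined with the Boolean operators || (or), && (and), and ! (not). A range of lines can also be specified using comma-separated patterns:
pattern,pattern
The range begins at a line that matches the first pattern and continues through the next line that matches the second pattern, inclusive. After the range ends, awk resumes looking for the first pattern.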
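Here is a sketch of a Boolean combination and a range pattern, assuming a POSIX awk. The strings a, b, start, and stop are placeholders.

```sh
#!/bin/sh
# Boolean combination: lines containing "a" but not "b".
a_not_b() {
    awk '/a/ && !/b/'
}

# Range: from a line matching /start/ through the next line
# matching /stop/, both endpoints included.
between_markers() {
    awk '/start/,/stop/'
}

printf 'ab\nac\nbc\n' | a_not_b                     # prints: ac
printf 'x\nstart\ny\nstop\nz\n' | between_markers   # prints start, y, stop (one per line)
```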
Procedures consist of one or more commands, functions, or variable assignments, separated by newlines or semicolons, and contained within curly braces. Commands fall into five groups:
Variable or array assignments
Printing commands
Built-in functions
Control-flow commands
User-defined functions (nawk only)
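The following sketch touches each of the five groups in a single program. It assumes nawk or a POSIX awk, since user-defined functions and toupper() are not in the original awk. The function name double, the array label, and the input line are hypothetical.

```sh
#!/bin/sh
five_groups() {
    awk 'function double(n) { return 2 * n }   # user-defined function
         {
             label[NR] = toupper($1)           # array assignment + built-in function
             total = 0                         # variable assignment
             for (i = 2; i <= NF; i++)         # control flow
                 total += double($i)
             print label[NR], total            # printing
         }'
}

printf 'a 1 2\n' | five_groups   # prints: A 6
```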
Print first field of each line:
{ print $1 }
Print all lines that contain pattern:
/pattern/
Print first field of lines that contain pattern:
/pattern/ { print $1 }
Select records containing more than two fields:
NF > 2
Treat each group of lines up to a blank line as a single input record, with each line as a single field:
BEGIN { FS = "\n"; RS = "" }
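The sketch below runs this BEGIN procedure over a made-up two-record input, assuming a POSIX awk. In this paragraph mode a record may contain any number of lines, and NF counts them.

```sh
#!/bin/sh
# RS = "" : blank lines separate records.
# FS = "\n": each line of a record is one field.
first_line_of_each_record() {
    awk 'BEGIN { FS = "\n"; RS = "" }
         { print NR ": " $1 " (" NF " lines)" }'
}

printf 'Ann Smith\n12 Oak St\n\nBob Jones\n3 Elm Rd\nApt 4\n' | first_line_of_each_record
# prints:
# 1: Ann Smith (2 lines)
# 2: Bob Jones (3 lines)
```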
Print fields 2 and 3 in switched order, but only on lines whose first field matches the string "URGENT":
$1 ~ /URGENT/ { print $3, $2 }
Count and print the number of lines that contain pattern:
/pattern/ { ++x } END { print x }
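This sketch runs the counting example on sample input, assuming a POSIX awk. The word error stands in for pattern. It counts matching lines, not occurrences, so a line with two matches adds only one. Unlike the one-liner above, it prints x + 0. This is an added choice so that 0, rather than an empty line, is printed when nothing matches.

```sh
#!/bin/sh
count_matches() {
    awk '/error/ { ++x } END { print x + 0 }'
}

printf 'error a\nok\nerror error\n' | count_matches   # prints: 2
printf 'ok\n' | count_matches                         # prints: 0
```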
Add numbers in second column and print total:
{ total += $2 } END { print "column total is", total}
Print lines that contain fewer than 20 characters:
length($0) < 20
Print each line that begins with Name: and that contains exactly seven fields:
NF == 7 && /^Name:/
Print the fields of each input record in reverse order, one per line:
{ for (i = NF; i >= 1; i--) print $i }
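This sketch runs the example on a sample line, assuming a POSIX awk. The second function is an added variant, not part of the original. It keeps the reversed fields on one line by following each field with OFS, or with ORS after the last one.

```sh
#!/bin/sh
# The example above: one field per output line, last field first.
reverse_fields() {
    awk '{ for (i = NF; i >= 1; i--) print $i }'
}

# Variant: same loop, output on a single line.
# The conditional is parenthesized so ">" is not read as redirection.
reverse_fields_one_line() {
    awk '{ for (i = NF; i >= 1; i--) printf "%s%s", $i, (i > 1 ? OFS : ORS) }'
}

printf 'one two three\n' | reverse_fields            # prints three, two, one (one per line)
printf 'one two three\n' | reverse_fields_one_line   # prints: three two one
```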