Introduction
Awk is a powerful text processing tool for analyzing data and generating reports. Unlike grep and sed, which are used mainly for searching and editing, awk excels at handling structured data. It processes input line by line and splits each line into fields, using whitespace (spaces and tabs) as the default delimiter. These fields can then be analyzed or manipulated according to rules you specify.
There are three main versions of awk: awk, nawk, and gawk. On many Linux systems, awk means gawk, the GNU implementation of the AWK language. The name "awk" comes from the surname initials of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. AWK is not just a command but a full-fledged programming language designed for pattern scanning and processing. It lets users write programs that process input files, sort data, perform calculations, and generate reports efficiently.
Instructions
The basic syntax of awk is: awk 'pattern {action}' filenames. Although the details can look complex, the structure is always the same. The pattern defines what awk should look for in the data, and the action specifies what to do when a match is found. Braces ({}) enclose the action and group multiple instructions under one pattern. Either part may be omitted: without a pattern the action runs on every line, and without an action each matching line is printed.
One of the primary functions of awk is to extract information based on specified rules from files or strings. Once extracted, this data can be further processed. An awk script typically formats and manipulates text data effectively.
When processing a file, awk reads it line by line, executing commands on each line. This makes it ideal for tasks such as filtering, transforming, and summarizing data.
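The pattern and action structure described above can be tried on a few lines of inline data. This is only a minimal sketch: the sample records and the shell variable names (data, matched, names, passed) are made up for illustration.

```shell
# Hypothetical sample data: a name and a score, separated by a space.
data='alice 90
bob 45
carol 72'

# Pattern only: print every line that matches the regular expression.
matched=$(printf '%s\n' "$data" | awk '/^b/')

# Action only: run the action on every line.
names=$(printf '%s\n' "$data" | awk '{ print $1 }')

# Pattern and action: run the action only on lines where the pattern is true.
passed=$(printf '%s\n' "$data" | awk '$2 >= 60 { print $1 }')

printf '%s\n' "$matched" "$names" "$passed"
```

The third command shows that a pattern does not have to be a regular expression; any expression that evaluates to true or false can select lines.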
Calling awk
There are three common ways to invoke awk:
1. Command Line Mode: awk [-F field-separator] 'commands' input-file(s)
Here, 'commands' are the actual awk instructions, and [-F field-separator] is an optional parameter that defines the field separator. The default separator is whitespace (spaces and tabs), but you can change it with -F.
2. Shell Script Mode: Put the awk commands into a script file whose first line names the interpreter, #!/bin/awk -f (the path to awk may differ on your system), then make the file executable and run it directly.
3. Using an External Script: Store the awk code in a separate file and call it with awk -f awk-script-file input-file(s).
This chapter focuses on the command line approach.
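As a rough comparison of the first and third invocation styles, the sketch below runs the same one-line program inline and from a script file. The input records and the temporary script file are assumptions made for this example; the shell script mode is left out because the interpreter path varies between systems.

```shell
# Hypothetical input: colon-separated records, like /etc/passwd.
input='root:x:0:0
daemon:x:1:1'

# 1. Command line mode: the program is given inline.
inline=$(printf '%s\n' "$input" | awk -F ':' '{ print $1 }')

# 3. External script: the same program stored in a file and loaded with -f.
script=$(mktemp)
cat > "$script" <<'EOF'
BEGIN { FS = ":" }   # same effect as -F ':' on the command line
{ print $1 }
EOF
from_file=$(printf '%s\n' "$input" | awk -f "$script")
rm -f "$script"

printf '%s\n' "$inline" "$from_file"
```

Both invocations should print the same two account names, since only the way the program text reaches awk differs.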
Example
Suppose the output of last -n 5 is as follows:
root pts/1 192.168.1.100 Tue Feb 10 11:21 still logged in
root pts/1 192.168.1.100 Tue Feb 10 00:46 - 02:28 (01:41)
root pts/1 192.168.1.100 Mon Feb 9 11:41 - 18:30 (06:48)
dmtsai pts/1 192.168.1.100 Mon Feb 9 11:41 - 11:41 (00:00)
root tty1 Fri Sep 5 14:09 - 14:10 (00:01)
To display only the account names from the five most recent logins:
last -n 5 | awk '{print $1}'
root
root
root
dmtsai
root
The workflow of awk is straightforward: it reads one record at a time (records are separated by newlines), splits the record into fields using the field separator, and assigns the values to $0 (the whole line), $1 (the first field), and so on. By default, the field separator is whitespace, so here $1 is the username, $3 is the IP address (on lines that have one), and so on.
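To make the record and field splitting concrete, the following sketch feeds awk a single record modeled on the sample last output and prints several of the resulting values. The variable names line and fields are made up for this example.

```shell
# One record modeled on the sample `last` output.
line='root pts/1 192.168.1.100 Tue Feb 10 11:21 still logged in'

fields=$(printf '%s\n' "$line" | awk '{
    print "whole record: " $0
    print "field count:  " NF
    print "first field:  " $1
    print "third field:  " $3
    print "last field:   " $NF
}')
printf '%s\n' "$fields"
```

Because NF holds the number of fields, $NF refers to the last field of the record, whatever its position.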
To display the user accounts from /etc/passwd:
cat /etc/passwd | awk -F ':' '{print $1}'
root
daemon
bin
sys
This is an example of an awk action, where {print $1} is executed on every line.
To display the user account and shell separated by a tab:
cat /etc/passwd | awk -F ':' '{print $1"\t"$7}'
root    /bin/bash
daemon  /bin/sh
bin     /bin/sh
sys     /bin/sh
To add column headers and a final line:
cat /etc/passwd | awk -F':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'
name,shell
root,/bin/bash
daemon,/bin/sh
bin,/bin/sh
sys,/bin/sh
blue,/bin/nosh
The workflow includes executing the BEGIN block first, then reading the file line by line, processing each line, and finally running the END block after all records are processed.
Search for lines containing the keyword 'root' in /etc/passwd:
awk -F: '/root/' /etc/passwd
root:x:0:0:root:/root:/bin/bash
This is an example of using a pattern. If a line matches the pattern (here, the regular expression /root/) and no action is given, the default action is to print the entire line.
To match only lines that start with 'root', anchor the regular expression with ^:
awk -F: '/^root/' /etc/passwd
To display the shell for each line containing 'root' in /etc/passwd:
awk -F: '/root/{print $7}' /etc/passwd
/bin/bash
This specifies the action {print $7}.
Built-in Variables in awk
Awk has several built-in variables that store environment-related information. Some of the most commonly used ones include:
- ARGC: Number of command-line arguments.
- ARGV: Array of command-line arguments.
- ENVIRON: System environment variables.
- FILENAME: Current file being processed.
- FNR: Number of records read from the current file.
- FS: Input field separator (equivalent to -F).
- NF: Number of fields in the current record.
- NR: Total number of records read.
- OFS: Output field separator.
- ORS: Output record separator.
- RS: Input record separator.
The variable $0 represents the entire line, while $1, $2, etc., represent individual fields.
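The difference between NR and FNR only shows up when awk reads more than one file. The sketch below uses two small temporary files with made-up contents (first.txt and second.txt are hypothetical names) and also sets OFS to show its effect on print.

```shell
# Two small temporary files with hypothetical contents.
dir=$(mktemp -d)
printf 'a:1\nb:2\n' > "$dir/first.txt"
printf 'c:3\n' > "$dir/second.txt"

# NR keeps counting across files; FNR restarts at 1 for each file.
# Setting OFS changes the separator that print puts between
# comma-separated arguments.
counts=$(cd "$dir" && awk -F ':' 'BEGIN { OFS = "|" } { print FILENAME, NR, FNR, NF, $1 }' first.txt second.txt)
rm -rf "$dir"

printf '%s\n' "$counts"
```

On the third record, NR is 3 while FNR is back to 1, because that record is the first line of the second file.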
To display the filename, line number, number of columns, and the entire line content for /etc/passwd:
awk -F ':' '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:" $0}' /etc/passwd
filename:/etc/passwd,linenumber:1,columns:7,linecontent:root:x:0:0:root:/root:/bin/bash
filename:/etc/passwd,linenumber:2,columns:7,linecontent:daemon:x:1:1:daemon:/usr/sbin:/bin/sh
filename:/etc/passwd,linenumber:3,columns:7,linecontent:bin:x:2:2:bin:/bin:/bin/sh
filename:/etc/passwd,linenumber:4,columns:7,linecontent:sys:x:3:3:sys:/dev:/bin/sh
For more readable output, use printf instead of print:
awk -F':' '{printf("filename:%10s,linenumber:%s,columns:%s,linecontent:%s\n", FILENAME, NR, NF, $0)}' /etc/passwd
Print vs. Printf
Awk provides both print and printf for output. The arguments of print can be variables, values, or strings. Strings must be enclosed in double quotes, and commas separate the arguments. Without commas, adjacent values are concatenated with nothing between them. With commas, print writes the output field separator (OFS, a single space by default) between the arguments.
printf works much like C's printf and gives formatted output. It is particularly useful for complex output, where it makes the code clearer and easier to understand. Unlike print, it does not add a newline automatically.
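The three output styles can be compared side by side on one record. This is a small sketch using a made-up passwd-style line; the variable names are chosen only for this example.

```shell
record='root:x:0:0:root:/root:/bin/bash'

# Comma: arguments are joined with OFS (a single space by default).
with_comma=$(printf '%s\n' "$record" | awk -F ':' '{ print $1, $7 }')

# No comma: adjacent values are concatenated with nothing in between.
concatenated=$(printf '%s\n' "$record" | awk -F ':' '{ print $1 $7 }')

# printf: explicit format; %-8s left-justifies in 8 columns, and the
# newline must be written out because printf does not add one.
formatted=$(printf '%s\n' "$record" | awk -F ':' '{ printf "%-8s|%s\n", $1, $7 }')

printf '%s\n' "$with_comma" "$concatenated" "$formatted"
```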
Awk Programming
Variables and Assignments
In addition to built-in variables, awk allows custom variables. For example, to count the number of accounts in /etc/passwd:
awk '{count++; print $0;} END {print "user count is ", count}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
...
user count is 40
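The variable count is a custom variable. An uninitialized awk variable behaves as 0 in a numeric context, so count starts at 0 and increments for each line. Initializing it explicitly in a BEGIN block is optional but makes the intent clearer: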
awk 'BEGIN {count=0; print "[start]user count is ", count} {count=count+1; print $0;} END {print "[end]user count is ", count}' /etc/passwd
[start]user count is 0
root:x:0:0:root:/root:/bin/bash
...
[end]user count is 40
To count the total size of files in a directory:
ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END {print "[end]size is ", size}'
[end]size is 8657198
To display the size in megabytes:
ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END {print "[end]size is ", size/1024/1024, "M"}'
[end]size is 8.25889 M
Note: This does not include the contents of subdirectories.
Conditional Statements
Awk supports conditional statements similar to C. For example:
if (expression) { statement }
if (expression) { statement } else { statement2 }
if (expression) { statement1 } else if (expression1) { statement2 } else { statement3 }
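To total file sizes while skipping directories (assuming each directory is listed with a size of exactly 4096 bytes, which also skips any regular file of that size):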
ls -l | awk 'BEGIN {size=0; print "[start]size is ", size} {if ($5 != 4096) {size = size + $5;}} END {print "[end]size is ", size/1024/1024, "M"}'
[end]size is 8.22339 M
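The if / else if / else form listed above can be sketched on inline data as well. The "name size" pairs and the size thresholds below are invented for illustration and stand in for a real file listing.

```shell
# Hypothetical "name size" pairs standing in for a file listing.
listing='notes.txt 512
photo.jpg 2097152
empty.log 0'

labels=$(printf '%s\n' "$listing" | awk '{
    if ($2 == 0) {
        label = "empty"
    } else if ($2 < 1024 * 1024) {
        label = "small"
    } else {
        label = "large"
    }
    print $1, label
}')
printf '%s\n' "$labels"
```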
Loop Statements
Awk also supports loop structures like while, do/while, and for, together with break and continue, similar to C.
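A short sketch of these loop constructs follows. The input numbers and the doubling exercise are made up for this example; the first program uses for with continue, the second uses while with break.

```shell
# Hypothetical input: each line is a list of numbers.
numbers='1 2 3 4
10 -5 20'

# for loop: visit every field of the record and add it up.
# continue skips negative values; NF is the number of fields.
sums=$(printf '%s\n' "$numbers" | awk '{
    total = 0
    for (i = 1; i <= NF; i++) {
        if ($i < 0) continue
        total += $i
    }
    print total
}')

# while loop with break: count how many times 1 can be doubled
# before the value would exceed 100.
doublings=$(awk 'BEGIN {
    n = 1; steps = 0
    while (1) {
        if (n * 2 > 100) break
        n *= 2
        steps++
    }
    print steps, n
}')

printf '%s\n' "$sums" "$doublings"
```

The second program consists only of a BEGIN block, so awk runs it without reading any input.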
Arrays
In awk, arrays are associative: keys can be strings or numbers. They are often used to store and process data collected from records. Because arrays are hash-based, a for (key in array) loop visits the elements in no guaranteed order; looping over a numeric index, as in the command that follows, keeps the order. For example, to list all user accounts from /etc/passwd:
awk -F ':' 'BEGIN {count=0;} {name[count] = $1; count++;}; END {for (i = 0; i < count; i++) print i, name[i]}' /etc/passwd
0 root
1 daemon
2 bin
3 sys
4 sync
5 games
...
A for loop is used to iterate through the array.
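Arrays with string keys are handy for counting. The sketch below, which assumes a few made-up passwd-style records, counts how many accounts use each login shell and iterates with for (key in array).

```shell
# Hypothetical passwd-style records.
accounts='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh'

# shells[$7]++ uses the shell path as the array key and counts its uses.
# for (key in array) visits keys in an unspecified order, so the
# output is piped through sort to make it stable.
shell_counts=$(printf '%s\n' "$accounts" |
    awk -F ':' '{ shells[$7]++ } END { for (s in shells) print s, shells[s] }' |
    sort)
printf '%s\n' "$shell_counts"
```

Piping through sort is one simple way to get repeatable output when the iteration order of the array does not matter to the computation itself.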