The Linux awk command: easy to understand and learned in minutes

2025-10-09 07:22:00

Introduction

Awk is a powerful text processing tool widely used in Unix and Linux environments. While grep is for searching and sed for editing, awk shines when it comes to analyzing structured data and generating reports. It processes files line by line, splitting each line into fields using whitespace (spaces and tabs) as the default delimiter. These fields are then analyzed or manipulated according to user-defined rules.

There are three main versions of awk: awk, nawk, and gawk. In this article, unless stated otherwise, awk refers to gawk, the GNU implementation of the AWK programming language.

The name "Awk" comes from the initials of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. AWK is not just a utility; it's a full-fledged programming language designed for pattern scanning and processing. It allows users to create scripts that read input files, sort data, perform calculations, and generate reports. Its flexibility makes it an essential tool for system administrators and developers alike.

Usage

The basic syntax of awk is: awk 'pattern {action}' filenames. Although it may seem complex at first, the structure is always the same. The pattern selects the lines to act on, and the action says what to do with each matching line. The action is enclosed in braces ({}), which can hold several statements. Either part may be omitted: with no pattern, the action runs on every line; with no action, matching lines are printed unchanged. A pattern is often a regular expression enclosed in slashes.
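A minimal sketch of the three rule forms (pattern only, action only, and pattern with action). The sample lines and the helper name sample_lines are made up for illustration and stand in for a real input file:

```shell
# Hypothetical sample input standing in for a real file.
sample_lines() {
  printf '%s\n' 'alpha 1' 'beta 2' 'alphabet 3'
}

# Pattern only: the default action prints every matching line.
sample_lines | awk '/^alpha/'

# Action only: with no pattern, the action runs for every line.
sample_lines | awk '{ print $2 }'

# Pattern and action: print the second field of lines that start with "beta".
sample_lines | awk '/^beta/ { print $2 }'
```

The first command prints the two lines beginning with "alpha", the second prints the numbers 1 to 3, and the third prints only 2.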

At its core, awk is designed to extract and process information from text files based on specific rules. Once the data is extracted, additional text manipulation can be performed. A complete awk script is often used to reformat or analyze the content of a file.

Each time awk processes a file, it reads one line at a time and applies the corresponding commands. This makes it ideal for handling large datasets efficiently.

Calling Awk

There are three primary ways to invoke awk:

1. Command Line Mode

awk [-F field-separator] 'commands' input-file(s). Here, 'commands' refers to the actual awk instructions, and [-F] is optional for specifying the field separator. Input files are the ones being processed.

In awk, each line is split into fields based on the specified delimiter. By default, the delimiter is whitespace: any run of spaces or tabs counts as a single separator.
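As a small illustration of how field splitting behaves, the sketch below feeds awk two made-up records, one with irregular spacing and one separated by colons. NF is the number of fields awk found in the record:

```shell
# Default splitting: runs of spaces and tabs count as a single separator,
# so this record has three fields.
printf 'root   pts/1\t192.168.1.100\n' | awk '{ print NF, $3 }'

# With -F ':' every colon separates two fields, so the empty fifth
# field is preserved and the record has seven fields.
printf 'root:x:0:0::/root:/bin/bash\n' | awk -F ':' '{ print NF, $7 }'
```

The first command prints "3 192.168.1.100" and the second prints "7 /bin/bash".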

2. Shell Script Mode

You can put all your awk commands in a file, make it executable, and name the awk interpreter in a shebang line at the top of the script, such as #!/bin/awk -f. The -f is required, and the interpreter path may differ between systems (for example /usr/bin/awk).

3. External Script File

You can also run an external awk script using the -f option: awk -f awk-script-file input-file(s). This loads the script from the specified file.
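A minimal sketch of the external-script approach. The script contents, the sample record, and the use of mktemp for a temporary file are assumptions made for illustration:

```shell
# Write a small awk program to a temporary file (assumes mktemp is available).
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the first and last field of each colon-separated record.
BEGIN { FS = ":" }
{ print $1, $NF }
EOF

# Run the program file against sample input read from standard input.
result=$(printf 'root:x:0:0:root:/root:/bin/bash\n' | awk -f "$script")
echo "$result"

rm -f "$script"
```

Setting FS in a BEGIN block inside the script has the same effect as passing -F ':' on the command line.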

This article focuses on the command-line approach.

Example Usage

Suppose we run last -n 5, which shows the last five login records. The output might look like this:

# last -n 5

root pts/1 192.168.1.100 Tue Feb 10 11:21 still logged in

root pts/1 192.168.1.100 Tue Feb 10 00:46 - 02:28 (01:41)

root pts/1 192.168.1.100 Mon Feb 9 11:41 - 18:30 (06:48)

dmtsai pts/1 192.168.1.100 Mon Feb 9 11:41 - 11:41 (00:00)

root tty1 Fri Sep 5 14:09 - 14:10 (00:01)

To display only the usernames who logged in recently, you can run:

# last -n 5 | awk '{print $1}'

root

root

root

dmtsai

root

In this case, awk reads each line, splits it into fields, and prints the first field ($1), which represents the username.

Similarly, if you want to display only the account names from /etc/passwd, where fields are separated by colons, you can use:

# cat /etc/passwd | awk -F ':' '{print $1}'

root

daemon

bin

sys

If you want to display both the username and shell separated by a tab, you can write:

# cat /etc/passwd | awk -F ':' '{print $1"\t"$7}'

root /bin/bash

daemon /bin/sh

bin /bin/sh

sys /bin/sh

For more advanced formatting, you can include headers and a footer. For example, to add a header and a final line, you can use:

# cat /etc/passwd | awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'

name,shell

root,/bin/bash

daemon,/bin/sh

bin,/bin/sh

sys,/bin/sh

blue,/bin/nosh

The workflow of awk follows a specific order: it first runs the BEGIN block, then processes each line of the input file, and finally executes the END block after all lines have been read.
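The order of execution can be made visible with a short sketch that prints the record counter NR at each stage. The three-line input and the helper name show_order are made up for illustration:

```shell
# Hypothetical helper that reports when each block runs.
show_order() {
  awk '
    BEGIN { print "start: NR=" NR }     # runs once, before any input is read
          { print "line " NR ": " $0 }  # runs once per input line
    END   { print "end: NR=" NR }       # runs once, after the last line
  '
}

printf 'a\nb\nc\n' | show_order
```

NR is 0 in the BEGIN block because nothing has been read yet, and it still holds the total line count (3 here) in the END block.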

Searching for a keyword like "root" in /etc/passwd can be done with:

# awk -F: '/root/' /etc/passwd

root:x:0:0:root:/root:/bin/bash

This displays every line that contains the string "root" anywhere in the line. To print only the shell field (field 7) of the matching lines, add an action:

# awk -F: '/root/{print $7}' /etc/passwd

/bin/bash
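Because /root/ is tested against the whole line, it can match more accounts than intended. The sketch below contrasts it with an exact comparison on the first field; the sample records and the helper name sample_passwd are hypothetical:

```shell
# Hypothetical sample records in /etc/passwd format.
sample_passwd() {
  printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'operator:x:11:0:operator:/root:/sbin/nologin' \
    'daemon:x:1:1:daemon:/usr/sbin:/bin/sh'
}

# /root/ matches anywhere in the line, so the operator account
# (home directory /root) matches as well.
sample_passwd | awk -F ':' '/root/ { print $1 }'

# Comparing the first field exactly selects only the root account.
sample_passwd | awk -F ':' '$1 == "root" { print $7 }'
```

The first command prints root and operator; the second prints only /bin/bash.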

Built-in Variables

Awk provides several built-in variables that help manage the environment. Some of the most commonly used ones include:

- ARGC: Number of command-line arguments

- ARGV: Array of command-line arguments

- FILENAME: Name of the current file being processed

- FNR: Number of records read from the current file

- NF: Number of fields in the current record

- NR: Total number of records read

- FS: Input field separator (default is whitespace: any run of spaces or tabs)

- OFS: Output field separator (default is a single space)

- ORS: Output record separator (default is newline)

- RS: Record separator (default is newline)

For example, to display the filename, line number, number of fields, and the entire line from /etc/passwd, you can use:

# awk -F ':' '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:" $0}' /etc/passwd

filename:/etc/passwd,linenumber:1,columns:7,linecontent:root:x:0:0:root:/root:/bin/bash

filename:/etc/passwd,linenumber:2,columns:7,linecontent:daemon:x:1:1:daemon:/usr/sbin:/bin/sh

filename:/etc/passwd,linenumber:3,columns:7,linecontent:bin:x:2:2:bin:/bin:/bin/sh

filename:/etc/passwd,linenumber:4,columns:7,linecontent:sys:x:3:3:sys:/dev:/bin/sh

Using printf instead of print gives finer control over the layout, for example padding the filename to a width of 10 characters:

# awk -F ':' '{printf("filename:%10s,linenumber:%s,columns:%s,linecontent:%s\n", FILENAME, NR, NF, $0)}' /etc/passwd
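The difference between NR and FNR only shows up when awk reads more than one file. The following sketch uses two small temporary files (the file names and contents are made up, and mktemp -d is assumed to be available) and sets OFS so the printed fields are easy to tell apart:

```shell
# Create two small sample files in a temporary directory.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/first.txt"
printf 'c\n'    > "$dir/second.txt"

# FNR restarts at 1 for each file; NR keeps counting across files.
# OFS is placed between the comma-separated arguments of print.
result=$(cd "$dir" && awk 'BEGIN { OFS = "|" } { print FILENAME, FNR, NR, $0 }' first.txt second.txt)
echo "$result"

rm -rf "$dir"
```

For the single line of second.txt, FNR is 1 while NR is 3.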

Print and Printf

Awk offers two statements for output: print and printf.

The print statement accepts variables, numbers, and strings as arguments. String literals must be enclosed in double quotes. Arguments separated by commas are printed with the output field separator (OFS, a single space by default) between them; arguments written next to each other without a comma are concatenated with nothing in between.

The printf statement works like printf in the C language: it takes a format string followed by the values to format. Unlike print, it does not append a newline automatically. It is especially useful for complex output, where a single format string is easier to read than a long chain of concatenations.
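A short sketch of the three output styles applied to one sample record (the record itself is made up for illustration):

```shell
formatted=$(printf 'root:x:0:0:root:/root:/bin/bash\n' | awk -F ':' '{
  print $1, $7                          # comma: joined with OFS (a space by default)
  print $1 $7                           # no comma: concatenated directly
  printf "%-8s|%5d|%s\n", $1, $3, $7    # C-style format with an explicit newline
}')
echo "$formatted"
```

In the printf line, %-8s left-aligns the name in 8 columns and %5d right-aligns the numeric user ID in 5 columns.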

Programming with Awk

Variables and Assignments

In addition to built-in variables, awk allows users to define custom variables. For example, to count the number of accounts in /etc/passwd, you can write:

# awk '{count++; print $0;} END {print "user count is ", count}' /etc/passwd

root:x:0:0:root:/root:/bin/bash

...

user count is 40

It's good practice to initialize variables before using them, even though awk treats an unset variable as 0 or an empty string. For instance:

# awk 'BEGIN {count=0; print "[start]user count is ", count} {count=count+1; print $0;} END {print "[end]user count is ", count}' /etc/passwd

[start]user count is 0

root:x:0:0:root:/root:/bin/bash

...

[end]user count is 40
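One case where explicit initialization makes a visible difference is empty input. In this sketch, /dev/null stands in for an empty file, and the brackets are there only to make an empty value visible:

```shell
# With no input lines, the counter is never touched and prints as an empty string.
awk '{ count++ } END { print "[" count "]" }' < /dev/null

# With explicit initialization, the same program prints 0.
awk 'BEGIN { count = 0 } { count++ } END { print "[" count "]" }' < /dev/null
```

The first command prints [] and the second prints [0].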

Counting the total size of files in a directory:

# ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END {print "[end]size is ", size}'

[end]size is 8657198

To display the result in megabytes:

# ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END {print "[end]size is ", size/1024/1024, "M"}'

[end]size is 8.25889 M

Note: This total does not include the contents of subdirectories; ls -l lists only the entry for each subdirectory itself.
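The same summation can be tried on fixed input, which makes the arithmetic easy to check. The listing below is a made-up stand-in for ls -l output, and printf with %.2f is used here to limit the result to two decimal places:

```shell
# Hypothetical ls -l style listing: two files and one subdirectory entry.
sample_listing() {
  printf '%s\n' \
    'total 12' \
    '-rw-r--r-- 1 root root 1048576 Feb 10 11:21 a.bin' \
    '-rw-r--r-- 1 root root 524288 Feb 10 11:22 b.bin' \
    'drwxr-xr-x 2 root root 4096 Feb 10 11:23 sub'
}

# Field 5 is the size in bytes; the "total" line has no fifth field and adds 0.
sample_listing | awk '{ size += $5 } END { printf "%d bytes = %.2f M\n", size, size/1024/1024 }'
```

This prints "1576960 bytes = 1.50 M": the two files plus the 4096 bytes reported for the directory entry.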

Conditional Statements

Awk supports conditional statements similar to those in C. For example:

if (expression) {
    statement
}

if (expression) {
    statement
} else {
    statement2
}

if (expression) {
    statement1
} else if (expression1) {
    statement2
} else {
    statement3
}

To count the size of files excluding directories (which usually have a size of 4096):

# ls -l | awk 'BEGIN {size=0; print "[start]size is ", size} {if ($5 != 4096) {size = size + $5;}} END {print "[end]size is ", size/1024/1024, "M"}'

[start]size is 0

[end]size is 8.22339 M
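Filtering on a size of 4096 also skips any regular file that happens to be exactly 4096 bytes. As an alternative sketch (not the method used above), the condition can test the first character of the permissions field, which is "d" for directories. The listing and the helper name listing_with_dir are made up for illustration:

```shell
# Hypothetical listing: a 4096-byte regular file, a 2048-byte file, and a directory.
listing_with_dir() {
  printf '%s\n' \
    '-rw-r--r-- 1 root root 4096 Feb 10 11:21 page.bin' \
    '-rw-r--r-- 1 root root 2048 Feb 10 11:22 half.bin' \
    'drwxr-xr-x 2 root root 4096 Feb 10 11:23 sub'
}

# Filtering by size drops the directory but also the 4096-byte regular file.
listing_with_dir | awk '{ if ($5 != 4096) size += $5 } END { print size }'

# Filtering by entry type keeps every non-directory entry.
listing_with_dir | awk '{ if ($1 !~ /^d/) size += $5 } END { print size }'
```

The first command prints 2048, while the second prints 6144.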

Loop Statements

Awk supports loops such as while, do/while, and for, similar to C. These allow for repeated execution of blocks of code.
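A brief sketch of the three loop forms. Everything runs inside a BEGIN block, so no input file is needed; the helper name loop_demo and the numbers are arbitrary choices for illustration:

```shell
loop_demo() {
  awk 'BEGIN {
    # for loop: sum the integers 1 through 5
    total = 0
    for (i = 1; i <= 5; i++) total += i
    print "for:", total

    # while loop: count how many times 100 can be halved before dropping below 1
    n = 100; steps = 0
    while (n >= 1) { n = n / 2; steps++ }
    print "while:", steps

    # do/while loop: the body always runs at least once
    k = 10
    do { k++ } while (k < 5)
    print "do/while:", k
  }'
}

loop_demo
```

The do/while loop increments k to 11 even though the condition k < 5 is false from the start, because the condition is checked only after the body has run.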

Arrays

In awk, arrays are associative, so keys can be numbers or strings. This makes them very flexible for storing and retrieving data. For example, to store usernames from /etc/passwd:

# awk -F ':' 'BEGIN {count=0;} {name[count] = $1; count++;}; END {for (i = 0; i < count; i++) print i, name[i]}' /etc/passwd

0 root

1 daemon

2 bin

3 sys

4 sync

5 games

...

Here, a for loop is used to iterate through the array and display its contents.
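String keys work the same way as numeric ones. The sketch below counts accounts per login shell, using the shell path as the array key. The sample records and the helper name sample_passwd are hypothetical, and the output is piped through sort because the order of a for (key in array) loop is not specified:

```shell
# Hypothetical sample records in /etc/passwd format.
sample_passwd() {
  printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' \
    'bin:x:2:2:bin:/bin:/bin/sh' \
    'sys:x:3:3:sys:/dev:/bin/sh'
}

# users[$7] is created on first use and incremented once per account.
sample_passwd |
  awk -F ':' '{ users[$7]++ } END { for (shell in users) print shell, users[shell] }' |
  sort
```

For this input the result is one account using /bin/bash and three using /bin/sh.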
