The Linux awk command: easy to understand, quick to learn

Introduction

Awk is a powerful text processing tool used for analyzing data and generating reports. Unlike grep or sed, which are primarily for searching and editing, awk excels at handling structured data. It processes files line by line, splitting each line into fields using whitespace (runs of spaces and tabs) as the default delimiter. These fields can then be analyzed or manipulated according to rules you specify.

There are three main versions of awk: awk, nawk, and gawk. On Linux, awk usually means gawk, the GNU implementation of the AWK language, although some distributions install another implementation such as mawk by default. The name "awk" comes from the initials of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan. AWK is not just a command but a full-fledged programming language designed for pattern scanning and processing. It allows users to write programs that process input files, sort data, perform calculations, and generate reports efficiently.

Usage

The basic syntax of awk is: awk 'pattern {action}' filenames. Although individual programs can look complex, this structure stays the same. The pattern defines what awk should look for in the data, and the action specifies what to do when a line matches. Either part may be omitted: without a pattern, the action runs on every line; without an action, each matching line is printed. Braces ({}) group the statements that belong to a pattern.
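As a small illustration of how a pattern and an action pair up, the sketch below feeds three made-up lines to awk through printf. The sample data and the shell variable name result are assumptions for illustration only.

```shell
# Made-up sample input: a name and a number on each line.
result=$(printf 'apple 5\nbanana 12\ncherry 30\n' |
  awk '$2 > 10 { print $1 }')   # pattern: $2 > 10, action: print $1
echo "$result"
```

Only the lines whose second field is greater than 10 reach the action, so the names banana and cherry are printed.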

One of the primary uses of awk is to extract information from files or strings according to specified rules. Once extracted, the data can be processed further, for example reformatted or summarized.

When processing a file, awk reads it line by line, executing commands on each line. This makes it ideal for tasks such as filtering, transforming, and summarizing data.

Calling awk

There are three common ways to invoke awk:

1. Command Line Mode: awk [-F field-separator] '{commands}' input-file(s)

Here, 'commands' are the actual awk instructions, and -F is an optional parameter that defines the field separator. The default separator is whitespace (one or more spaces or tabs); use -F to change it, for example -F ':' for colon-separated files.

2. Shell Script Mode: Put the awk commands in a script file whose first line names the awk interpreter, for example #!/bin/awk -f (the path to awk varies between systems). Then make the file executable and run it directly.

3. Using an External Script: Store the awk code in a separate file and call it with awk -f awk-script-file input-file(s).
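The third form can be sketched as follows. The script is written to a temporary file created with mktemp, and the passwd-style input is made up; both, along with the variable names script and names, are assumptions for illustration.

```shell
# Write a small awk program to a temporary file (hypothetical script).
script=$(mktemp)
cat > "$script" <<'EOF'
# Print the first colon-separated field of every line.
BEGIN { FS = ":" }
{ print $1 }
EOF

# Run the program with -f; the input is made-up passwd-style data.
names=$(printf 'root:x:0\ndaemon:x:1\n' | awk -f "$script")
echo "$names"
rm -f "$script"
```

Setting FS in a BEGIN block inside the script has the same effect as passing -F ':' on the command line.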

This chapter focuses on the command line approach.

Example

Suppose the output of last -n 5 is as follows:

root pts/1 192.168.1.100 Tue Feb 10 11:21 still logged in
root pts/1 192.168.1.100 Tue Feb 10 00:46 - 02:28 (01:41)
root pts/1 192.168.1.100 Mon Feb 9 11:41 - 18:30 (06:48)
dmtsai pts/1 192.168.1.100 Mon Feb 9 11:41 - 11:41 (00:00)
root tty1 Fri Sep 5 14:09 - 14:10 (00:01)

To display only the usernames of the five most recent logins:

last -n 5 | awk '{print $1}'

root
root
root
dmtsai
root

The workflow of awk is straightforward: it reads one record at a time (by default, one line), splits the record into fields using the field separator, and assigns the whole record to $0, the first field to $1, the second to $2, and so on. Because the default field separator is whitespace, $1 is the username and $3 is the IP address (in the rows that have one).
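To make the field assignment concrete, here is a minimal sketch that runs awk on a single made-up record in the style of the last output above. The variable names line and fields are assumptions for illustration.

```shell
# One made-up record in the style of the `last` output.
line='root pts/1 192.168.1.100 Tue Feb 10 11:21'
fields=$(printf '%s\n' "$line" |
  awk '{ print "whole=" $0; print "user=" $1; print "ip=" $3; print "count=" NF }')
echo "$fields"
```

The built-in variable NF holds the number of fields in the current record, which is 7 for this line.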

To display the user accounts from /etc/passwd:

cat /etc/passwd | awk -F ':' '{print $1}'

root
daemon
bin
sys

This is an example of an awk action, where {print $1} is executed on every line.

To display the user account and shell separated by a tab:

cat /etc/passwd | awk -F ':' '{print $1"\t"$7}'

root    /bin/bash
daemon  /bin/sh
bin     /bin/sh
sys     /bin/sh

To add column headers and a final line:

cat /etc/passwd | awk -F':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'

name,shell
root,/bin/bash
daemon,/bin/sh
bin,/bin/sh
sys,/bin/sh
blue,/bin/nosh

The workflow is: awk executes the BEGIN block first, then reads the file line by line and runs the main action on each line, and finally runs the END block after all records have been processed.
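The order of execution can be checked with a small sketch on two made-up input lines. The variable name out is an assumption for illustration.

```shell
# BEGIN runs once before any input, the middle block once per line,
# and END once after the last line has been read.
out=$(printf 'a\nb\n' |
  awk 'BEGIN { print "begin" }
       { print "line " NR ": " $0 }
       END { print "end after " NR " records" }')
echo "$out"
```

In the END block, NR still holds the number of the last record read, which makes it a convenient record count.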

Search for lines containing the keyword 'root' in /etc/passwd:

awk -F: '/root/' /etc/passwd

root:x:0:0:root:/root:/bin/bash

This is an example of using a pattern. If a line matches the pattern (here, 'root'), the default action is to print the entire line.

Use regular expressions to search for lines starting with 'root':

awk -F: '/^root/' /etc/passwd

To display the shell for each line containing 'root' in /etc/passwd:

awk -F: '/root/{print $7}' /etc/passwd

/bin/bash

This specifies the action {print $7}.
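Note that a pattern such as /root/ matches anywhere in the line. As a hedged sketch, the comparison below uses two made-up passwd-style lines to contrast it with a test on the first field only. The sample data and the variable names are assumptions for illustration.

```shell
# Two made-up lines: only the first account is named root, but the
# second one also contains the text "root" in its home directory.
data='root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin'

# /root/ matches the text anywhere in the line.
anywhere=$(printf '%s\n' "$data" | awk -F: '/root/ { print $1 }')

# $1 == "root" matches only when the first field is exactly root.
first_only=$(printf '%s\n' "$data" | awk -F: '$1 == "root" { print $1 }')

echo "$anywhere"
echo "$first_only"
```

A pattern can be any expression, so comparisons on individual fields work the same way as regular expressions between slashes.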

Built-in Variables in awk

Awk has several built-in variables that describe the input, the output format, and the environment. Some of the most commonly used ones include:

- ARGC: Number of command-line arguments.

- ARGV: Array of command-line arguments.

- ENVIRON: System environment variables.

- FILENAME: Current file being processed.

- FNR: Number of records read from the current file.

- FS: Input field separator (equivalent to -F).

- NF: Number of fields in the current record.

- NR: Total number of records read.

- OFS: Output field separator.

- ORS: Output record separator.

- RS: Input record separator.

The variable $0 represents the entire line, while $1, $2, etc., represent individual fields.
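As a brief sketch of several of these variables working together, the command below sets FS and OFS in a BEGIN block and prints NR, the first field, and NF for two made-up lines. The sample data and the variable name out are assumptions for illustration.

```shell
# FS controls how input is split, OFS what print puts between
# comma-separated values.
out=$(printf 'root:x:0:0\nbin:x:2:2\n' |
  awk 'BEGIN { FS = ":"; OFS = " | " } { print NR, $1, NF }')
echo "$out"
```

Each output line shows the record number, the account name, and the field count, joined by the chosen output separator.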

To display the filename, line number, number of columns, and the entire line content for /etc/passwd:

awk -F ':' '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:" $0}' /etc/passwd

filename:/etc/passwd,linenumber:1,columns:7,linecontent:root:x:0:0:root:/root:/bin/bash
filename:/etc/passwd,linenumber:2,columns:7,linecontent:daemon:x:1:1:daemon:/usr/sbin:/bin/sh
filename:/etc/passwd,linenumber:3,columns:7,linecontent:bin:x:2:2:bin:/bin:/bin/sh
filename:/etc/passwd,linenumber:4,columns:7,linecontent:sys:x:3:3:sys:/dev:/bin/sh

For more readable output, use printf instead of print:

awk -F':' '{printf("filename:%10s,linenumber:%s,columns:%s,linecontent:%s\n", FILENAME, NR, NF, $0)}' /etc/passwd

Print vs. Printf

Awk provides both print and printf statements for output. The arguments to print can be variables, numbers, or strings. String literals must be enclosed in double quotes. Arguments separated by commas are printed with the output field separator (OFS, a single space by default) between them; arguments written next to each other without commas are concatenated with nothing in between.

The printf statement works like C's printf, allowing formatted output. Unlike print, it does not append a newline automatically, so include \n in the format string. It’s particularly useful for complex outputs, making the code clearer and easier to understand.
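The difference between the three forms can be seen side by side in the following sketch, which uses one made-up input line. The variable name out is an assumption for illustration.

```shell
out=$(printf 'root:/bin/bash\n' | awk -F: '{
  print $1, $2                 # comma: joined with OFS (a space by default)
  print $1 $2                  # no comma: concatenated directly
  printf "%-8s|%s\n", $1, $2   # printf: explicit format and explicit newline
}')
echo "$out"
```

The %-8s conversion left-aligns the first field in a column eight characters wide, which is the kind of alignment print cannot do on its own.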

Awk Programming

Variables and Assignments

In addition to built-in variables, awk allows custom variables. For example, to count the number of accounts in /etc/passwd:

awk '{count++; print $0;} END {print "user count is ", count}' /etc/passwd

root:x:0:0:root:/root:/bin/bash
...
user count is 40

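The variable count is a custom variable. Awk variables need no declaration: an uninitialized variable acts as 0 in a numeric context (and as an empty string in a string context), so count++ works from the first line. To make the starting value explicit, set it in a BEGIN block: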

awk 'BEGIN {count=0; print "[start]user count is ", count} {count=count+1; print $0;} END {print "[end]user count is ", count}' /etc/passwd

[start]user count is 0
root:x:0:0:root:/root:/bin/bash
...
[end]user count is 40

To count the total size of files in a directory:

ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END {print "[end]size is ", size}'

[end]size is 8657198

To display the size in megabytes:

ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END {print "[end]size is ", size/1024/1024, "M"}'

[end]size is 8.25889 M

Note: This total does not include the contents of subdirectories.

Conditional Statements

Awk supports conditional statements similar to C. For example:

if (expression) { statement }

if (expression) { statement } else { statement2 }

if (expression) { statement1 } else if (expression1) { statement2 } else { statement3 }

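To total file sizes while skipping directories (assumed here to be the entries whose size is exactly 4096 bytes, which also skips any regular file of that size):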

ls -l | awk 'BEGIN {size=0; print "[start]size is ", size} {if ($5 != 4096) {size = size + $5;}} END {print "[end]size is ", size/1024/1024, "M"}'

[start]size is 0
[end]size is 8.22339 M
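Filtering on a size of 4096 is only a heuristic. A possibly more robust variant, sketched below, checks the first character of the permissions field instead, since ls -l marks regular files with a leading "-" and directories with "d". The listing is made up so that the result is reproducible, and the variable names are assumptions for illustration.

```shell
# Made-up `ls -l` style listing: one directory, one regular file that
# happens to be exactly 4096 bytes, and one smaller regular file.
listing='total 12
drwxr-xr-x 2 user user 4096 Feb 10 11:21 docs
-rw-r--r-- 1 user user 4096 Feb 10 11:21 exactly4k.bin
-rw-r--r-- 1 user user 1000 Feb 10 11:21 notes.txt'

# Heuristic from the text: skip every entry whose size is 4096.
by_size=$(printf '%s\n' "$listing" |
  awk '{ if ($5 != 4096) size += $5 } END { print size }')

# Alternative: count only regular files (mode string starts with "-").
by_type=$(printf '%s\n' "$listing" |
  awk '$1 ~ /^-/ { size += $5 } END { print size }')

echo "$by_size $by_type"
```

With this sample, the size-based filter misses the 4096-byte regular file, while the type-based filter counts it.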

Loop Statements

Awk also supports the loop statements while, do/while, and for, together with break and continue, using the same syntax as C.
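As a minimal sketch of these statements, the program below loops over the fields of each line, skips negative values with continue, and stops at the first value above 100 with break. The input numbers and the variable name out are assumptions for illustration.

```shell
out=$(printf '1 2 3\n5 -4 6\n7 200 9\n' | awk '{
  sum = 0
  for (i = 1; i <= NF; i++) {
    if ($i < 0) continue     # skip negative values
    if ($i > 100) break      # stop at the first value above 100
    sum += $i
  }
  print "line " NR ": " sum
}')
echo "$out"
```

Because $i refers to the field whose number is stored in i, a for loop running from 1 to NF visits every field of the current record.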

Arrays

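In awk, arrays are associative: keys can be strings or numbers. They are often used to store and process data collected from records. Because arrays are hash-based, a for (key in array) loop visits elements in an unspecified order; the example below uses consecutive numeric indices and a counting loop so that the accounts come out in file order. To list all user accounts from /etc/passwd: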

awk -F ':' 'BEGIN {count=0;} {name[count] = $1; count++;}; END {for (i = 0; i < count; i++) print i, name[i]}' /etc/passwd

0 root

1 daemon

2 bin

3 sys

4 sync

5 games

...

A for loop is used to iterate through the array.
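Arrays with string keys are handy for counting. The sketch below counts how many accounts use each login shell in three made-up passwd-style lines; the output is piped through sort because the order of a for (key in array) loop is not guaranteed. The sample data and the variable name counts are assumptions for illustration.

```shell
# Count accounts per login shell (field 7), using the shell path as key.
counts=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' \
  'bin:x:2:2:bin:/bin:/bin/sh' |
  awk -F: '{ n[$7]++ } END { for (s in n) print s, n[s] }' | sort)
echo "$counts"
```

The expression n[$7]++ creates the array element on first use, so no initialization is needed.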
