  Linux awk text analysis tool
  Add Date : 2018-11-21      
  Brief introduction

awk is a powerful text analysis tool. Compared with grep (searching) and sed (editing), awk is particularly strong at data analysis and report generation. Put simply, awk reads a file line by line, splits each line into fields using whitespace as the default delimiter, and then analyzes and processes those fields.

There are three common versions of awk: awk, nawk and gawk. Unless stated otherwise, awk here generally refers to gawk, the GNU version of AWK.

The name awk comes from the initials of the surnames of its creators, Alfred Aho, Peter Weinberger and Brian Kernighan. AWK is in fact a language of its own, the AWK programming language, which its three creators formally defined as a "pattern scanning and processing language". It lets you write short programs that read input files, sort data, process data, perform calculations on the input, generate reports, and much more.


Usage

awk 'pattern {action}' filenames

Although an awk program can become complex, its syntax always follows this form: pattern specifies what awk looks for in the data, and action is the series of commands executed when a match is found. The curly braces ({}) do not always need to appear in a program, but they are used to group a series of instructions under a particular pattern. The pattern is often a regular expression enclosed in slashes, such as /root/.
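The pattern and action can each be used alone or together, and a pattern does not have to be a regular expression: a comparison such as $2 > 4 also works. The sketch below runs each form on a few sample lines; the sample text and the function names (pattern_only, action_only, pattern_action, compare_pattern) are hypothetical, chosen only for illustration.

```shell
# Sketch: the forms a pattern-action pair can take.
# The sample lines and the function names are hypothetical.
sample='apple 3
banana 5
cherry 7'

# Pattern only: matching lines are printed unchanged (default action).
pattern_only() { awk '/an/'; }

# Action only: the action runs for every line.
action_only() { awk '{ print $2 }'; }

# Pattern + action: the action runs only on lines the pattern matches.
pattern_action() { awk '/an/ { print $1 }'; }

# The pattern can also be a comparison instead of a regular expression.
compare_pattern() { awk '$2 > 4 { print $1 }'; }

printf '%s\n' "$sample" | pattern_only      # banana 5
printf '%s\n' "$sample" | action_only       # 3, 5, 7 (one per line)
printf '%s\n' "$sample" | pattern_action    # banana
printf '%s\n' "$sample" | compare_pattern   # banana, cherry
```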

The most basic job of the awk language is to browse files or strings according to specified rules and extract information from them; once awk has extracted the information, other text operations can be performed on it. Complete awk scripts are typically used to format the information in text files.

Typically, awk processes a file one line at a time: it reads each line of the file and then executes the appropriate commands to manipulate the text.

Invoking awk

There are three ways to invoke awk:

1. Command line:

awk [-F field-separator] 'commands' input-file(s)

Here commands are the actual awk commands, [-F field-separator] is optional, and input-file(s) are the files to be processed. In awk, each item of a line that is separated by the field separator is called a field. When -F is not given, the default field separator is whitespace.

2. Shell script: put all the awk commands in a file, make the file executable, and name the awk interpreter on the first line of the script; the script can then be run by typing its name. In other words, the usual first line of a shell script, #!/bin/sh, is replaced with #!/bin/awk -f (the path to awk may differ on your system, for example /usr/bin/awk).

3. Put all the awk commands in a separate file and call it with:

awk -f awk-script-file input-file(s)

where the -f option loads the awk script from awk-script-file, and input-file(s) is the same as above.
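Methods 2 and 3 can be tried side by side with the same one-line program. This is a minimal sketch: the temporary directory, the file names first_field.awk, first_field and input.txt, and the sample input are hypothetical. The shebang line is built from whatever path command -v awk reports, because awk lives in /bin on some systems and in /usr/bin on others.

```shell
# Sketch of invocation methods 2 and 3. All file names are hypothetical.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT            # clean up when the script exits

printf 'alice 1\nbob 2\n' > "$tmp/input.txt"

# Method 3: the awk program lives in its own file and is loaded with -f.
cat > "$tmp/first_field.awk" <<'EOF'
{ print $1 }
EOF
awk -f "$tmp/first_field.awk" "$tmp/input.txt"      # alice, bob

# Method 2: the same program as an executable script with an awk shebang.
# The interpreter path is looked up so the line matches this system.
{ printf '#!%s -f\n' "$(command -v awk)"; cat "$tmp/first_field.awk"; } > "$tmp/first_field"
chmod +x "$tmp/first_field"
"$tmp/first_field" "$tmp/input.txt"                 # alice, bob
```

The -f after the interpreter path matters: without it, awk would treat the script file name as program text instead of loading the program from that file.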

This article focuses on the command-line method.

Getting started with examples

Suppose the output of last -n 5 is as follows:

[root@www ~]# last -n 5   <== show only the first five lines
root     pts/1   Tue Feb 10 11:21    still logged in
root     pts/1   Tue Feb 10 00:46 - 02:28  (01:41)
root     pts/1   Mon Feb  9 11:41 - 18:30  (06:48)
dmtsai   pts/1   Mon Feb  9 11:41 - 11:41  (00:00)
root     tty1    Fri Sep  5 14:09 - 14:10  (00:01)

To display only the accounts of the last five logins:

# last -n 5 | awk '{print $1}'

awk's workflow is as follows: it reads one record, terminated by the '\n' newline character, splits the record into fields using the specified field separator, and fills in the field variables: $0 is the whole record, $1 the first field, and $n the n-th field. The default field separator is a space or a tab, so $1 is the logged-in user, and in full last output, which also includes a host column, $3 is the address the user logged in from, and so on.
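The splitting rule can be checked on two lines shaped like the last output above. This is a minimal sketch on sample data, not a real login log; the function name show_fields is hypothetical. It prints $1, $2 and NF (the number of fields) for each line.

```shell
# Sketch: how awk splits "last"-style lines with the default FS.
# The sample data and the function name are hypothetical.
sample='root     pts/1   Tue Feb 10 11:21    still logged in
dmtsai   pts/1   Mon Feb  9 11:41 - 11:41  (00:00)'

# With the default FS, runs of spaces/tabs count as a single separator,
# so $1 is the account and $2 the terminal on both lines.
show_fields() { awk '{ print "user=" $1, "tty=" $2, "fields=" NF }'; }

printf '%s\n' "$sample" | show_fields
# user=root tty=pts/1 fields=9
# user=dmtsai tty=pts/1 fields=9
```

Because several spaces in a row are treated as one separator, the column alignment used by last does not create empty fields.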

To display only the accounts in /etc/passwd:

# cat /etc/passwd | awk -F ':' '{print $1}'
root
daemon
bin
sys

This example uses only an action: because no pattern is given, the action {print $1} is executed for every line.

-F specifies ':' as the field separator.
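Setting the separator with -F is equivalent to assigning the built-in variable FS before any input is read. The sketch below shows both forms on two sample passwd-style lines (not read from the real /etc/passwd); the function names users_with_F and users_with_FS are hypothetical.

```shell
# Sketch: two equivalent ways to split on ":".
# The sample lines and the function names are hypothetical.
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh'

# 1) Separator given with -F on the command line.
users_with_F()  { awk -F ':' '{ print $1 }'; }

# 2) Same separator assigned to FS in BEGIN, before any line is split.
users_with_FS() { awk 'BEGIN { FS = ":" } { print $1 }'; }

printf '%s\n' "$sample" | users_with_F    # root, daemon
printf '%s\n' "$sample" | users_with_FS   # root, daemon
```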

To display only the accounts in /etc/passwd together with each account's shell, separated by a tab:

# cat /etc/passwd | awk -F ':' '{print $1"\t"$7}'
root    /bin/bash
daemon  /bin/sh
bin     /bin/sh
sys     /bin/sh

To display only the accounts in /etc/passwd and each account's shell, separated by a comma, with the header line "name,shell" printed before all rows and "blue,/bin/nosh" appended after the last line:

cat /etc/passwd | awk -F ':' 'BEGIN {print "name,shell"} {print $1","$7} END {print "blue,/bin/nosh"}'
name,shell
root,/bin/bash
daemon,/bin/sh
bin,/bin/sh
sys,/bin/sh
....
blue,/bin/nosh

awk's workflow is as follows: first it executes BEGIN, then it reads the file. It reads one record, terminated by the \n newline character, splits the record into fields using the specified field separator, and fills in the field variables ($0 is the whole record, $1 the first field, $n the n-th field); it then executes the action of each pattern that matches. Next it reads the second record, and so on until all records have been read. Finally, it executes the END action.
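The order of the three phases can be made visible by printing NR (the number of records read so far) in each of them. This is a minimal sketch; the three-line input and the function name trace_phases are hypothetical.

```shell
# Sketch: BEGIN runs before input, the main action once per record,
# END after the last record. The function name is hypothetical.
trace_phases() {
  awk '
    BEGIN { print "BEGIN: NR=" NR }        # before any input is read
          { print "record " NR ": " $0 }   # once per input line
    END   { print "END: NR=" NR }          # after the last line
  '
}

printf 'a\nb\nc\n' | trace_phases
# BEGIN: NR=0
# record 1: a
# record 2: b
# record 3: c
# END: NR=3
```

NR is still 0 in BEGIN because no record has been read yet, and in END it holds the total number of records.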

Search /etc/passwd for all lines containing the keyword root:

# awk -F: '/root/' /etc/passwd
root:x:0:0:root:/root:/bin/bash

This is an example of a pattern on its own: the action runs only for lines that match the pattern (here root). Because no action is specified, the default action prints each matching line.

Searches support regular expressions. For example, to find lines that start with root: awk -F: '/^root/' /etc/passwd
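The difference between /root/ and /^root/ shows up as soon as another line contains "root" somewhere other than at the start, for example in a home directory field. This is a minimal sketch on sample passwd-style lines (not the real /etc/passwd); the function names match_anywhere and match_start are hypothetical.

```shell
# Sketch: unanchored vs anchored regular expression patterns.
# The sample lines and the function names are hypothetical.
sample='root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
daemon:x:1:1:daemon:/usr/sbin:/bin/sh'

match_anywhere() { awk -F: '/root/ { print $1 }'; }   # "root" anywhere in the line
match_start()    { awk -F: '/^root/ { print $1 }'; }  # line must begin with "root"

printf '%s\n' "$sample" | match_anywhere   # root, operator
printf '%s\n' "$sample" | match_start      # root
```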

Search /etc/passwd for all lines containing the keyword root and display the corresponding shell:

# awk -F: '/root/ {print $7}' /etc/passwd
/bin/bash

This example specifies the action {print $7}.

awk built-in variables

awk has many built-in variables that hold information about its environment. These variables can be changed; some of the most commonly used ones are listed below.

ARGC      Number of command-line arguments
ARGV      Array of command-line arguments
ENVIRON   Array holding the system environment variables
FILENAME  Name of the file awk is currently reading
FNR       Number of the current record within the current file
FS        Input field separator; equivalent to the -F command-line option
NF        Number of fields in the current record
NR        Total number of records read so far
OFS       Output field separator
ORS       Output record separator
RS        Input record separator

In addition, the variable $0 refers to the entire record, $1 to the first field of the current line, $2 to the second field, and so on.
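The difference between NR and FNR only appears when awk reads more than one file, and OFS only takes effect where print arguments are separated by commas. This is a minimal sketch; the temporary files one.txt and two.txt and the function names nr_fnr_demo and ofs_demo are hypothetical.

```shell
# Sketch: NR vs FNR across two files, and OFS on output.
# The temporary files and function names are hypothetical.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'a\nb\n' > "$tmp/one.txt"
printf 'c\n' > "$tmp/two.txt"

# NR keeps counting across all input files; FNR restarts at 1 per file.
nr_fnr_demo() { awk '{ print NR, FNR, $0 }' "$@"; }

# OFS is placed between print arguments that are separated by commas.
ofs_demo() { awk 'BEGIN { OFS = "-" } { print $1, $2 }'; }

nr_fnr_demo "$tmp/one.txt" "$tmp/two.txt"   # 1 1 a, 2 2 b, 3 1 c
echo 'x y' | ofs_demo                       # x-y
```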

Print statistics for /etc/passwd: the file name, each line's number, the number of columns in each line, and the complete line:

# awk -F ':' '{print "filename:" FILENAME ",linenumber:" NR ",columns:" NF ",linecontent:" $0}' /etc/passwd
filename:/etc/passwd,linenumber:1,columns:7,linecontent:root:x:0:0:root:/root:/bin/bash
filename:/etc/passwd,linenumber:2,columns:7,linecontent:daemon:x:1:1:daemon:/usr/sbin:/bin/sh
filename:/etc/passwd,linenumber:3,columns:7,linecontent:bin:x:2:2:bin:/bin:/bin/sh
filename:/etc/passwd,linenumber:4,columns:7,linecontent:sys:x:3:3:sys:/dev:/bin/sh

Using printf instead of print makes the code more concise and readable:

awk -F ':' '{printf("filename:%10s,linenumber:%s,columns:%s,linecontent:%s\n", FILENAME, NR, NF, $0)}' /etc/passwd

print and printf

awk provides two output functions: print and printf.

The arguments of print can be variables, numbers or strings. Strings must be enclosed in double quotes, and arguments are separated by commas. Without commas, the arguments are concatenated and can no longer be told apart. In the output, each comma is replaced by the output field separator (OFS), which is a single space by default.
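The effect of the comma can be seen by printing the same two strings with and without it, and then again after changing OFS. This is a minimal sketch; the function name print_demo is hypothetical.

```shell
# Sketch: comma-separated vs concatenated print arguments.
# The function name is hypothetical.
print_demo() {
  awk 'BEGIN {
    print "a", "b"      # comma: arguments joined by OFS (a space)
    print "a" "b"       # no comma: arguments concatenated
    OFS = "|"
    print "a", "b"      # comma again, now joined by "|"
  }'
}

print_demo
# a b
# ab
# a|b
```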

printf works much like printf in C: it can produce complex formatted output, and for such output it is easier to use than print and makes the code easier to understand.
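As in C, printf takes a format string with conversion specifiers and does not add a newline on its own. The sketch below left-aligns a name in 8 columns, right-aligns an integer in 4, and prints a float with 2 decimals; the input lines and the function name format_rows are hypothetical.

```shell
# Sketch: C-style field widths and precision with awk's printf.
# The input lines and the function name are hypothetical.
format_rows() { awk '{ printf("%-8s|%4d|%6.2f\n", $1, $2, $2 / 3) }'; }

printf 'root 0\ndaemon 1\n' | format_rows
# root    |   0|  0.00
# daemon  |   1|  0.33
```

The explicit \n at the end of the format string is what separates the output lines; leaving it out would run both rows together.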

awk Programming

Variables and assignments

In addition to its built-in variables, awk lets you define your own variables.

The following counts the accounts in /etc/passwd:

awk '{count++; print $0;} END {print "user count is", count}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
......
user count is 40

count is a user-defined variable. The earlier actions {} each contained a single print, but print is just one statement: an action {} can contain several statements, separated by semicolons (;).

count is not initialized here. Although it defaults to 0, it is better practice to initialize it to 0 explicitly:

awk 'BEGIN {count=0; print "[start] user count is", count} {count=count+1; print $0;} END {print "[end] user count is", count}' /etc/passwd
[start] user count is 0
root:x:0:0:root:/root:/bin/bash
...
[end] user count is 40
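One reason to initialize explicitly: an unassigned awk variable is the empty string in string context and 0 only in numeric context. So if no line is ever counted, print count outputs an empty value rather than 0. This is a minimal sketch; the function names uninit_demo and empty_count are hypothetical.

```shell
# Sketch: unassigned variables are "" as strings and 0 as numbers.
# The function names are hypothetical.

# x is never assigned.
uninit_demo() { awk 'BEGIN { print "[" x "]", x + 0 }'; }

# With no input lines, count is never incremented before END runs.
empty_count() { awk '{ count++ } END { print "[" count "]", count + 0 }' < /dev/null; }

uninit_demo    # [] 0
empty_count    # [] 0
```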

Count the number of bytes used by the files in a directory:

ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END {print "[end] size is", size}'
[end] size is 8657198

To display the size in MB:

ls -l | awk 'BEGIN {size=0;} {size=size+$5;} END {print "[end] size is", size/1024/1024, "M"}'
[end] size is 8.25889 M

Note that this count does not include the files in subdirectories.

Conditional statements

awk's conditional statements are borrowed from C and take the following forms:

if (expression) {
    statement; statement; ...
}

if (expression) {
    statement;
} else {
    statement2;
}

if (expression) {
    statement1;
} else if (expression1) {
    statement2;
} else {
    statement3;
}
Count the bytes used by the files in a directory, skipping entries whose size is 4096 (usually directories):

ls -l | awk 'BEGIN {size=0; print "[start] size is", size} {if ($5 != 4096) {size=size+$5;}} END {print "[end] size is", size/1024/1024, "M"}'
[end] size is 8.22339 M
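Filtering on the size 4096 also skips any regular file that happens to be exactly 4096 bytes. An alternative, sketched below under the assumption of the usual ls -l layout, is to keep only lines whose mode string starts with "-" (regular files). The sample listing and the function names sum_skip_4096 and sum_regular are hypothetical; sample data keeps the result reproducible.

```shell
# Sketch: two ways to sum file sizes from ls -l style lines.
# The sample listing and the function names are hypothetical.
sample='total 16
drwxr-xr-x 2 root root 4096 Feb 10 11:21 docs
-rw-r--r-- 1 root root 1000 Feb 10 11:21 a.txt
-rw-r--r-- 1 root root 2000 Feb 10 11:21 b.txt
-rw-r--r-- 1 root root 4096 Feb 10 11:21 c.txt'

# Filter from the text: skip any entry whose size is exactly 4096 bytes.
sum_skip_4096() { awk '{ if ($5 != 4096) { size = size + $5 } } END { print size + 0 }'; }

# Alternative: keep only regular files, whose mode string starts with "-".
sum_regular() { awk '/^-/ { size = size + $5 } END { print size + 0 }'; }

printf '%s\n' "$sample" | sum_skip_4096   # 3000 (c.txt is skipped too)
printf '%s\n' "$sample" | sum_regular     # 7096
```

The "total" line has no fifth field, so it adds 0 in the first version and is excluded by the /^-/ pattern in the second.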


Loop statements

awk's loops are also borrowed from C: awk supports while, do/while, for, break and continue, and these keywords have the same semantics as in C.
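A common use of these loops is iterating over the fields of a record from 1 to NF. The sketch below uses for with continue to add up the non-negative fields of a line, and while with break to find the first field greater than 10. The input lines and the function names sum_positive and first_over_10 are hypothetical.

```shell
# Sketch: C-style loops over the fields of each record.
# The function names are hypothetical.

# for + continue: add up the fields of each line, skipping negative ones.
sum_positive() {
  awk '{
    total = 0
    for (i = 1; i <= NF; i++) {
      if ($i < 0) continue      # skip negative values
      total += $i
    }
    print total
  }'
}

# while + break: print the first field greater than 10, or "none".
first_over_10() {
  awk '{
    i = 1
    while (i <= NF) {
      if ($i > 10) break        # stop at the first match
      i++
    }
    print (i <= NF ? $i : "none")
  }'
}

echo '1 -2 3' | sum_positive     # 4
echo '3 12 40' | first_over_10   # 12
echo '1 2' | first_over_10       # none
```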


Arrays

Because awk array indices can be numbers or strings, an array index is usually called a key. Values and their keys are stored internally in a hash table of key/value pairs. Because a hash table does not store entries in order, when you display the contents of an array you may find that they do not come out in the order you expected. Like variables, arrays are created automatically when first used, and awk also decides automatically whether a stored value is a number or a string. awk arrays are generally used to collect information from records, for example to compute totals, count words, or track how many times a pattern was matched.
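Counting words is a typical use of an array with string keys. Because for (key in array) visits the keys in an unspecified order, the sketch below pipes the result through sort to get a stable order. The input text and the function name count_words are hypothetical.

```shell
# Sketch: an associative array keyed by word, counting occurrences.
# The function name is hypothetical.
count_words() {
  awk '{ for (i = 1; i <= NF; i++) seen[$i]++ }
       END { for (w in seen) print w, seen[w] }' | sort
}

printf 'a b a\nc a\n' | count_words
# a 3
# b 1
# c 1
```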

Display the accounts in /etc/passwd:

awk -F ':' 'BEGIN {count=0;} {name[count] = $1; count++;}; END {for (i = 0; i < NR; i++) print i, name[i]}' /etc/passwd
0 root
1 daemon
2 bin
3 sys
4 sync
5 games
......

Here a for loop is used to traverse the array.
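Looping over numeric indices from 0 up to the count, rather than using for (key in array), is what keeps the accounts in file order. This is a minimal sketch on sample passwd-style lines (not the real /etc/passwd); the function name list_accounts is hypothetical, and count++ used as the index starts at 0 because an unassigned variable is 0 in numeric context.

```shell
# Sketch: store fields under numeric indices, then loop in index order.
# The sample lines and the function name are hypothetical.
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh'

list_accounts() {
  awk -F ':' '{ name[count++] = $1 }
              END { for (i = 0; i < count; i++) print i, name[i] }'
}

printf '%s\n' "$sample" | list_accounts
# 0 root
# 1 daemon
# 2 bin
```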