Linux Powerful Command: Awk Introduction
Add Date: 2017-08-31

What is Awk

Awk is a small programming language and a command-line tool. (Its name comes from the initials of its creators' last names: Alfred Aho, Peter Weinberger, and Brian Kernighan.) It is well suited to log processing on servers, mainly because Awk operates on files, which are usually organized as lines of readable text.

I say it suits servers because log files, dump files, or any text-format output that a server writes to disk tend to grow large, and you will have plenty of such files on every server. If you have ever had to analyze several gigabytes of files spread over 50 different servers without Splunk or an equivalent tool, you know that fetching and downloading all of those files just to analyze them is a miserable experience.

I have personally been in this situation: when some Erlang node is about to die and leaves behind a crash dump of 700MB to 4GB, or when I need to quickly scan the logs on a small personal server (a VPS) for a recurring pattern.

In any case, Awk is not just for finding data (otherwise grep or ack would be enough); it also lets you process and transform the data.

Code structure

The structure of an Awk script is very simple: it is a series of patterns and actions:

# Comment
Pattern1 {ACTIONS;}

# Comment
Pattern2 {ACTIONS;}

# Comment
Pattern3 {ACTIONS;}

# Comment
Pattern4 {ACTIONS;}
Every line of the scanned document goes through each of the patterns, one at a time. So, if I give it a file with the following contents:

this is line 1

this is line 2

The line "this is line 1" will be matched against Pattern1. If it matches, ACTIONS will be executed. Then "this is line 1" will be matched against Pattern2. If that does not match, it moves on to Pattern3, and so on.

Once all of the patterns have been checked, "this is line 2" goes through the same procedure. The same holds for every other line, until the entire file has been read.

In short, this is Awk's execution model.
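A minimal sketch of this model, run from a POSIX shell with any POSIX awk (gawk, mawk, or BSD awk should behave the same here). The three regexes are made-up stand-ins for Pattern1 to Pattern3, and patterns_demo is just a hypothetical wrapper name:

```shell
# Every input line is offered to every pattern, top to bottom.
patterns_demo() {
  printf 'this is line 1\nthis is line 2\n' | awk '
    /line 1/ { print "Pattern1 matched: " $0 }
    /line/   { print "Pattern2 matched: " $0 }
    /line 2/ { print "Pattern3 matched: " $0 }
  '
}

patterns_demo
```

Line 1 fires Pattern1 and Pattern2, then line 2 fires Pattern2 and Pattern3, in that order.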

Data types

Awk has only two main data types: strings and numbers. Even so, strings and numbers in Awk can be converted into each other: a string can be interpreted as a number, which converts it to a numeric value. If the string does not begin with a number, it converts to 0.

Variables are assigned with the = operator in the ACTIONS parts of your code. You can declare and use variables at any time and anywhere, and you can even use uninitialized variables: their default value is the empty string (""), which also acts as 0 in numeric contexts.
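The conversions and defaults above can be checked with a BEGIN-only program, which needs no input. This is a sketch assuming a POSIX-compatible awk; unset is simply a name that is never assigned, and types_demo is a hypothetical wrapper:

```shell
types_demo() {
  awk 'BEGIN {
    print "23" + 1        # numeric string used as a number: 24
    print "12abc" + 0     # only the leading number is used: 12
    print "abc" + 0       # no leading number at all: 0
    print "[" unset "]"   # uninitialized variable as a string: empty
    print unset + 5       # and as a number it behaves like 0: 5
    x = 7; x = x "0"      # concatenation produces a string again
    print x               # 70
  }'
}

types_demo
```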

Finally, Awk has arrays, which are one-dimensional, dynamic associative arrays. Their syntax is var[key] = value. Awk can simulate multidimensional arrays, but either way it is a big hack.
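A short sketch of associative arrays, assuming a POSIX awk. The keys and the hits/grid names are illustrative only. The (i, j) subscript form is the standard way Awk fakes a second dimension: the indexes are joined with the SUBSEP character into a single string key.

```shell
arrays_demo() {
  awk 'BEGIN {
    hits["GET"] = 2            # keys are created on assignment
    hits["POST"] = 1
    hits["GET"]++              # update in place
    print "GET=" hits["GET"] " POST=" hits["POST"]

    if (!("PUT" in hits))      # "in" tests for a key without creating it
      print "no PUT key"

    grid[1, 2] = "x"           # stored under the key 1 SUBSEP 2
    if ((1, 2) in grid) print "grid[1,2]=" grid[1, 2]
    k = 1 SUBSEP 2             # the same key, built by hand
    print (k in grid)          # 1
  }'
}

arrays_demo
```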


Patterns

The patterns you can use fall into three categories: regular expressions, Boolean expressions, and special patterns.

Regular expressions and Boolean expressions

Regular expressions in Awk are fairly lightweight. Awk does not use PCRE; it uses POSIX extended regular expressions, and any extensions depend on the implementation (check which one you have, e.g. with awk --version). Still, they are enough for most needs:

/admin/ { ... } # any line that contains 'admin'
/^admin/ { ... } # lines that begin with 'admin'
/admin$/ { ... } # lines that end with 'admin'
/^[0-9.]+ / { ... } # lines beginning with a series of numbers and periods
/(POST|PUT|DELETE)/ # lines that contain specific HTTP verbs
Note that patterns cannot capture specific groups to make them available in the ACTIONS part of the code. Patterns exist only to match content.
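The patterns listed above, applied to three made-up log lines (the input and the regex_demo name are assumptions for illustration). Since patterns only select lines, the actions pull pieces out with fields such as $1 and $2 instead of capture groups:

```shell
regex_demo() {
  printf '%s\n' \
    '10.0.0.1 GET /admin' \
    '10.0.0.2 DELETE /users/7' \
    'admin login from console' |
  awk '
    /^admin/            { print "starts with admin: " $0 }
    /admin$/            { print "ends with admin: " $0 }
    /^[0-9.]+ /         { print "starts with an address: " $1 }
    /(POST|PUT|DELETE)/ { print "write verb: " $2 }
  '
}

regex_demo
```

A single line can trigger several rules: the first line both ends with admin and starts with an address.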

Boolean expressions are similar to those in PHP or JavaScript. In particular, awk provides the && ("and"), || ("or"), and ! ("not") operators, which you will find in almost every C-like language. They operate on regular data.

Also similar to PHP and JavaScript is the comparison operator ==, which performs fuzzy matching: the string "23" is considered equal to the number 23, so the expression "23" == 23 returns true. The != operator works the same way in awk, and do not forget the other common operators: >, <, >=, and <=.

You can also mix them: regular expressions and Boolean expressions can be used together. /admin/ || debug == 1 is valid, and it matches any line containing the word "admin", or every line when the debug variable is set to 1. (Awk has no true keyword: a bare true is just another variable, uninitialized unless you set it.)

Note that if you want to match a specific string or variable against a regular expression, the ~ and !~ operators are what you want. Use them like this: string ~ /regex/ and string !~ /regex/.
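A sketch combining these operators on made-up input, assuming a POSIX awk; debug is passed in with awk's standard -v option, and bool_demo is a hypothetical name:

```shell
bool_demo() {
  printf '%s\n' 'admin 23' 'guest 7' 'root 23' |
  awk -v debug=0 '
    BEGIN { if ("23" == 23) print "\"23\" == 23 is true" }
    # regex pattern OR comparison; debug comes from -v
    /admin/ || debug == 1  { print "selected: " $1 }
    # && and ! combine tests as in other C-like languages
    $2 == 23 && !/admin/   { print "23 but not admin: " $1 }
    # ~ and !~ match one field (or any string) against a regex
    $1 ~ /^r/              { print "starts with r: " $1 }
    $1 !~ /^(admin|root)$/ { print "unprivileged: " $1 }
  '
}

bool_demo
```

Because debug is 0 here, the first rule selects only the admin line; running with -v debug=1 would select every line.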

Also note that all patterns are optional. An Awk script containing only the following:

{ACTIONS}

will simply run ACTIONS for every line of input.

Special patterns

Awk has a few special patterns, but not many.

The first is BEGIN, which matches only once, before any line of input has been read. This is mainly where you initialize your script's variables and any other state.

The other is END. As you might have guessed, it matches once, after all of the input has been processed. This lets you do some cleanup and final output before exiting.
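A minimal sketch of BEGIN and END around a per-line action, summing a column of made-up numbers (begin_end_demo is a hypothetical name; any POSIX awk should do):

```shell
begin_end_demo() {
  printf '3\n4\n5\n' | awk '
    BEGIN { total = 0; print "starting" }          # once, before any input
          { total += $1 }                          # once per input line
    END   { print "lines=" NR " total=" total }    # once, after all input
  '
}

begin_end_demo
```

NR still holds the number of the last record when END runs, which makes it handy for final counts.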

The last category of patterns is a bit harder to classify. It sits somewhere between variables and special values, and we call them fields. They deserve a section of their own.

Fields

Fields are best explained with a visual example:

# According to the following line
#
# $1       $2  $3
# 00:34:23 GET /foo/bar.html
# \__________ $0 __________/

# Hack attempt?
/admin\.html$/ && $2 == "DELETE" {
    print "Hacker Alert!";
}
Fields are separated by whitespace by default. The $0 field is the entire line, $1 is the first piece of the line (before any whitespace), $2 is the piece after it, and so on.

An interesting fact (and one you will want to avoid in most cases) is that you can modify a line by assigning to its fields. For example, if you run $0 = "HAHA THE LINE IS GONE" in one block, the patterns that follow will operate on the modified line instead of the original. The same goes for the other field variables: assigning to $2, for instance, rebuilds $0 using the output field separator.
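A sketch of field access and field assignment on two made-up log lines, assuming the default whitespace field separator (fields_demo is a hypothetical name). The dot in /admin\.html$/ is escaped so that it matches only a literal period:

```shell
fields_demo() {
  printf '%s\n' '00:34:23 GET /foo/bar.html' '00:35:01 DELETE /admin.html' |
  awk '
    { print "time=" $1 " verb=" $2 " path=" $3 " nf=" NF }
    /admin\.html$/ && $2 == "DELETE" { print "Hacker Alert!" }
    { $2 = "VERB"; print $0 }   # assigning a field rebuilds $0
  '
}

fields_demo
```

The last rule rewrites $2, so $0 is rebuilt with the output field separator (a single space by default) and the printed lines carry VERB in the second column.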


Actions

There are plenty of possible actions, but the most common and useful ones (in my experience) are:

{ print $0; }      # prints $0. In this case, equivalent to 'print' alone
{ exit; }          # ends the program
{ next; }          # skips to the next line of input
{ a = $1; b = $0 } # variable assignment
{ c[$1] = $2 }     # variable assignment (array)

{ if (BOOLEAN) { ACTION }
  else if (BOOLEAN) { ACTION }
  else { ACTION }
}
{ for (i = 1; i < x; i++) { ACTION } }
{ for (item in c) { ACTION } }
These will be the main tools in your Awk toolbox, and you can use them freely when working through log files and the like.
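A sketch putting these actions together on made-up request lines of the form "VERB STATUS" (the input format, the STOP marker, and actions_demo are all assumptions for illustration): next skips comment lines, exit stops reading input, if/else classifies status codes, and for loops walk the arrays in END.

```shell
actions_demo() {
  printf '%s\n' 'GET 200' 'POST 500' '# comment' 'GET 404' 'STOP' 'GET 200' |
  awk '
    /^#/         { next }    # skip comment lines entirely
    $1 == "STOP" { exit }    # stop reading input; END still runs
    {
      if ($2 >= 500)      kind = "server-error"
      else if ($2 >= 400) kind = "client-error"
      else                kind = "ok"
      count[kind]++
      verbs[$1]++
    }
    END {
      n = split("ok client-error server-error", order, " ")
      for (i = 1; i <= n; i++) print order[i] "=" (count[order[i]] + 0)
      for (v in verbs) total += verbs[v]
      print "requests=" total
    }
  '
}

actions_demo
```

Because the iteration order of for (key in array) is unspecified, the END block prints the counts through a fixed list built with split() rather than relying on that order.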

Variables in Awk are global (function parameters being the exception). Whatever variable you define in a given block is visible to the other blocks, and even from one line to the next. This severely limits how large your Awk scripts can grow before they become unmaintainable horrors, so keep your scripts as small as possible.
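The flip side of global scope is that state set while reading one line is still there on later lines, which is what the crash-dump script further down relies on with its flag and ct variables. A small sketch with made-up input, where globals_demo is a hypothetical name:

```shell
globals_demo() {
  printf '%s\n' 'user alice' 'login ok' 'user bob' 'login failed' |
  awk '
    $1 == "user"  { current = $2 }            # set by one rule...
    $1 == "login" { print current ": " $2 }   # ...read by another, on a later line
  '
}

globals_demo
```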


Functions

Functions can be called with the following syntax (for user-defined functions, there must be no space between the name and the opening parenthesis):

{ somecall($2) }

There is a fairly limited set of built-in functions, so I will simply point you to the regular documentation for them.

User-defined functions are just as simple:

# scalar arguments are passed by value; arrays are passed by reference
function name(parameter-list) {
    ACTIONS; # same actions as usual
}

# return is a valid keyword
function add1(val) {
    return val + 1;
}

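A sketch of user-defined functions alongside a few standard built-ins (length, substr, toupper, sprintf), assuming a POSIX awk. add1 mirrors the example above, while label and functions_demo are hypothetical names. The extra parameter s in label is the usual Awk idiom for a local variable, since parameters are the only variables that are not global:

```shell
functions_demo() {
  printf '%s\n' 'alice 41' 'bob 7' |
  awk '
    # scalar parameters are copies: changing val does not touch the caller
    function add1(val) { val = val + 1; return val }
    # s is listed as an extra parameter so that it stays local
    function label(name, n,    s) {
      s = toupper(substr(name, 1, 1)) substr(name, 2)
      return sprintf("%s is %d", s, n)
    }
    {
      age = $2
      next_age = add1(age)
      print label($1, next_age) " (was " age ", name length " length($1) ")"
    }
  '
}

functions_demo
```

The output shows that add1 received a copy: age still holds the original value after the call.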
Special Variables

In addition to regular variables (global, usable anywhere), there is a set of special variables that act somewhat like configuration entries:

BEGIN { # can be modified by the user
  FS = ",";   # Field Separator
  RS = "\n";  # Record Separator (lines)
  OFS = " ";  # Output Field Separator
  ORS = "\n"; # Output Record Separator (lines)
}
{ # normally only read; awk sets these itself
  NF          # Number of Fields in the current Record (line)
  NR          # Number of Records seen so far
  ARGV / ARGC # Script Arguments
}
I modify these variables in BEGIN because that is where I prefer to override them, but they can be overridden anywhere in the script and then take effect on the following lines.
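A sketch of FS, OFS, NR and NF on made-up comma-separated input (separators_demo is a hypothetical name). The $1 = $1 assignment is a common idiom that forces Awk to rebuild $0 with the new OFS:

```shell
separators_demo() {
  printf '%s\n' 'alice,admin,42' 'bob,user' |
  awk '
    BEGIN { FS = ","; OFS = " | " }   # set before the first line is split
    { $1 = $1; print NR ": " $0 " (" NF " fields)" }
  '
}

separators_demo
```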


Examples

That covers the core of the Awk language. I do not have many examples, because I tend to use Awk for quick one-off tasks.

Still, I carry a few script files around for handling certain things and for testing. My favorite is a script that processes Erlang crash dumps, which look like this:

=erl_crash_dump:0.3
Tue Nov 18 02:52:44 2014
Slogan: init terminating in do_boot ()
System version: Erlang/OTP 17 [erts-6.2] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]
Compiled: Fri Sep 19 03:23:19 2014
Atoms: 12167
=memory
total: 19012936
processes: 4327912
processes_used: 4319928
system: 14685024
atom: 339441
atom_used: 331087
binary: 1367680
code: 8384804
ets: 382552
=hash_table:atom_tab
size: 9643
used: 6949
=allocator:instr
option m: false
option s: false
option t: false
=proc:<0.0.0>
State: Running
Name: init
Spawned as: otp_ring0:start/2
Run queue: 0
Spawned by: []
Started: Tue Nov 18 02:52:35 2014
Message queue length: 0
Number of heap fragments: 0
Heap fragment data: 0
Link list: [<0.3.0>, <0.7.0>, <0.6.0>]
Reductions: 29265
Stack+heap: 1598
OldHeap: 610
Heap unused: 656
OldHeap unused: 468
Memory: 18584
Program counter: 0x00007f42f9566200 (init:boot_loop/2 + 64)
CP: 0x0000000000000000 (invalid)
=proc:<0.3.0>
State: Waiting
=port:#Port<0.0>
Slot: 0
Connected: <0.3.0>
Links: <0.3.0>
Port controls linked-in driver: efile
=port:#Port<0.14>
Slot: 112
Connected: <0.3.0>
Running the script over such a dump produces the following output:

$ awk -f queue_fun.awk $PATH_TO_DUMP
10641: io:wait_io_mon_reply/2
12646: io:wait_io_mon_reply/2
32991: io:wait_io_mon_reply/2
2183837: io:wait_io_mon_reply/2
730790: io:wait_io_mon_reply/2
80194: io:wait_io_mon_reply/2

This is a list of the functions running inside Erlang processes whose mailboxes had grown very large. Here is the script:

# Parse Erlang crash dumps and correlate mailbox size to the currently running
# function.
#
# Once in the procs section of the dump, all processes are displayed with
# =proc:<0.M.N> followed by a list of their attributes, which include the
# message queue length and the program counter (what code is currently
# executing).
#
# Run as:
#
#    $ awk -v threshold=$THRESHOLD -f queue_fun.awk $CRASHDUMP
#
# Where $THRESHOLD is the smallest mailbox you want to inspect. The default
# value is 1000.
BEGIN {
    if (threshold == "") {
        threshold = 1000 # default mailbox size
    }
    procs = 0 # are we in the =procs entries?
    print "======================================"
}

# Only bother with the =proc: entries. Anything else is useless.
procs == 0 && /^=proc/ { procs = 1 } # entering the =procs entries
procs == 1 && /^=/ && !/^=proc/ { exit 0 } # we're done

# Message queue length: 1210
# 1       2     3       4
/^Message queue length: / && $4 >= threshold { flag = 1; ct = $4 }
/^Message queue length: / && $4 < threshold { flag = 0 }

# Program counter: 0x00007f5fb8cb2238 (io:wait_io_mon_reply/2 + 56)
# 1       2        3                  4                       5 6
flag == 1 && /^Program counter: / { print ct ":", substr($4, 2) }
Did you follow along? If you did, you now know Awk. Congratulations!