  Introduction to Awk, a Powerful Linux Command
  Add Date : 2017-08-31      
  What is Awk

Awk is a small programming language and command-line tool. (Its name comes from the initials of the surnames of its creators: Alfred Aho, Peter Weinberger, and Brian Kernighan.) It is particularly well suited to log processing on servers, mainly because Awk operates on files, usually structured as lines of human-readable text.

I say it is suited to servers because log files, dump files, or whatever text format a server ends up dumping to disk tend to grow large, and you will have many of them on each server. If you ever have to analyze several gigabytes of files spread over 50 different servers without Splunk or an equivalent tool, fetching and downloading all of those files just to analyze them feels like a very bad idea.

I have personally been in this situation: when some Erlang nodes die and leave behind a 700MB to 4GB crash dump, or when I need to quickly scan the logs on a small personal server (a VPS) for a general pattern.

In any case, Awk does more than find data (otherwise grep or ack would be enough). It also lets you process and transform that data.

Code structure

An Awk script has a very simple structure: a series of patterns and actions:

# Comment
Pattern1 {ACTIONS;}

# Comment
Pattern2 {ACTIONS;}

# Comment
Pattern3 {ACTIONS;}

# Comment
Pattern4 {ACTIONS;}
Each line of the scanned input is tested against every pattern, one pattern at a time. So, if I pass in a file with the following contents:

this is line 1

this is line 2

The line "this is line 1" is first matched against Pattern1. If it matches, ACTIONS is executed. Then "this is line 1" is matched against Pattern2. If that match fails, Awk moves on to Pattern3, and so on.

Once all the patterns have been tried, "this is line 2" goes through the same procedure, and so does every other line until the entire file has been read.

In short, this is Awk's execution model.
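The matching procedure described above can be tried with a minimal sketch. The three patterns and the shell function name show_matches are made up for illustration; the two input lines are the ones from the example file.

```shell
# show_matches is a hypothetical helper: it reads lines on stdin and
# reports every pattern that each line matches, in script order.
show_matches() {
  awk '
    /line 1/ { print "pattern1 matched: " $0 }
    /this/   { print "pattern2 matched: " $0 }
    /line 2/ { print "pattern3 matched: " $0 }
  '
}

printf 'this is line 1\nthis is line 2\n' | show_matches
```

Each line is tested against all three patterns before the next line is read, so a single line can trigger several actions.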

Data types

Awk has only two main data types: strings and numbers. Even so, Awk converts freely between the two. A string can be interpreted as a number and converted to its numeric value. If the string does not begin with a number, it is converted to 0.

Both can be assigned to variables in the ACTIONS part of your code with the = operator. Variables can be declared and used anywhere, at any time, and can even be used uninitialized, in which case their default value is the empty string: "".

Finally, Awk has arrays, which are dynamic, one-dimensional associative arrays. Their syntax is: var[key] = value. Awk can simulate multidimensional arrays, but it is a big hack no matter how you look at it.
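As a rough illustration of these conversion rules, the sketch below runs everything inside a BEGIN block so that no input file is needed. The function name types_demo and the variable names are assumptions chosen for the example.

```shell
# types_demo is a hypothetical helper name; BEGIN runs without reading input.
types_demo() {
  awk 'BEGIN {
    print "23" + 1            # string used as a number
    print "abc" + 1           # no leading number, so "abc" counts as 0
    print 5 " apples"         # number used as a string (concatenation)
    print "[" never_set "]"   # an uninitialized variable is the empty string
    count["GET"] = 2          # associative array keyed by a string
    count["GET"]++
    print count["GET"]
  }'
}

types_demo
```

The five printed lines should be 24, 1, 5 apples, [] and 3.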


Patterns

Patterns fall into three categories: regular expressions, Boolean expressions, and special patterns.

Regular expressions and Boolean expressions

The regular expressions you use in Awk are fairly ordinary. They are not PCRE (although gawk supports some fancier features; it depends on the implementation, which you can check with awk --version), but they are enough for most needs:

/admin/ { ... }     # any line that contains 'admin'
/^admin/ { ... }    # lines that begin with 'admin'
/admin$/ { ... }    # lines that end with 'admin'
/^[0-9.]+ / { ... } # lines beginning with series of numbers and periods
/(POST|PUT|DELETE)/ # lines that contain specific HTTP verbs
Note that patterns cannot capture specific groups to make them available in the ACTIONS part of the code. Patterns are only there to match content.

Boolean expressions are similar to those in PHP or JavaScript. In particular, the operators && ("and"), || ("or"), and ! ("not") are available. You will find them in nearly every C-like language. They operate on regular data.

Another feature shared with PHP and JavaScript is the comparison operator ==, which performs fuzzy matching. So the string "23" equals the number 23, and the expression "23" == 23 is true. The != operator is available as well, along with the other common ones: >, <, >=, and <=.
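A small sketch of this loose comparison follows. The function name fuzzy_compare and the sample input 23.0 are assumptions made for the example.

```shell
# fuzzy_compare is a hypothetical helper showing two loose comparisons.
fuzzy_compare() {
  # a string constant compared with a number
  awk 'BEGIN { if ("23" == 23) print "constant: equal"; else print "constant: different" }'
  # an input field that looks like a number is compared numerically
  echo '23.0' | awk '$1 == 23 { print "field: equal" }'
}

fuzzy_compare
```

Both comparisons succeed, so the two lines printed are "constant: equal" and "field: equal".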

You can also mix them: regular expressions and Boolean expressions can be used together. /admin/ || debug == true is valid, and it matches whenever a line contains the word "admin" or the debug variable is equal to true.

Note that if you want to match a specific string or variable against a regular expression, the ~ and !~ operators are what you want. Use them like this: string ~ /regex/ and string !~ /regex/.
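To make the combination of regular expressions, Boolean expressions, and the ~ and !~ operators more concrete, here is a minimal sketch. The function name filter_demo and the three log lines are invented for illustration.

```shell
# filter_demo is a hypothetical helper; the log lines below are invented.
filter_demo() {
  awk '
    # regular expression and Boolean expression combined
    /admin/ && $2 != "GET" { print "non-GET on admin: " $0 }
    # match a single field (not the whole line) against a regex
    $2 ~ /^(POST|PUT|DELETE)$/ { print "write verb: " $2 }
    # negated match on a field
    $3 !~ /admin/ { print "not an admin path: " $3 }
  '
}

printf '%s\n' \
  '00:34:23 GET /admin.html' \
  '00:34:24 DELETE /foo/bar.html' \
  '00:34:25 POST /admin/users' | filter_demo
```

The first line matches none of the three patterns, the second matches the last two, and the third matches the first two.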

Also note that all patterns are optional. An Awk script that contains only the following:

{ ACTIONS }

simply runs ACTIONS for every line of input.
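For instance, a script made of a single action with no pattern could look like the sketch below; the function name tag_lines and the "seen:" prefix are assumptions for the example.

```shell
# tag_lines is a hypothetical helper: one action, no pattern,
# so the action runs for every input line.
tag_lines() {
  awk '{ print "seen: " $0 }'
}

printf 'first\nsecond\n' | tag_lines
```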

Special patterns

Awk has a few special patterns, but not many.

The first is BEGIN, which matches only once, before any line of input has been read. It is the main place to initialize your script's variables and all kinds of state.

The other is END. As you might have guessed, it matches after all of the input has been handled. It lets you do cleanup work and produce final output before exiting.
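A minimal sketch of BEGIN and END working together, assuming a hypothetical helper named count_lines that counts its input lines:

```shell
# count_lines is a hypothetical helper built from the two special patterns.
count_lines() {
  awk '
    BEGIN { total = 0; print "starting" }   # runs once, before any input
    { total++ }                             # runs for every line
    END { print "lines read: " total }      # runs once, after all input
  '
}

printf 'a\nb\nc\n' | count_lines
```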

The last kind of pattern is a bit hard to classify. It sits halfway between variables and special values. These are called fields, and they deserve a section of their own.


Fields

Fields are best explained with a visual example:

# According to the following line
#
# $1         $2    $3
# 00:34:23   GET   /foo/bar.html
# \_____________  _____________/
#               $0

# Hack attempt?
/admin.html$/ && $2 == "DELETE" {
  print "Hacker Alert!";
}

Fields are (by default) separated by white space. The field $0 represents the entire line as a string. The field $1 is the first piece of the line (before any white space), $2 is the piece after that, and so on.

A fun fact (and something to avoid in most cases) is that you can modify the line by assigning to its fields. For example, if you run $0 = "HAHA THE LINE IS GONE" in one block, the patterns that follow will operate on the modified line rather than the original. The same goes for the other field variables.
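The effect of assigning to a field can be observed with a small sketch. The function name fields_demo, the REDACTED marker, and the second log line are assumptions for illustration.

```shell
# fields_demo is a hypothetical helper; it prints two fields, rewrites
# $2 on DELETE lines, then shows what later rules see in $0.
fields_demo() {
  awk '
    { print "first: " $1 ", second: " $2 }
    $2 == "DELETE" { $2 = "REDACTED" }      # assigning a field rewrites $0
    { print "line is now: " $0 }
  '
}

printf '00:34:23 GET /foo/bar.html\n00:34:24 DELETE /admin.html\n' | fields_demo
```

The last rule runs after the assignment, so for the DELETE line it prints the modified record rather than the original one.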


Actions

There are a bunch of possible actions, but the most common and most useful ones (in my experience) are:

{ print $0; }  # prints $0. In this case, equivalent to 'print' alone
{ exit; }      # ends the program
{ next; }      # skips to the next line of input
{ a = $1; b = $0 } # variable assignment
{ c[$1] = $2 } # variable assignment (array)

{ if (BOOLEAN) { ACTION }
  else if (BOOLEAN) { ACTION }
  else { ACTION }
}
{ for (i = 1; i < x; i++) { ACTION } }
{ for (item in c) { ACTION } }

These will be the main tools in your Awk toolbox. Feel free to use them whenever you deal with log files and the like.
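Several of these actions can be combined in one short script. The sketch below is only an illustration: the function name verb_counts and the log lines are invented, and the output is piped through sort because the order of a for (item in array) loop is not specified.

```shell
# verb_counts is a hypothetical helper that counts lines per HTTP verb.
verb_counts() {
  awk '
    $2 == "" { next }                 # skip lines without a second field
    { c[$2]++ }                       # array assignment: one counter per verb
    END {
      for (verb in c) {               # iteration order is unspecified
        if (c[verb] > 1) { print verb " seen " c[verb] " times" }
        else             { print verb " seen once" }
      }
    }
  ' | sort
}

printf '%s\n' '00:00:01 GET /a' '00:00:02 GET /b' '' '00:00:03 POST /c' | verb_counts
```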

Variables in Awk are all global. Whatever variable you define in a given block is visible to the other blocks, and for every line. This severely limits how large your Awk scripts can grow before they become an unmaintainable horror. Keep your scripts as small as possible.
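This shared state is also what makes short scripts convenient: a value stored while handling one line can be read by another block on a later line. A minimal sketch, assuming a hypothetical helper named waiting_procs and input lines modeled on an Erlang crash dump:

```shell
# waiting_procs is a hypothetical helper; "current" is set in one block
# and read in another, several lines later.
waiting_procs() {
  awk '
    /^=proc/          { current = $0 }                 # set on header lines
    /^State: Waiting/ { print current " is waiting" }  # read on later lines
  '
}

printf '%s\n' '=proc:<0.0.0>' 'State: Running' '=proc:<0.3.0>' 'State: Waiting' | waiting_procs
```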


Functions

Functions can be called with the following syntax:

{ somecall($2) }

There is a limited set of built-in functions available, so I will simply point to the regular documentation for them.

User-defined functions are equally simple:

# Function arguments are call-by-value
function name(parameter-list) {
  ACTIONS; # same actions as usual
}

# return is a valid keyword
function add1(val) {
  return val + 1;
}
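As a quick check of call-by-value behavior, the sketch below reuses add1 and adds a second function. The names bump and functions_demo and the input value 41 are assumptions for the example.

```shell
# functions_demo is a hypothetical helper wrapping two user-defined functions.
functions_demo() {
  awk '
    function add1(val) { return val + 1 }
    # arguments are passed by value: changing n here does not affect the caller
    function bump(n) { n = n + 100; return n }
    { x = $1; y = bump(x); print add1($1), x, y }
  '
}

echo '41' | functions_demo
```

The caller's x keeps its original value after bump(x) returns, so the line printed is 42 41 141.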
Special Variables

Besides regular variables (global, usable anywhere), there is a set of special variables that act a bit like configuration entries:

BEGIN { # Can be modified by the user
  FS = ",";   # Field Separator
  RS = "\n";  # Record Separator (lines)
  OFS = " ";  # Output Field Separator
  ORS = "\n"; # Output Record Separator (lines)
}
{ # Can not be modified by the user
  NF # Number of Fields in the current Record (line)
  NR # Number of Records seen so far
  ARGV / ARGC # Script Arguments
}

I put the modifiable variables in BEGIN because that is where I prefer to override them. But they can be overridden anywhere in the script, and the change then takes effect on the lines that follow.
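To see these variables in action, here is a minimal sketch that reads comma-separated input. The function name csv_demo, the " | " output separator, and the sample records are assumptions for illustration.

```shell
# csv_demo is a hypothetical helper; the records fed to it are invented.
csv_demo() {
  awk '
    BEGIN { FS = ","; OFS = " | " }   # split input on commas, join output with " | "
    { print NR, NF, $1, $NF }         # record number, field count, first and last field
  '
}

printf 'alice,admin,42\nbob,guest\n' | csv_demo
```

Because print separates its comma-delimited arguments with OFS, the two output lines are "1 | 3 | alice | 42" and "2 | 2 | bob | guest".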


Examples

That covers the core of the Awk language. I do not have many examples, because I tend to use Awk for quick one-off tasks.

I do still carry a few script files around to handle certain things and for testing. My favorite is a script that handles Erlang crash dumps, which look like this:

=erl_crash_dump:0.3
Tue Nov 18 02:52:44 2014
Slogan: init terminating in do_boot ()
System version: Erlang/OTP 17 [erts-6.2] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]
Compiled: Fri Sep 19 03:23:19 2014
Atoms: 12167
=memory
total: 19012936
processes: 4327912
processes_used: 4319928
system: 14685024
atom: 339441
atom_used: 331087
binary: 1367680
code: 8384804
ets: 382552
=hash_table:atom_tab
size: 9643
used: 6949
=allocator:instr
option m: false
option s: false
option t: false
=proc:<0.0.0>
State: Running
Name: init
Spawned as: otp_ring0:start/2
Run queue: 0
Spawned by: []
Started: Tue Nov 18 02:52:35 2014
Message queue length: 0
Number of heap fragments: 0
Heap fragment data: 0
Link list: [<0.3.0>, <0.7.0>, <0.6.0>]
Reductions: 29265
Stack+heap: 1598
OldHeap: 610
Heap unused: 656
OldHeap unused: 468
Memory: 18584
Program counter: 0x00007f42f9566200 (init:boot_loop/2 + 64)
CP: 0x0000000000000000 (invalid)
=proc:<0.3.0>
State: Waiting
=port:#Port<0.0>
Slot: 0
Connected: <0.3.0>
Links: <0.3.0>
Port controls linked-in driver: efile
=port:#Port<0.14>
Slot: 112
Connected: <0.3.0>
Running the script on such a dump yields the following result:

$ awk -f queue_fun.awk $PATH_TO_DUMP
10641: io:wait_io_mon_reply/2
12646: io:wait_io_mon_reply/2
32991: io:wait_io_mon_reply/2
2183837: io:wait_io_mon_reply/2
730790: io:wait_io_mon_reply/2
80194: io:wait_io_mon_reply/2

This is the list of functions running in the Erlang processes whose mailboxes grew very large. Here is the script:

# Parse Erlang Crash Dumps and correlate mailbox size to the currently running
# function.
#
# Once in the procs section of the dump, all processes are displayed with
# =proc:<0.M.N> followed by a list of their attributes, which include the
# message queue length and the program counter (what code is currently
# executing).
#
# Run as:
#
#    $ awk -v threshold=$THRESHOLD -f queue_fun.awk $CRASHDUMP
#
# Where $THRESHOLD is the smallest mailbox you want inspected. Default value
# is 1000.
BEGIN {
  if (threshold == "") {
    threshold = 1000 # default mailbox size
  }
  procs = 0 # are we in the =procs entries?
  print "======================================"
}

# Only bother with the =proc: entries. Anything else is useless.
procs == 0 && /^=proc/ { procs = 1 } # entering the =procs entries
procs == 1 && /^=/ && !/^=proc/ { exit 0 } # we're done

# Message queue length: 1210
# 1       2     3       4
/^Message queue length: / && $4 >= threshold { flag = 1; ct = $4 }
/^Message queue length: / && $4 <  threshold { flag = 0 }

# Program counter: 0x00007f5fb8cb2238 (io:wait_io_mon_reply/2 + 56)
# 1       2        3                  4                       5 6
flag == 1 && /^Program counter: / { print ct ":", substr($4, 2) }
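The core of the script can be tried without a real crash dump. The sketch below is a condensed, hypothetical variant (the function name mini_queue_fun and the tiny two-process dump are invented) that keeps only the three rules dealing with queue length and program counter.

```shell
# mini_queue_fun is a hypothetical, condensed version of the rules above;
# its first argument is the threshold (1000 if omitted).
mini_queue_fun() {
  awk -v threshold="${1:-1000}" '
    /^Message queue length: / && $4 >= threshold { flag = 1; ct = $4 }
    /^Message queue length: / && $4 <  threshold { flag = 0 }
    flag == 1 && /^Program counter: / { print ct ":", substr($4, 2) }
  '
}

# An invented dump fragment: one small mailbox, one very large mailbox.
printf '%s\n' \
  '=proc:<0.1.0>' \
  'Message queue length: 5' \
  'Program counter: 0x00007f42f9566200 (init:boot_loop/2 + 64)' \
  '=proc:<0.2.0>' \
  'Message queue length: 2183837' \
  'Program counter: 0x00007f5fb8cb2238 (io:wait_io_mon_reply/2 + 56)' |
  mini_queue_fun 1000
```

Only the second process exceeds the threshold, so only its queue length and current function are printed. The substr($4, 2) call drops the opening parenthesis in front of the function name.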
Were you able to follow along? If so, you now understand Awk. Congratulations!
  CopyRight 2002-2016 newfreesoft.com, All Rights Reserved.