  An Introduction to Awk, a Powerful Linux Command

  Added: 2017-08-31
         
       
         
  What is Awk

Awk is a small programming language and command-line tool. Its name comes from the initials of its creators' last names: Alfred Aho, Peter Weinberger, and Brian Kernighan. It is well suited to processing logs on servers, mainly because Awk operates on files that are usually made up of lines of readable text.

I say it suits servers because log files, dump files, and anything else a server writes to disk as text tend to grow large, and each server will have many of them. Suppose you need to analyze several gigabytes of files spread across 50 different servers, and you have no Splunk or equivalent tool. In that situation, fetching and downloading all of those files just to analyze them is painful.

I have run into this myself. Some Erlang nodes die and leave behind crash dumps of 700 MB to 4 GB. At other times I need to scan the logs on a small personal server (a VPS) quickly for a recurring pattern.

In any case, Awk is not just for finding data (grep or ack would be enough for that). It also lets you process and transform the data.

Code structure

The structure of an Awk script is very simple: it is a series of patterns and actions:

# Comment
Pattern1 {ACTIONS;}

# Comment
Pattern2 {ACTIONS;}

# Comment
Pattern3 {ACTIONS;}

# Comment
Pattern4 {ACTIONS;}
Every line of input is checked against each pattern in turn, and each pattern is tested once per line. So, given a file with the following contents:

this is line 1

this is line 2

The line "this is line 1" is first tested against Pattern1. If it matches, ACTIONS runs. Then "this is line 1" is tested against Pattern2. If that fails, it moves on to Pattern3, and so on.

Once all of the patterns have been tried, "this is line 2" goes through the same procedure, and so does every other line until the whole file has been read.

In short, that is Awk's execution model.
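To see this execution model in action, here is a small sketch that runs three made-up patterns over the two-line file above. It assumes a POSIX sh and any POSIX-compatible awk (gawk, mawk, or BSD awk). The function name trace_patterns is only for illustration.

```shell
#!/bin/sh
# Three pattern-action pairs: every input line is tested against each pattern in order.
trace_patterns() {
  printf 'this is line 1\nthis is line 2\n' | awk '
    /line 1/ { print "Pattern1 matched: " $0 }
    /line/   { print "Pattern2 matched: " $0 }
    /2$/     { print "Pattern3 matched: " $0 }
  '
}
trace_patterns
```

Each line is tested against every pattern. The first line triggers Pattern1 and Pattern2, and the second triggers Pattern2 and Pattern3, so the output has four lines.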

Data types

Awk has only two main data types: strings and numbers. Even so, Awk converts freely between them. A string can be interpreted as a number and converted to its numeric value. If the string does not contain a number, it converts to 0.

Variables are assigned with the = operator in the ACTIONS part of your code. You can declare and use variables at any time and anywhere. You can even use uninitialized variables, whose default value is the empty string "" (or 0 when used as a number).

Finally, Awk has arrays, which are dynamic, one-dimensional associative arrays. Their syntax is var[key] = value. Awk can simulate multidimensional arrays, but however you do it, that is a big hack.
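The following sketch shows these conversions and arrays inside a BEGIN block, so it needs no input. It assumes a POSIX awk. The names types_demo, unset, count, and grid are hypothetical and chosen only for the demo.

```shell
#!/bin/sh
# Strings vs. numbers, default values, and associative arrays in Awk.
types_demo() {
  awk 'BEGIN {
    print "23" + 1              # numeric string converted to a number: 24
    print "abc" + 0             # no leading number, so it converts to 0
    print "[" unset "]"         # uninitialized variable: empty string
    print unset + 5             # ...which acts as 0 in a numeric context
    count["GET"] = 3            # associative array with a string key
    count["POST"]++             # a missing element also starts out empty/0
    print count["GET"], count["POST"]
    grid[1, 2] = "x"            # simulated 2-D array: key is "1" SUBSEP "2"
    if ((1, 2) in grid) print "found (1, 2)"
  }'
}
types_demo
```

It should print 24, 0, [], 5, 3 1, and found (1, 2), one per line. The (1, 2) key shows how Awk simulates two dimensions: it joins the indices with the SUBSEP character into a single string key.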

Patterns

Patterns fall into three categories: regular expressions, Boolean expressions, and special patterns.

Regular expressions and Boolean expressions

The regular expressions Awk uses are fairly lightweight. They are not PCRE, and any extras depend on the specific implementation (gawk, for example), so check with awk --version. They are still enough for most needs:

/admin/ { ... }       # any line that contains 'admin'
/^admin/ { ... }      # lines that begin with 'admin'
/admin$/ { ... }      # lines that end with 'admin'
/^[0-9.]+ / { ... }   # lines beginning with series of numbers and periods
/(POST|PUT|DELETE)/   # lines that contain specific HTTP verbs
Note that patterns cannot capture groups and pass them to the ACTIONS part of the code. Patterns are only meant to match content.

Boolean expressions are similar to those in PHP or JavaScript. In particular, awk has the && ("and"), || ("or"), and ! ("not") operators, which you will recognize from almost any C-like language. They operate on regular data.

Awk's comparison operator == also behaves like PHP's and JavaScript's, because it performs fuzzy matching. So the string "23" compares equal to 23, and the expression "23" == 23 is true. The != operator works the same way in awk. The other usual operators are also available: >, <, >=, and <=.

You can also mix the two kinds: regular expressions and Boolean expressions can be used together. /admin/ || debug == true is legal. It matches any line containing the word "admin", or every line when the debug variable equals true. (Awk has no Boolean literals, so true here is just another variable that you would set yourself.)

If you want to match a specific string or variable against a regular expression, use the ~ and !~ operators: string ~ /regex/ and string !~ /regex/.

Also note that patterns are optional. An Awk script containing:

{ACTIONS}

simply runs ACTIONS for every line of input.
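Here is a sketch that combines regex patterns, Boolean operators, ~, and !~ on a few invented HTTP log lines. It makes three assumptions: the fields are verb, path, and status; debug is passed with -v and compared to 1, because awk has no true literal; and filter_requests is a hypothetical name.

```shell
#!/bin/sh
# Combine regex and Boolean patterns, and match a single field with ~ / !~.
filter_requests() {
  printf '%s\n' \
    'GET /admin/index.html 200' \
    'POST /login 403' \
    'DELETE /admin/users 200' \
    'GET /home 200' |
  awk -v debug=0 '
    # Regex OR Boolean: fires for admin lines, or for every line when debug=1
    /admin/ || debug == 1 { print "admin-or-debug: " $2 }
    # Match one field against a regex with ~, and exclude with !~
    $1 ~ /^(POST|PUT|DELETE)$/ && $2 !~ /^\/login/ { print "write: " $1 " " $2 }
    # "200" from input is a numeric string, so == 200 compares numerically
    $3 == 200 && !/admin/ { print "ok: " $2 }
  '
}
filter_requests
```

With debug=0 it prints four lines: two admin-or-debug: matches, write: DELETE /admin/users, and ok: /home. Passing -v debug=1 instead makes the first rule fire for every line.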

Special patterns

Awk has a few special patterns, but not many.

The first is BEGIN, which matches only once, before any input lines are read. This is the main place to initialize your script's variables and other state.

The other is END. As you might guess, it matches once, after all input has been processed. This lets you do cleanup and final output before exiting.
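Here is a minimal sketch of BEGIN and END that counts input lines. count_lines is a hypothetical helper name, and a POSIX awk is assumed.

```shell
#!/bin/sh
# BEGIN runs before any input is read; END runs after all input is processed.
count_lines() {
  awk '
    BEGIN { n = 0; print "starting" }   # initialize state once
    { n++ }                             # no pattern: runs for every line
    END   { print "lines: " n }         # final report before exit
  '
}
printf 'a\nb\nc\n' | count_lines
```

Given three lines, it prints starting and then lines: 3. Setting n = 0 in BEGIN is not strictly required, but it makes END print 0 rather than an empty string when there is no input.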

The last kind of pattern is a bit hard to classify. It sits halfway between variables and special values, and these are usually called fields. They deserve a section of their own.

Fields

Fields are best explained with an example:

# Given the following line
#
#   $1       $2  $3
#   00:34:23 GET /foo/bar.html
#   \________________________/
#               $0

# Hack attempt?
/admin.html$/ && $2 == "DELETE" {
  print "Hacker Alert!";
}
By default, fields are separated by whitespace. The $0 field is the entire line as a string. $1 is the first chunk (before any whitespace), $2 is the one after it, and so on.

Here is an interesting fact, and one you will want to avoid in most cases: you can modify a line by assigning to its fields. For example, if one block runs $0 = "HAHA THE LINE IS GONE", the following patterns operate on the modified line rather than the original. The same holds for the other field variables.
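The sketch below shows fields in practice. It reuses the hack-detection rule above on two invented log lines, then overwrites $2 to show that later patterns see the modified line. field_demo is a hypothetical name.

```shell
#!/bin/sh
# Fields: $0 is the whole record, $1..$NF are its whitespace-separated chunks.
# Assigning to a field rebuilds $0, and later patterns see the modified line.
field_demo() {
  printf '%s\n' '00:34:23 GET /foo/bar.html' '00:35:01 DELETE /admin.html' |
  awk '
    /admin.html$/ && $2 == "DELETE" { print "Hacker Alert!" }
    { $2 = "VERB" }                 # overwrite field 2; $0 is rebuilt with OFS
    $2 == "GET" { print "never printed: $2 was already replaced" }
    { print $0 }
  '
}
field_demo
```

The expected output is 00:34:23 VERB /foo/bar.html, Hacker Alert!, and 00:35:01 VERB /admin.html. The $2 == "GET" rule never fires, because the field has already been replaced by the time that rule runs.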

Actions

There are plenty of possible actions, but in my experience the most common and useful ones are:

{ print $0; }         # prints $0. In this case, equivalent to 'print' alone
{ exit; }             # ends the program
{ next; }             # skips to the next line of input
{ a = $1; b = $0 }    # variable assignment
{ c[$1] = $2 }        # variable assignment (array)

{ if (BOOLEAN) { ACTION }
  else if (BOOLEAN) { ACTION }
  else { ACTION }
}
{ for (i = 1; i < x; i++) { ACTION } }
{ for (item in c) { ACTION } }
These will be the main tools in your Awk toolbox, and you can use them freely when working through log files and the like.

All variables in Awk are global. Any variable you define in a given block is visible to the other blocks, and it persists from one line to the next. This puts a hard practical limit on how large your Awk scripts can grow before they become horribly unmaintainable. Keep your scripts as small as possible.
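The sketch below combines several of these actions to tally requests per HTTP verb. It assumes whitespace-separated lines whose second field is the verb. The names verb_counts, seen, total, and label are hypothetical. Because for (verb in seen) visits keys in an unspecified order, the output is piped through sort.

```shell
#!/bin/sh
# Tally requests per HTTP verb using next, if/else, arrays, and for-in.
# seen and total are global: filled per line, then read in END.
verb_counts() {
  awk '
    /^#/ { next }                        # skip comment lines entirely
    { total++; seen[$2]++ }              # array keyed by the verb field
    END {
      for (verb in seen) {               # for-in order is unspecified
        if (seen[verb] > 1) label = "many"
        else label = "one"
        print verb, seen[verb], label
      }
      print "total", total
    }
  ' | sort
}
printf '%s\n' '# log' '00:01 GET /a' '00:02 POST /b' '00:03 GET /c' | verb_counts
```

For the sample input it prints GET 2 many, POST 1 one, and total 3. This works only because seen and total, filled in the per-line block, are still visible in END.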

Functions

You can call functions with the following syntax (for user-defined functions, do not put a space before the opening parenthesis):

{ somecall($2) }

Only a limited set of built-in functions is available, so I will point you to the regular documentation for those.

User-defined functions are just as simple:

# Scalar arguments are call-by-value (arrays are passed by reference)
function name(parameter-list) {
  ACTIONS; # same actions as usual
}

# Return is a valid keyword
function add1(val) {
  return val + 1;
}
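The sketch below defines add1 from above alongside a second user-defined function, and it calls a few POSIX built-ins (substr, toupper, length). The extra parameters in repeat(s, n,    i, out) follow a common Awk convention for local variables, since any variable not listed as a parameter is global. func_demo and repeat are hypothetical names.

```shell
#!/bin/sh
# A user-defined function plus a few POSIX built-ins (substr, toupper, length).
# Extra parameters after the real ones are the usual Awk idiom for locals.
func_demo() {
  awk '
    function add1(val) { return val + 1 }
    function repeat(s, n,    i, out) {   # i and out are locals, not arguments
      out = ""
      for (i = 0; i < n; i++) out = out s
      return out
    }
    {
      print add1($1)                     # no space before "(" in calls
      print toupper(substr($2, 1, 3)), length($2)
      print repeat("ab", $1)
    }
  '
}
echo '3 erlang' | func_demo
```

Given the input 3 erlang, it prints 4, ERL 6, and ababab.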
Special Variables

Besides regular variables (global, usable anywhere), there are a number of special variables that act somewhat like configuration entries:

BEGIN { # Can be modified by the user
  FS = ",";    # Field Separator
  RS = "\n";   # Record Separator (lines)
  OFS = " ";   # Output Field Separator
  ORS = "\n";  # Output Record Separator (lines)
}
{ # Set by awk itself (normally only read)
  NF           # Number of Fields in the current Record (line)
  NR           # Number of Records seen so far
  ARGV / ARGC  # Script Arguments
}
I modify these variables in BEGIN because that is where I prefer to override them. They can be reassigned anywhere in the script, though, and the change takes effect for the lines that follow.
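As a sketch of these variables, the snippet below reads comma-separated input, writes tab-separated output, and prints NR and NF in front of each record. csv_to_tsv is a hypothetical name, and a POSIX awk is assumed.

```shell
#!/bin/sh
# Read comma-separated input, write tab-separated output, and report NR/NF.
csv_to_tsv() {
  awk '
    BEGIN { FS = ","; OFS = "\t" }
    { $1 = $1 }                      # touching a field rebuilds $0 with OFS
    { print NR, NF, $0 }             # record number, field count, new line
  '
}
printf 'a,b,c\nd,e\n' | csv_to_tsv
```

It prints 1, 3, a, b, c on the first line and 2, 2, d, e on the second, all tab-separated. The $1 = $1 assignment is the usual trick to make Awk rebuild $0 with the new OFS. Without it, $0 keeps its original commas.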

Example

That is the core of the Awk language. I don't have many examples, because I tend to use Awk for quick one-off tasks.

Still, I keep a few script files around for handling things and for testing. My favorite is a script that processes Erlang crash dumps, which look like this:

=erl_crash_dump:0.3
Tue Nov 18 02:52:44 2014
Slogan: init terminating in do_boot ()
System version: Erlang/OTP 17 [erts-6.2] [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]
Compiled: Fri Sep 19 03:23:19 2014
Taints:
Atoms: 12167
=memory
total: 19012936
processes: 4327912
processes_used: 4319928
system: 14685024
atom: 339441
atom_used: 331087
binary: 1367680
code: 8384804
ets: 382552
=hash_table:atom_tab
size: 9643
used: 6949
...
=allocator:instr
option m: false
option s: false
option t: false
=proc:<0.0.0>
State: Running
Name: init
Spawned as: otp_ring0:start/2
Run queue: 0
Spawned by: []
Started: Tue Nov 18 02:52:35 2014
Message queue length: 0
Number of heap fragments: 0
Heap fragment data: 0
Link list: [<0.3.0>, <0.7.0>, <0.6.0>]
Reductions: 29265
Stack+heap: 1598
OldHeap: 610
Heap unused: 656
OldHeap unused: 468
Memory: 18584
Program counter: 0x00007f42f9566200 (init:boot_loop/2 + 64)
CP: 0x0000000000000000 (invalid)
=proc:<0.3.0>
State: Waiting
...
=port:#Port<0.0>
Slot: 0
Connected: <0.3.0>
Links: <0.3.0>
Port controls linked-in driver: efile
=port:#Port<0.14>
Slot: 112
Connected: <0.3.0>
...
Running the script on such a dump produces output like the following:

$ awk -f queue_fun.awk $PATH_TO_DUMP
MESSAGE QUEUE LENGTH: CURRENT FUNCTION
======================================
10641: io:wait_io_mon_reply/2
12646: io:wait_io_mon_reply/2
32991: io:wait_io_mon_reply/2
2183837: io:wait_io_mon_reply/2
730790: io:wait_io_mon_reply/2
80194: io:wait_io_mon_reply/2
...
This lists the functions running inside Erlang processes whose mailboxes had grown very large. Here is the script:

# Parse Erlang crash dumps and correlate mailbox size to the currently running
# function.
#
# Once in the procs section of the dump, all processes are displayed with
# =proc:<0.M.N> followed by a list of their attributes, which include the
# message queue length and the program counter (what code is currently
# executing).
#
# Run as:
#
#   $ awk -v threshold=$THRESHOLD -f queue_fun.awk $CRASHDUMP
#
# where $THRESHOLD is the smallest mailbox you want to inspect. The default
# value is 1000.
BEGIN {
    if (threshold == "") {
        threshold = 1000 # default mailbox size
    }
    procs = 0 # are we in the =proc entries?
    print "MESSAGE QUEUE LENGTH: CURRENT FUNCTION"
    print "======================================"
}

# Only bother with the =proc: entries. Anything else is useless.
procs == 0 && /^=proc/ { procs = 1 }          # entering the =proc entries
procs == 1 && /^=/ && !/^=proc/ { exit 0 }    # we're done

# Message queue length: 1210
# 1       2     3       4
/^Message queue length: / && $4 >= threshold { flag = 1; ct = $4 }
/^Message queue length: / && $4 < threshold { flag = 0 }

# Program counter: 0x00007f5fb8cb2238 (io:wait_io_mon_reply/2 + 56)
# 1       2        3                  4
flag == 1 && /^Program counter: / { print ct ":", substr($4, 2) }
Did you follow along? If you did, you already know Awk. Congratulations!
  CopyRight 2002-2016 newfreesoft.com, All Rights Reserved.