I. Text processing tools
grep, sed and awk are all text-processing tools, but each has its own strengths and weaknesses, and none of them can completely replace another; otherwise we would not need three of them. Comparatively speaking, sed and awk are more powerful, and each deserves a separate introduction.
grep: a text filter. If you only need to filter text, use grep; it is much more efficient than the alternatives;
sed: Stream EDitor. By default it works only on its pattern space and does not modify the original data; if you need to process text line by line, use sed;
awk: a report generator that formats text before displaying it. If you need to generate reports from data, or the data you are processing is organized in columns, awk is the best choice.
II. What awk can do
Treat the text of a file as database records and fields
Use variables to operate on that database
Use arithmetic and string operators
Use common programming constructs such as loops and conditionals
Generate formatted reports
Define functions
Execute UNIX commands from a script
Process the output of UNIX commands
Process command-line arguments more intelligently
Handle multiple input streams more easily
III. Syntax
# awk [options] 'script' file1 file2 ...
# awk [options] 'PATTERN {action}' file1 file2 ...
1. Options
-F fs or --field-separator fs:
Specifies the input field separator; fs is a string or a regular expression, e.g. -F:
# awk -F: '/root/ {print $1, $NF}' /etc/passwd
# awk -F: '/root/ {print $1 $NF}' /etc/passwd
# awk -F: '/root/ {print $1 "#" $NF}' /etc/passwd
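To try these three print forms without touching the real /etc/passwd, here is a minimal sketch that feeds two made-up passwd-style lines (hypothetical data, via a helper function named `sample` for illustration) through the same commands, assuming a POSIX shell and any POSIX awk such as gawk or mawk:

```shell
# Hypothetical passwd-style sample data standing in for /etc/passwd
sample() {
    printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' 'alice:x:1000:1000::/home/alice:/bin/sh'
}

sample | awk -F: '/root/ {print $1, $NF}'      # comma: joined by OFS (a space) -> root /bin/bash
sample | awk -F: '/root/ {print $1 $NF}'       # no comma: concatenated        -> root/bin/bash
sample | awk -F: '/root/ {print $1 "#" $NF}'   # explicit "#" in between       -> root#/bin/bash
```

Only the first line contains "root", so each command prints a single line.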
-v var=value: defines a variable before the script runs, so it can be used anywhere in the script, including the BEGIN block;
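A small sketch of -v, assuming a POSIX awk; the variable names greeting and prefix are made up for illustration. It shows that the value is already set inside BEGIN and is also visible in the main rules:

```shell
# -v assigns the variable before BEGIN runs, so BEGIN can already use it
awk -v greeting="hello" 'BEGIN {print greeting, "world"}'     # -> hello world

# The same variable is visible while processing each input line
printf '%s\n' a b | awk -v prefix="line:" '{print prefix, $0}'
# -> line: a
#    line: b
```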
2. awk output: print and printf
(1) print format:
print item1, item2, ...
Key points:
a. Items are separated by commas, and in the output they are separated by a single space (the output field separator);
b. An item can be a string, a number, a field of the current record (such as $1), a variable, or an awk expression; numeric values are converted to strings before output;
c. The items after print can be omitted, in which case it behaves like print $0; therefore, to output a blank line you need print "";
Note that in awk, $ denotes a field, and user variables never take a $ prefix. This is where awk differs from the shell and Perl! In the shell, a variable is defined without $ but referenced with $; in Perl, $ is used both when defining and when referencing a variable ($ marks a scalar, while @ and % mark arrays and hashes).
Examples
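A few print examples, sketched with throwaway input from echo so they run anywhere (any POSIX awk assumed):

```shell
# Comma-separated items are printed with one space (OFS) between them
echo "one two three" | awk '{print $1, $3}'              # -> one three

# Numbers and expressions are converted to strings before printing
echo "3 4" | awk '{print $1 * $2, "total"}'               # -> 12 total

# print with no items is the same as print $0
echo "whole line" | awk '{print}'                         # -> whole line

# print "" writes an empty line (the middle of these three lines)
awk 'BEGIN {print "before"; print ""; print "after"}'
```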
(2) printf format:
printf format, item1, item2, ...
Every format specifier begins with % and is followed by a character:
%c
Prints the character with the given ASCII code; for example, printf "%c", 67 prints C.
%d, %i
Prints a decimal integer; for example, printf "%d\n", 6.745 prints 6.
%e, %E
Prints a number in scientific (exponent) notation; for example, printf "%4.3e\n", 6745 prints 6.745e+03.
%f
Prints a number in floating-point notation; for example, printf "%4.3f\n", 6745 prints 6745.000.
%s
Prints a string; for example, printf "%10s\n", 6745 prints 6745 right-aligned in a 10-character field (six spaces followed by 6745).
Flexible formats:
N$
Positional specifier (supported by GNU awk): chooses which argument is printed at each position. printf "%s %s %s\n", "I", "LOVE", "YOU" outputs I LOVE YOU; with the positions reversed, printf "%3$s %2$s %1$s\n", "YOU", "LOVE", "I" also outputs I LOVE YOU.
Modifiers:
N: display width;
-: left-align;
+: always show the sign of a numeric value, whether positive or negative (right alignment is already the default);
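The specifiers and modifiers above can be checked in one BEGIN block; this is a sketch assuming any POSIX awk (the gawk-specific N$ form is left out so it also runs under mawk). The brackets only make the padding visible:

```shell
awk 'BEGIN {
    printf "%c\n", 67              # character with ASCII code 67 -> C
    printf "%d\n", 6.745           # integer part -> 6
    printf "%4.3e\n", 6745         # scientific notation -> 6.745e+03
    printf "%4.3f\n", 6745         # three decimals -> 6745.000
    printf "[%10s]\n", 6745        # right-aligned, width 10 -> [      6745]
    printf "[%-10s]\n", 6745       # "-" left-aligns -> [6745      ]
    printf "%+d %+d\n", 5, -5      # "+" always shows the sign -> +5 -5
}'
```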
3. Patterns and actions
(1) A pattern can be any of the following:
/regular expression/: an extended regular expression.
Relational expression: uses the relational operators listed under the awk operators below and can compare strings or numbers; for example, $2 > $1 selects lines whose second field is greater than the first field.
Pattern-matching expression: uses the operators ~ (matches) and !~ (does not match).
pattern1, pattern2: specifies a range of lines, from a line matching pattern1 through the next line matching pattern2. This form cannot include the BEGIN and END patterns.
BEGIN: lets you specify actions that run before the first input record is processed; global variables are usually set here.
END: lets you specify actions that run after the last input record has been read.
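A sketch of each pattern type, assuming any POSIX awk and a made-up two-column "name score" data set (the helper function `data` is hypothetical). A pattern without an action simply prints the matching line:

```shell
# Hypothetical sample data: name and score
data() {
    printf '%s\n' 'alice 90' 'bob 75' 'carol 82' 'dave 60'
}

data | awk '/^c/'                    # regular expression -> carol 82
data | awk '$2 > 80'                 # relational expression -> alice 90, carol 82
data | awk '$1 !~ /a/'               # "does not match" -> bob 75
data | awk '/^bob/, /^carol/'        # range -> bob 75, carol 82
data | awk 'BEGIN {print "report"} END {print NR, "records"}'   # -> report, 4 records
```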
(2) An action consists of one or more commands, functions, or expressions, separated by newlines or semicolons and enclosed in braces. Actions are built mainly from four kinds of statements:
Variable or array assignment
Output command
Built-in functions
Control flow commands
4. Variables
(1) awk built-in variables: separators
FS: the input field separator used when reading text (default: whitespace)
RS: the input record separator (default: a newline)
OFS: Output Field Separator (default: a space)
ORS: Output Record Separator (default: a newline)
Note:
Fields are referenced as $1, $2 up to $NF, and $0 denotes the whole line. If $0 is assigned a new value, $1, $2 ... and NF are all recalculated. Likewise, if $i is changed, $0 is rebuilt using OFS.
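The rebuild rules in the note can be observed directly; a sketch assuming any POSIX awk:

```shell
# Changing a field makes awk rebuild $0, joining the fields with OFS
echo "a:b:c" | awk 'BEGIN {FS = ":"; OFS = "-"} {$2 = "X"; print}'   # -> a-X-c

# Without any field change, $0 is printed exactly as it was read
echo "a:b:c" | awk -F: -v OFS="," '{print}'                          # -> a:b:c

# Assigning a field to itself is a common idiom to force the rebuild
echo "a:b:c" | awk -F: -v OFS="," '{$1 = $1; print}'                 # -> a,b,c

# Assigning $0 re-splits the record and recomputes NF
echo "a b" | awk '{$0 = "x y z"; print NF, $3}'                      # -> 3 z

# ORS replaces the newline written after each print
printf '%s\n' a b | awk -v ORS=";" '{print}'                         # -> a;b;
echo                                                                  # end the line ourselves
```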
(2) awk built-in data variables
NR: the number of input records processed so far; with multiple files, the lines of all files are counted in one continuous sequence
NF: the number of fields in the current record
FNR: the record number within the current file
ARGV: an array holding the command-line arguments; for example, for awk '{print $0}' a.txt b.txt, ARGV[0] holds awk and ARGV[1] holds a.txt
ARGC: the number of command-line arguments of the awk command
FILENAME: the name of the file awk is currently processing
ENVIRON: an associative array of the current shell's environment variables and their values
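A sketch of NR, FNR, NF and FILENAME working together, assuming a POSIX shell with mktemp and any POSIX awk; the two file names are made up for illustration:

```shell
# Two tiny hypothetical input files in a throwaway directory
dir=$(mktemp -d)
printf '%s\n' 'a b c' 'd e' > "$dir/one.txt"
printf '%s\n' 'f' > "$dir/two.txt"

# NR keeps counting across files, FNR restarts at 1 for each file,
# NF is the number of whitespace-separated fields on the current line
result=$(cd "$dir" && awk '{print FILENAME, NR, FNR, NF}' one.txt two.txt)
printf '%s\n' "$result"
# -> one.txt 1 1 3
#    one.txt 2 2 2
#    two.txt 3 1 1
rm -rf "$dir"
```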
Note:
The ARGV array runs from ARGV[0] to ARGV[ARGC-1]. Its first element has index 0 rather than 1, unlike other awk arrays (such as the ones filled by split(), which start at 1).
The ENVIRON array is useful for interaction between the shell and awk: ENVIRON["PARA_NAME"] returns the value of the environment variable $PARA_NAME, and the double quotes around the name are essential!
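A sketch of ARGV, ARGC and ENVIRON, assuming any POSIX awk. The exact value of ARGV[0] depends on the awk implementation, and MY_VAR is a made-up variable. Because the program has only a BEGIN block, awk never tries to open a.txt or b.txt:

```shell
# List every command-line argument with its index
awk 'BEGIN {for (i = 0; i < ARGC; i++) print i, ARGV[i]}' a.txt b.txt
# -> 0 awk (or the name of your awk)
#    1 a.txt
#    2 b.txt

# Pass a hypothetical variable through the environment for this one command
MY_VAR="from the shell" awk 'BEGIN {print ENVIRON["MY_VAR"]}'    # -> from the shell
```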
5. Output redirection
(1) Output redirection
print items > output-file
print items >> output-file
print items | command
(2) Special file names:
/dev/stdin: standard input
/dev/stdout: standard output
/dev/stderr: standard error output
/dev/fd/N: a specific file descriptor; for example, /dev/stdin is equivalent to /dev/fd/0;
Example:
# awk -F" " '{printf "%-15s %i\n", $1, $3 > "/dev/stderr"}' /etc/issue
# awk -F" " '{printf "%-15s %i\n", $1, $3 > "/dev/null"}' /etc/issue
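A sketch of the three redirection forms plus /dev/stderr, assuming a POSIX shell with mktemp and any POSIX awk; the temporary file is only for illustration:

```shell
tmp=$(mktemp)

# ">" truncates the file when awk first opens it; later prints in the same run keep writing to it
printf '%s\n' b a c | awk -v f="$tmp" '{print > f}'
# ">>" appends to what the file already contains
echo d | awk -v f="$tmp" '{print >> f}'
content=$(cat "$tmp")
printf '%s\n' "$content"                          # b, a, c, d on separate lines
rm -f "$tmp"

# "|" sends every printed line to one command; awk closes the pipe at exit
printf '%s\n' b a c | awk '{print | "sort"}'      # a, b, c on separate lines

# "/dev/stderr" writes to standard error instead of standard output
echo oops | awk '{print "error:", $0 > "/dev/stderr"}'
```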
6. awk operators
(1) Arithmetic operators:
-x: negation;
+x: converts x to a number;
x^y, x**y: exponentiation (x to the power of y);
x*y: multiplication;
x/y: division;
x+y: addition;
x-y: subtraction;
x%y: modulo (remainder);
(2) String operator:
There is only one, and it has no symbol: writing two expressions side by side concatenates them;
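The arithmetic operators and the symbol-less concatenation can be checked in a BEGIN block; a sketch assuming any POSIX awk (** is left out because not every awk accepts it):

```shell
awk 'BEGIN {
    x = 7; y = 2
    print x + y, x - y, x * y, x / y, x % y, x ^ y   # -> 9 5 14 3.5 1 49
    print -x, +"3abc"                                # negation and numeric conversion -> -7 3
    s = "foo" "bar"                                  # side by side means concatenation
    print s, x y                                     # -> foobar 72
}'
```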
(3) Assignment operators:
=
+=
-=
*=
/=
%=
^=
**=
++
--
Note: if the pattern is the = sign itself, writing /=/ may cause a syntax error because awk can read /= as the division-assignment operator; use /[=]/ instead;
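A sketch of the compound assignments, increment and decrement, and the /[=]/ workaround, assuming any POSIX awk:

```shell
awk 'BEGIN {
    n = 10
    n += 5; n -= 3; n *= 2; n /= 4; n %= 4; n ^= 3   # 15, 12, 24, 6, 2, 8
    print n                                          # -> 8
    i = 1; a = i++; b = ++i; i--                     # post-increment, pre-increment, decrement
    print a, b, i                                    # -> 1 3 2
}'

# Match lines containing "=" with the bracket form instead of /=/
printf '%s\n' 'key=value' 'no equals here' | awk '/[=]/'    # -> key=value
```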
(4) Boolean values
In awk, any non-zero number or non-empty string is true; anything else is false;
(5) Comparison operators:
x < y True if x is less than y.
x <= y True if x is less than or equal to y.
x > y True if x is greater than y.
x >= y True if x is greater than or equal to y.
x == y True if x is equal to y.
x != y True if x is not equal to y.
x ~ y True if the string x matches the regexp denoted by y.
x !~ y True if the string x does not match the regexp denoted by y.
subscript in array True if the array array has an element with the subscript subscript.
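A sketch of how comparisons behave, assuming any POSIX awk: fields that look like numbers compare numerically, string constants compare as strings, and "in" tests for a subscript without creating it, whereas a plain reference does create it:

```shell
echo "10 9" | awk '{if ($1 > $2) print "greater"; else print "not greater"}'   # -> greater
awk 'BEGIN {if ("10" > "9") print "greater"; else print "not greater"}'        # -> not greater
echo "abc123" | awk '$0 ~ /[0-9]+$/ && $0 !~ /^x/ {print "matched"}'           # -> matched

awk 'BEGIN {
    a["x"] = 1
    if (!("y" in a)) print "y absent"    # -> y absent
    tmp = a["y"]                         # a plain reference creates a["y"]
    for (k in a) n++
    print n                              # -> 2
}'
```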
(6) Logical operators between expressions:
&&
||
(7) Conditional expression:
selector ? if-true-exp : if-false-exp, which behaves like this shell-style pseudocode:
if selector; then
if-true-exp
else
if-false-exp
fi
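A sketch of the conditional expression inside real awk one-liners, assuming any POSIX awk:

```shell
# Label each number as even or odd
printf '%s\n' 3 8 | awk '{print $1, ($1 % 2 == 0 ? "even" : "odd")}'
# -> 3 odd
#    8 even

# Pick the larger of two fields
echo "4 9" | awk '{max = ($1 > $2) ? $1 : $2; print max}'    # -> 9
```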
(8) Function calls:
function_name(para1, para2)
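Since awk also lets you define your own functions, here is a minimal sketch assuming any POSIX awk; square is a hypothetical function name. When calling a user-defined function, put no space between the name and the opening parenthesis:

```shell
awk '
# square is a hypothetical user-defined function, declared with the "function" keyword
function square(n) {
    return n * n
}
BEGIN {
    print square(4), square(1.5)    # -> 16 2.25
}'
```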
7. Control statements
(1) if-else
Syntax:
if (condition) {then-body} [else {else-body}]
Examples:
# awk -F: '{if ($3 == 0) {print $1, "Administrator"} else {print $1, "Common User"}}' /etc/passwd
# awk -F: '{if ($1 == "root") print $1, "Admin"; else print $1, "Common User"}' /etc/passwd
# awk -F: '{if ($1 == "root") printf "%-15s: %s\n", $1, "Admin"; else printf "%-15s: %s\n", $1, "Common User"}' /etc/passwd
# awk -F: -v sum=0 '{if ($3 >= 500) sum++} END {print sum}' /etc/passwd
(2) while
Syntax:
while (condition) {statement1; statement2; ...}
Examples:
# awk -F: '{i = 1; while (i <= 3) {print $i; i++}}' /etc/passwd
# awk -F: '{i = 1; while (i <= NF) {if (length($i) >= 4) {print $i}; i++}}' /etc/passwd
# awk '{i = 1; while (i <= NF) {if ($i >= 20000) print $i; i++}}' random.txt
# random.txt is a file containing a bunch of random numbers.
(3) do-while: the loop body executes at least once, whether or not the condition holds
Syntax:
do {statement1; statement2; ...} while (condition)
Example (sums the integers 0 through 100 and prints 5050):
# awk 'BEGIN {
    sum = 0;
    i = 0;
    do {
        sum += i;
        i++;
    } while (i <= 100)
    print sum;
}'
(4) for
Syntax: for (initialization; condition; increment) {statement1; statement2; ...}
Example:
# awk -F: '{for (i = 1; i <= 3; i++) {if (length($i) >= 8) {print $i}}}' /etc/passwd
A for loop can also iterate over the elements of an array:
Syntax:
for (i in array) {statement1; statement2; ...}
Example (counts how many accounts use each login shell):
# awk -F: '$NF !~ /^$/ {BASH[$NF]++} END {for (A in BASH) {printf "%-15s:%i\n", A, BASH[A]}}' /etc/passwd
(5) switch
Syntax (gawk only): switch (expression) {case VALUE or /REGEXP/: statement1; statement2; ... default: statement1; ...}
(6) break and continue
Commonly used in loops or switch statements
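A sketch of continue and break in a for loop, assuming any POSIX awk: continue skips the rest of the current iteration, while break leaves the loop entirely:

```shell
awk 'BEGIN {
    out = ""
    for (i = 1; i <= 10; i++) {
        if (i % 2 == 0) continue    # skip even numbers, go to the next iteration
        if (i > 7) break            # leave the loop once i passes 7
        sep = (out == "") ? "" : " "
        out = out sep i
    }
    print out                       # -> 1 3 5 7
}'
```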
(7) next
Stops processing the current line early and moves on to the next line; for example, the following command displays only the users whose UID is odd:
Example:
# awk -F: '{if ($3 % 2 == 0) next; print $1, $3}' /etc/passwd
8. Using arrays in awk
(1) Arrays
array[index-expression]
index-expression can be any string. Note that if you reference an array element that does not exist yet, awk creates it automatically and initializes it to the empty string; therefore, to test whether an element exists, use the index in array form.
To traverse every element of an array, use this special construct:
Syntax:
for (var in array) {statement1; ...}
Here var takes each array subscript in turn, not the element value;
Example:
# netstat -ant | awk '/^tcp/ {++S[$NF]} END {for (a in S) print a, S[a]}'
(2) Deleting array elements
To remove an element from an array, use the delete command in this format:
delete array[index]
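A sketch of delete, assuming any POSIX awk; the array name count is made up for illustration:

```shell
awk 'BEGIN {
    count["a"] = 1; count["b"] = 2; count["c"] = 3
    delete count["b"]                   # remove a single element
    n = 0
    for (k in count) n++
    print n                             # -> 2
    if ("b" in count) print "b still present"; else print "b deleted"   # -> b deleted
}'
```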
9. awk built-in functions
split(string, array [, fieldsep [, seps]])
Function: splits string using fieldsep as the delimiter, stores the pieces in the array named array, and returns the number of pieces; the subscripts start at 1;
Examples:
# netstat -ant | awk '/:80\>/ {split($5, clients, ":"); IP[clients[1]]++} END {for (i in IP) {print IP[i], i}}' | sort -rn | head -50
# netstat -tan | awk '/:80\>/ {split($5, clients, ":"); ip[clients[4]]++} END {for (a in ip) print ip[a], a}' | sort -rn | head -50
# df -lh | awk '!/^File/ {split($5, percent, "%"); if (percent[1] >= 20) {print $1}}'
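The netstat examples depend on live connections, so here is a sketch of the same split-and-count idea fed with made-up "address:port" lines instead, assuming any POSIX awk plus sort:

```shell
# split returns the number of pieces; subscripts start at 1
echo "192.168.1.10:51234" | awk '{n = split($1, parts, ":"); print n, parts[1], parts[2]}'
# -> 2 192.168.1.10 51234

# Count connections per client IP (hypothetical input data)
printf '%s\n' 10.0.0.1:5000 10.0.0.2:5001 10.0.0.1:5002 |
    awk '{split($1, c, ":"); ip[c[1]]++} END {for (i in ip) print ip[i], i}' | sort -rn
# -> 2 10.0.0.1
#    1 10.0.0.2
```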
length([string])
Function: returns the number of characters in string (or in $0 if string is omitted);
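A sketch of length with and without an argument, assuming any POSIX awk:

```shell
awk 'BEGIN {print length("hello")}'                   # -> 5
echo "abc de" | awk '{print length($0), length($2)}'  # -> 6 2
echo "abc de" | awk '{print length}'                  # no argument: length of $0 -> 6
```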
substr(string, start [, length])
Function: returns the substring of string that begins at position start and is length characters long (to the end of string if length is omitted); positions are counted from 1;
# tail -10 /etc/passwd | awk -F: '{print substr($1, 1, 6)}'
system(command)
Function: runs a system command; the command's own output goes straight to standard output, and system() returns the command's exit status to awk
# awk 'BEGIN {print system("ls -l")}'
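A sketch showing that system() hands back an exit status rather than the command's output, assuming any POSIX awk and the standard false utility:

```shell
# The command writes its own output; system() returns only the exit status
awk 'BEGIN {rc = system("echo hi"); print "status:", rc}'
# -> hi
#    status: 0

# A failing command yields a nonzero status that the script can test
awk 'BEGIN {if (system("false") != 0) print "command failed"}'    # -> command failed
```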
systime()
Function: returns the number of whole seconds since January 1, 1970 UTC (leap seconds are not counted); this is a gawk extension
tolower(s)
Function: returns a copy of s with all uppercase letters converted to lowercase
toupper(s)
Function: returns a copy of s with all lowercase letters converted to uppercase
# awk 'BEGIN {s = "acl"; print toupper(s)}'