GNU file utilities

 

Abstract:

The previous article in this series (Basic UNIX Commands) gave a general overview of Linux. It introduced the elements of Linux so that readers could acquire the basic skills to use and manage the operating system, but users will also want to learn the usual set of Unix commands. With these commands and the shell you can manage files and the system very efficiently. This article covers these advanced, although basic, tools.

Introduction: the Unix way of working

Before the commands are described, the reader should know some facts about their history. When Ken Thompson and Dennis Ritchie developed Unix at the beginning of the seventies, they wanted to create an operating system that would make life easier for programmers. They decided that the best way to achieve this was to define a few simple tools, each extremely good at a specialized task. More complicated tasks could be performed by combining these tools, using the output of one as the input of another.

Information is passed from one program to another through standard input and standard output (by default the keyboard and the screen). Pipes and redirection (as described in the previous article) make it possible to combine commands.

This is easy to show with an example. A user types:

$ who | grep pepe

who and grep are two separate programs, joined by the pipe “|”. who lists every user logged in to the computer at that moment. The output is something like:

$ who

manolo tty1 Dec 22 13:15

pepe ps/2 Dec 22 14:36

root tty2 Dec 22 10:03

pepe ps/2 Dec 22 14:37

The output consists of four fields separated by blanks: the username (login), the terminal the user is logged in on, and the date and time of the connection.

“grep pepe” searches for lines containing the string “pepe”.

And the output is:

$ who | grep pepe

pepe ps/2 Dec 22 14:36

pepe ps/2 Dec 22 14:37

Perhaps you are interested in something simpler, such as the number of terminals in use at that moment. You can count them with the program wc.

wc counts characters, words and lines. Here we only need the number of lines, so we use the option -l (lines).

$ who | wc -l

4

$ who | grep pepe | wc -l

2

In total there are four logins, and pepe is logged in at two terminals.

If we now check for antonio:

$ who | grep antonio | wc -l

0

antonio is not logged in.
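The output of who changes from moment to moment, so the pipelines above cannot be reproduced exactly. A minimal sketch, assuming the sample listing shown earlier is saved in a file called who.txt (a hypothetical name), runs the same grep and wc -l steps on fixed data:

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)" || exit 1

# The sample who listing from above; who.txt is a made-up file name.
cat > who.txt <<'EOF'
manolo tty1 Dec 22 13:15
pepe ps/2 Dec 22 14:36
root tty2 Dec 22 10:03
pepe ps/2 Dec 22 14:37
EOF

wc -l < who.txt                # all sessions: 4
grep pepe who.txt | wc -l      # sessions of pepe: 2
grep antonio who.txt | wc -l   # no matching lines: 0
```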

 

The genesis of GNU utils

Richard Stallman, the founder of the GNU project, started a discussion about control over the Unix operating system. At that time this control was in the hands of a few large software companies, which kept computer science from growing naturally. During his time at MIT (Massachusetts Institute of Technology), where he wrote the emacs editor, he came to dislike the fact that big commercial firms took his work and charged money for it. In response, he decided to start a project in which the source code of the software would be available to everybody. That was GNU. The long-term goal was a completely open-source operating system. The first steps were a new open-source version of emacs, a C compiler (gcc) and a few typical Unix system tools. These tools are the subject of this article.

 

grep

Our first example showed the main functionality of grep. Now we will explain it in more detail.

The basic form of grep is:

$ grep [-options] pattern files

The most commonly used options are:
-n shows the line number before each matching line (useful when searching big files, to know exactly where the match is)
-c shows the number of matching lines
-v searches for non-matching lines (lines in which the pattern does not occur)
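To see the three options side by side, here is a small sketch on a throwaway file; notes.txt and its contents are invented for the example:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# notes.txt is a made-up file for the example.
printf '%s\n' 'apple pie' 'banana split' 'apple tart' 'cherry cake' > notes.txt

grep -n apple notes.txt   # 1:apple pie and 3:apple tart
grep -c apple notes.txt   # 2 (the number of matching lines)
grep -v apple notes.txt   # banana split and cherry cake
```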

The pattern is the group of characters to search for. If it contains a space, it must be enclosed in double quotes ("), so that the shell does not confuse the pattern with the files to search. For example:

$ grep "Hello World" file.txt

If we are looking for strings containing wildcards, apostrophes, quotes or backslashes, these characters must either be escaped (preceded by a backslash, \) or placed between quotation marks, so that the shell does not replace or interpret them.

$ grep \*\"\'\?\< file.txt

A possible result:

This is a dodgy string -> *"'?<
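Single quotes are often an easier alternative to backslashes, because the shell leaves everything inside them alone except the single quote itself. A sketch, assuming a hypothetical file.txt that contains the dodgy string:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# A quoted heredoc delimiter keeps the shell from touching the text.
cat > file.txt <<'EOF'
This is a dodgy string -> *"'?<
An ordinary line
EOF

# Every special character escaped with a backslash, as in the text.
grep \*\"\'\?\< file.txt

# The same pattern in single quotes; the embedded ' is written as '\''
# (close the quotes, add an escaped quote, reopen the quotes).
grep '*"'\''?<' file.txt
```

Both commands hand grep the same pattern, so both print the first line of the file.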

 

Regular expressions

grep and other GNU utilities can perform more advanced searches by means of regular expressions. Regular expressions are similar to shell wildcards, in the sense that they stand for characters or groups of characters. The resources at the end of the article include a link to an article that explains regular expressions in detail.
A few examples:

$ grep c.n

searches for any occurrence of a string with c, followed by any one character, followed by an n.

$ grep "[Bc]el"

searches for every occurrence of Bel or cel.

$ grep "[m-o]ata"

finds the lines in which mata, nata or oata occurs.

$ grep "[^m-o]ata"

finds lines containing ata preceded by any character other than m, n or o.

$ grep "^Martin come"

finds every line beginning with 'Martin come'. Because ^ is outside the brackets, it means the beginning of a line, not the negation of a group as in the previous example.

$ grep "durmiendo$"

finds all lines that end with the string 'durmiendo'; $ stands for the end of the line.

$ grep "^Caja San Fernando wins the league$"

finds the lines that consist of exactly that text.

To remove the special meaning of any of these characters, put a backslash in front of it. For example:

$ grep "E\.T\."

searches for the string 'E.T.'.
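The patterns above are easiest to understand by trying them. Here is a sketch that runs each one against a small made-up file, sample.txt, whose lines were chosen so that each pattern matches exactly one of them. LC_ALL=C is set as an assumption so that the range [m-o] means just m, n and o:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
export LC_ALL=C   # plain byte ranges for [m-o]

cat > sample.txt <<'EOF'
the cat sat on the mat
a can of nata
la bata blanca
Bel rang the bell
Martin come pronto
todos estan durmiendo
E.T. phone home
EXTRA cheese
EOF

grep 'c.n' sample.txt           # "a can of nata"
grep '[Bc]el' sample.txt        # "Bel rang the bell"
grep '[m-o]ata' sample.txt      # "a can of nata"
grep '[^m-o]ata' sample.txt     # "la bata blanca"
grep '^Martin come' sample.txt  # "Martin come pronto"
grep 'durmiendo$' sample.txt    # "todos estan durmiendo"
grep 'E\.T\.' sample.txt        # only "E.T. phone home"
grep 'E.T.' sample.txt          # unescaped: also matches "EXTRA cheese"
```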

 

find

This command is used to locate files. Another LinuxFocus article explains it in detail, and the best thing we can do is refer to it.

 

cut & paste

In Unix, information is often stored in ASCII files with one record per line and fields delimited by a special character, usually a tab or a colon (:). A typical task is to select some fields from a file and join them into another file. The commands for this task are cut and paste.

As an example, let's take the file /etc/passwd, which holds the user information. It contains 7 fields separated by “:”: the login name, the encrypted password, the user ID, the group ID, the user's name, the home directory and the preferred shell.

Here is a typical piece from this file:

root:x:0:0:root:/root:/bin/bash

die:x:500:500:Manuel Muriel Cordero:/home/murie:/bin/bash

practice:x:501:501:User practices to Ksh:/home/practica:/bin/ksh

wizard:x:502:502:Wizard para nethack:/home/wizard:/bin/bash

If we want to pair each user with their shell, we need to cut out fields 1 and 7:

$ cut -f1,7 -d: /etc/passwd

root:/bin/bash

die:/bin/bash

practice:/bin/ksh

wizard:/bin/bash

The -f option specifies the fields to be cut, and -d defines the field delimiter (the default is a tab).

It is also possible to select a range of fields:

$ cut -f5-7 -d: /etc/passwd

root:/root:/bin/bash

Manuel Muriel Cordero:/home/murie:/bin/bash

User practices to Ksh:/home/practica:/bin/ksh

Wizard para nethack:/home/wizard:/bin/bash

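If we have redirected (with “>”) these two outputs to two different files, output1 and output2, and want to combine them, we can use the command paste. The option -d: makes paste join the lines with a colon instead of the default tab:

$ paste -d: output1 output2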

root:/bin/bash:root:/root:/bin/bash

die:/bin/bash:Manuel Muriel Cordero:/home/murie:/bin/bash

practice:/bin/ksh:User practices to Ksh:/home/practica:/bin/ksh

wizard:/bin/bash:Wizard para nethack:/home/wizard:/bin/bash
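To try cut and paste without touching the real /etc/passwd, a minimal sketch can work on a copy of the sample lines above, stored in a hypothetical file passwd.sample. Note that paste separates the joined lines with a tab unless -d names another delimiter:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# passwd.sample is a hypothetical copy of the sample /etc/passwd lines.
cat > passwd.sample <<'EOF'
root:x:0:0:root:/root:/bin/bash
die:x:500:500:Manuel Muriel Cordero:/home/murie:/bin/bash
practice:x:501:501:User practices to Ksh:/home/practica:/bin/ksh
wizard:x:502:502:Wizard para nethack:/home/wizard:/bin/bash
EOF

cut -f1,7 -d: passwd.sample > output1   # login and shell
cut -f5-7 -d: passwd.sample > output2   # name, home directory and shell

# Line 1 of output1 is joined with line 1 of output2, and so on.
paste -d: output1 output2   # colon between the two parts
paste output1 output2       # default: a tab between the two parts
```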

 

sort

Let's assume that we want to sort /etc/passwd on the name field. To do this, we use sort, the Unix sorting tool:

$ sort -t: +4 /etc/passwd

die:x:500:500:Manuel Muriel Cordero:/home/murie:/bin/bash

practice:x:501:501:User practices to Ksh:/home/practica:/bin/ksh

wizard:x:502:502:Wizard para nethack:/home/wizard:/bin/bash

root:x:0:0:root:/root:/bin/bash

The file is clearly sorted, but in ASCII order, in which uppercase letters come before lowercase ones. If we do not want to distinguish between uppercase and lowercase letters, we can use the following:

$ sort -t: +4f /etc/passwd

die:x:500:500:Manuel Muriel Cordero:/home/murie:/bin/bash

root:x:0:0:root:/root:/bin/bash

practice:x:501:501:User practices to Ksh:/home/practica:/bin/ksh

wizard:x:502:502:Wizard para nethack:/home/wizard:/bin/bash

-t selects the field separator, +4 is the number of fields to skip before sorting, and f means that the sort does not distinguish between uppercase and lowercase letters.

Much more complicated sorts are possible. For example, we can sort first on the preferred shell, in reverse order, and then on the name:

$ sort -t: +6r +4f /etc/passwd

practice:x:501:501:User practices to Ksh:/home/practica:/bin/ksh

die:x:500:500:Manuel Muriel Cordero:/home/murie:/bin/bash

root:x:0:0:root:/root:/bin/bash

wizard:x:502:502:Wizard para nethack:/home/wizard:/bin/bash
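The +N notation used here is the traditional one; POSIX describes sort keys with -k instead, counting fields from 1, so +4 corresponds to -k5. A sketch with the -k syntax, on a hypothetical copy of the sample lines (passwd.sample), with LC_ALL=C set so that the ASCII ordering described above is used:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1
export LC_ALL=C   # byte (ASCII) order, as described in the text

# passwd.sample is a hypothetical copy of the sample /etc/passwd lines.
cat > passwd.sample <<'EOF'
root:x:0:0:root:/root:/bin/bash
die:x:500:500:Manuel Muriel Cordero:/home/murie:/bin/bash
practice:x:501:501:User practices to Ksh:/home/practica:/bin/ksh
wizard:x:502:502:Wizard para nethack:/home/wizard:/bin/bash
EOF

sort -t: -k5 passwd.sample         # like +4:      die, practice, wizard, root
sort -t: -k5f passwd.sample        # like +4f:     die, root, practice, wizard
sort -t: -k7r -k5f passwd.sample   # like +6r +4f: practice, die, root, wizard
```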

Another example: suppose you have a file with the names of the people you have lent money to and the amount each of them owes you. The file is called 'debts':

Son Goku:23450

Son Gohan:4570

Picolo:356700

Ranma 1/2:700

To find out whom you should visit first, you need a sorted list. Just type:

$ sort -t: +1 debts

Son Goku:23450

Picolo:356700

Son Gohan:4570

Ranma 1/2:700

However, this is not the desired result: the amounts have different numbers of digits, and they are compared character by character. The solution is the 'n' option, which compares the field as a number; adding 'r' puts the largest debt first:

$ sort -t: +1nr debts

Picolo:356700

Son Goku:23450

Son Gohan:4570

Ranma 1/2:700
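The same comparison can be reproduced with the -k syntax on a copy of the debts file, recreated here from the listing above. -t: makes the colon the field separator, so the second field is the amount:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

printf '%s\n' 'Son Goku:23450' 'Son Gohan:4570' 'Picolo:356700' 'Ranma 1/2:700' > debts

sort -t: -k2 debts     # as text: 23450, 356700, 4570, 700
sort -t: -k2n debts    # numerically, smallest amount first
sort -t: -k2nr debts   # numerically, largest amount first
```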

Basic options for sort are:
+n.m skips the first n fields and then m more characters before the sort key begins.
-n.m ends the sort key at the point reached by skipping n fields and then m characters.

The following modifiers can also be used:
-b ignores leading blanks
-d dictionary order (only letters, digits and blanks are considered)
-f does not distinguish between uppercase and lowercase letters
-n sorts numerically
-r reverses the order

 

wc

As we have seen before, wc counts characters, words and lines. By default its output contains the number of lines, words and characters in the input file (or files).

The output can be restricted with these options:

-l only lines
-w only words
-c only characters (strictly speaking bytes; GNU wc counts multibyte characters with -m)
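A quick check of the three options on a two-line file; words.txt is a made-up name, and since its text is plain ASCII the byte and character counts agree:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

printf 'one two three\nfour five\n' > words.txt

wc words.txt        # lines, words and characters, then the file name
wc -l < words.txt   # 2 lines
wc -w < words.txt   # 5 words
wc -c < words.txt   # 24 characters (the two newlines included)
```

Reading from standard input (`<`) keeps the file name out of the output, which is handy in scripts.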

 

Comparison tools: cmp, comm, diff

Sometimes we need to know the differences between two versions of the same file. This happens mainly in programming, when several people work on the same project and the source code can change. These tools find the differences between one version and the other.

cmp is the simplest. It compares two files and reports the place where the first difference appears (the number of the character and the number of the line).

$ cmp old new

old new differ: char 11234, line 333

comm is a bit more advanced, and it expects both files to be sorted. Its output has 3 columns: the first contains the lines that appear only in the first file, the second the lines that appear only in the second file, and the third the lines common to both. Numeric options suppress some of these columns:
-1, -2 and -3 hide the first, second and third column respectively. The example below shows only the lines unique to the first file and the lines common to both.

$ comm -2 old new
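Because comm needs sorted input, the sketch below uses two short lists that are already in order; old and new are invented files:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Both inputs must already be sorted.
printf '%s\n' apple banana cherry > old
printf '%s\n' banana cherry date > new

comm old new       # column 1: apple, column 2: date, column 3: banana, cherry
comm -2 old new    # as in the text: hide column 2 (lines only in new)
comm -12 old new   # only the lines common to both files
```

The columns are separated by tabs, so in the output of comm -2 the common lines appear indented by one tab.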

Last, but certainly not least, comes diff. This is an indispensable tool for programming projects. If you have ever downloaded a kernel to compile, you know that you can choose between the source code of the new version and a patch for the previous version, the latter being much smaller. Such a patch has a diff suffix, which means that it is diff output. The tool can produce editor commands (vi, rcs) that make the files identical, and it also works on directories and on the archives that contain them. The benefit is obvious: you download less source code (only the changes), apply the patch, and compile. Without parameters, the output specifies how the first file must be changed to make it equal to the second, using editor-style commands (a to add, c to change, d to delete lines).

$ diff old new

3c3

< The Hobbit

---

> The Lord of the Rings

78a79,86

> Three Rings for the Elven-kings under the sky,

> Seven for the Dwarf-lords in their halls of stone,

> Nine for Mortal Men doomed to die,

> One for the Dark Lord on his dark throne

> In the Land of Mordor where the Shadows lie.

> One Ring to rule them all, One Ring to find them,

> One Ring to bring them all and in the darkness bind them

> In the Land of Mordor where the Shadows lie.

3c3 means that line 3 must be changed: “The Hobbit” is removed and replaced by “The Lord of the Rings”. 78a79,86 means that new lines must be added after line 78; they become lines 79 to 86 of the second file.
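A small pair of files reproduces both kinds of command seen above, a change (c) and an addition (a). The files old and new are invented for the sketch:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

printf '%s\n' 'Books:' 'Bilbo' 'The Hobbit' 'Gandalf' > old
printf '%s\n' 'Books:' 'Bilbo' 'The Lord of the Rings' 'Gandalf' 'Frodo' > new

# Expected normal-format output:
#   3c3
#   < The Hobbit
#   ---
#   > The Lord of the Rings
#   4a5
#   > Frodo
diff old new
```

diff exits with status 1 when the files differ, so scripts that run under `set -e` need to allow for that.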

 

uniq

uniq removes duplicate lines. If we want to know which people are actually connected to the computer, we can combine the commands who and cut:

$ who | cut -f1 -d' '

root

die

die

practice

However, the output is not quite right: the repeated user should appear only once. This is achieved with:

$ who | cut -f1 -d' ' | sort | uniq

die

practice

root

The option -d' ' means that the field separator is a space, because the output of who uses that character instead of a tab.

uniq compares only consecutive lines. Here the two occurrences of “die” happened to be next to each other, but that is not guaranteed. It is therefore a good idea to sort the output before using uniq.
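The pitfall is easy to demonstrate with a made-up list, users.txt, in which the repeated name is not on consecutive lines:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# users.txt is hypothetical: the same login appears twice, but not on adjacent lines.
printf '%s\n' root die practice die > users.txt

uniq users.txt           # nothing removed: the two "die" lines are not adjacent
sort users.txt | uniq    # die, practice, root
sort -u users.txt        # sort can also drop duplicates itself with -u
```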

 

sed

sed is one of the most peculiar Unix tools. Its name stands for “stream editor”. When text is edited in the usual way, the program interactively accepts the changes the user indicates. sed instead lets us write small shell scripts, similar to the batch files of MS-DOS, so that we can modify the contents of a file without user interaction. The editor's capabilities are extensive, but going deeper into the subject would make this article too long. That is why we give only a brief introduction; interested readers can study the man and info pages.

sed is usually invoked as:

$ sed 'command' files

Take as an example a file in which we want to replace every occurrence of “Manolo” with “Fernando”. This is done with:

$ sed 's/Manolo/Fernando/g' file

The modified file is written to standard output. If you want to keep the result, redirect it with “>”.

Many users will recognize this as vi's ordinary search-and-replace command. In fact, most of vi's “:” commands (which invoke ex) are also sed commands.
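The effect of the g flag, and of redirecting the result, can be checked on a throwaway file (file and file.new are hypothetical names). One caution: redirecting the output back onto the input (sed ... file > file) leaves the file empty, because the shell truncates it before sed reads it, so the sketch writes to a new file first:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

printf '%s\n' 'Manolo met Manolo' 'Pepe' > file

sed 's/Manolo/Fernando/' file    # only the first Manolo on each line changes
sed 's/Manolo/Fernando/g' file   # g: every Manolo on each line changes

# Keep the result in a different file, then replace the original with it.
sed 's/Manolo/Fernando/g' file > file.new && mv file.new file
```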

A sed instruction usually consists of one or two addresses (to select lines) and the command to be carried out. An address can be a line number, a range of lines or a pattern.
The most common commands are:

Command      Action
--------     ------
a\           append text after the addressed line
c\           change the addressed lines, replacing them with the given text
d            delete the line(s)
g            (flag of s) replace every occurrence of the pattern, not only the first
i\           insert text before the addressed line
p            print the current line, even if the -n option is used
q            quit when the addressed line is reached
r file       read file and add its contents to the output
s/one/two/   replace the string “one” with the string “two”
w file       write the current line to file
=            print the line number
! command    apply command to the lines that do NOT match the address
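A few of the commands from the table, tried on a four-line file (list.txt is a made-up name). The -n option turns off sed's automatic printing, which is why p is needed with it:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

printf '%s\n' alpha beta gamma delta > list.txt

sed -n '2p' list.txt      # print only line 2: beta
sed '2q' list.txt         # print alpha and beta, then quit
sed -n '/m/=' list.txt    # line number of each line containing "m": 3
sed '/^b/!d' list.txt     # ! negates the address: delete lines NOT starting with b
```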

With sed you can specify which line(s) to edit:

$ sed '3d' file

will delete the third line of the file

$ sed '2,4s/e/#/' file

replaces the first “e” on each of lines 2 to 4 with “#”.

Lines containing a string can be selected with regular expressions, as described above. For example

$ sed '/[Qq]ueen/d' songs

deletes every line in which the word “Queen” or “queen” occurs.

It is easy to delete empty lines from a file using patterns:

$ sed '/^$/d' file

However, this does not remove lines that contain only spaces. To achieve that, we need to extend the pattern a little:

$ sed '/^ *$/d' file

where the “*” sign indicates that the previous character, in this case the space, may occur any number of times (including zero).
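The addressing examples can all be run against one small file. This sketch calls it songs, like the Queen example, and fills it with invented lines, including one empty line and one line made only of spaces:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Line 3 is empty; line 4 contains only three spaces.
printf '%s\n' 'line one' 'Bohemian Rhapsody by Queen' '' '   ' 'the end' > songs

sed '3d' songs            # drops line 3 (the empty line)
sed '2,4s/e/#/' songs     # line 2 becomes "Boh#mian Rhapsody by Queen"
sed '/[Qq]ueen/d' songs   # drops the Queen line
sed '/^$/d' songs         # drops the empty line, keeps the line of spaces
sed '/^ *$/d' songs       # drops both
```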

$ sed '/InitMenu/a\
> the added text' file.txt

This example searches for the line containing the string “InitMenu” and adds a new line after it. As shown, the example works only with a bash or sh shell: type everything up to a\, press Enter (Return), and then type the rest.

Tcsh treats new lines inside quotes differently, so there you must use a double backslash:

$ sed '/InitMenu/a\\
? the added text' file.txt

The ? is the shell prompt, just like the > in the bash example.
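Inside a script file there is no interactive prompt, so the two-line form can be written directly. A sketch with a made-up file.txt, which also shows i\, inserting the text before the matching line instead of after it:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

printf '%s\n' 'start' 'InitMenu' 'end' > file.txt

# a\ : the text on the next line is appended after each line matching InitMenu.
sed '/InitMenu/a\
the added text' file.txt

# i\ : the text is inserted before the matching line.
sed '/InitMenu/i\
text before' file.txt
```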

 

awk

A strange beast: awk. Its unusual name comes from the surnames of its original developers: Alfred Aho, Peter Weinberger and Brian Kernighan.

The awk program is one of the most interesting Unix utilities. It is a sophisticated, complex tool that lets you perform many different actions from the command line.

Note that awk and sed are key pieces of the more complex shell scripts. What you can do with them, without C or any other compiled language, is impressive. The installation of the Slackware Linux distribution, for example, and many CGI web programs are just shell scripts.

Today these command-line tools are used less often: window environments have made them seem obsolete, and with the arrival of Perl many shell scripts were replaced by Perl scripts. You might think that the command-line tools will be forgotten. However, my own experience says that many applications (a small database manager, for example) can be written in a few lines of shell script, and that you can be very productive with the shell, at least if you know these commands and the shell well.

If you combine the power of awk and sed, you can quickly do things that would usually require a small database manager plus a spreadsheet.

Take an invoice listing the articles you bought, how many pieces of each, and the price per piece. We call this file “sales”:

oranges 5 250

pears 3 120

apples 2 360

It is a file with 3 fields, separated by tabs. Now you want to add a fourth field with the total price per product.

$ awk '{total=$2*$3; print $0 , total }' sales

oranges 5 250 1250

pears 3 120 360

apples 2 360 720

total is the variable that holds the product of the values stored in the second and third fields. After the calculation, awk prints the whole input line ($0) followed by the total.

awk is almost a complete programming environment, particularly suitable for automating work with information stored in text files. If you are interested in this tool, I encourage you to study it using the man and info pages.
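A minimal sketch that recreates the sales file with tabs between the fields and runs the same program. awk splits fields on blanks and tabs by default, so no delimiter option is needed here. The second command goes one step further and uses an END block, which runs once after the last line, to print a grand total:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# The sales file from above, with tab-separated fields.
printf 'oranges\t5\t250\npears\t3\t120\napples\t2\t360\n' > sales

# Per-line total: the input line followed by quantity * price.
awk '{ total = $2 * $3; print $0, total }' sales

# Grand total: accumulate in sum, print once after the last line (2330 here).
awk '{ sum += $2 * $3 } END { print "grand total:", sum }' sales
```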

 

Shell scripts

Shell scripts are sequences of system commands stored in a file so that they can be executed together.

Shell scripts are similar to DOS batch files, but more powerful. They allow users to create their own commands simply by combining existing ones.

Shell scripts can, of course, accept parameters. These are stored in the variables $0 (the name of the command or script), $1, $2, … up to $9. All the parameters of the command can be accessed at once through $*.

Shell scripts can be written with any text editor. To execute a script, run:

$ sh shell-script

or, even better, give it execute permission

$ chmod 700 shell-script

and execute it just by typing its name (preceded by ./ if the current directory is not in your PATH):

$ ./shell-script
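A tiny script makes the parameter variables concrete. This sketch writes a hypothetical script called greet that reports $0, $1 and $*, then runs it both ways described above:

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# greet is a made-up script name; it only prints its own parameters.
cat > greet <<'EOF'
#!/bin/sh
echo "script: $0"
echo "first parameter: $1"
echo "all parameters: $*"
EOF

sh greet one two three   # run it through sh; $0 is "greet"
chmod 700 greet          # or give it execute permission...
./greet one two three    # ...and run it by name; $0 is "./greet"
```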

Here we end this article; the discussion of shell scripts is postponed to a future one. The next article will introduce the most common Unix text editors: vi and emacs. Every Linux user should know them well.
