Linux command line - Working with text using grep and sed


Hi, I am Malathi Boggavarapu. I work at Volvo Group and live in Gothenburg, Sweden. I have been working with Java for several years and have experience across a range of technologies.

In this post we discuss the Linux commands grep and sed and the options that are most helpful at work.

When working with text on a Linux system, we often need to search for a particular string in a log file, in source code, or in a configuration file. You can always open the file and look through it, but there is a better tool for the job: the grep command.

First, let's create a text file with some content so that we can try grep and its options against it. I prefer to use the Cygwin shell to run Linux commands, but it is up to you which shell you use.

Open the shell and create a text file with some content in it. I added the following text to the grep_demo.txt file.

Please note: grep_demo.txt is used throughout this post.








Create a file from Linux command line

I suggest downloading and installing Cygwin, because then there is no need to set up VirtualBox and install a Linux operating system on it.

Open the Cygwin shell and type the following command to create a file
vi grep_demo.txt

It opens the vi editor. vi starts in command mode rather than read-only mode, so to write some text press i (or the INSERT key) to switch to insert mode and start typing. To save the file, press Esc, type :wq and press Enter. You are returned to the command prompt of the shell.

So now the text file is created and named grep_demo.txt. Now let's start working with the grep command and its options.
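If you would rather not use an editor, the same file can be created straight from the command line. The following is a minimal sketch: the four lines of sample text are hypothetical stand-ins (not the text used in the screenshots of this post), and the temporary directory is only there so that nothing in your current folder gets overwritten.

```shell
# Work in a throwaway directory so no existing file is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample content, written with a here-document.
cat > grep_demo.txt <<'EOF'
This is the first line of the demo file.
this is the second line, in lower case.
The third row has no match.
Here are two more lines of text.
EOF

# Confirm that the file exists and count its lines.
line_count=$(wc -l < grep_demo.txt)
echo "grep_demo.txt has $line_count lines"
```

The quoted 'EOF' marker keeps the shell from expanding anything inside the sample text.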


Grep command and its options

Grep is one of the tools that system administrators and developers need to be familiar with.

man grep - Shows the manual pages for grep, which describe the available options.
grep line grep_demo.txt - Shows every line that contains the text line




grep --color=auto line grep_demo.txt : Shows the matched text in color.





grep -n line grep_demo.txt : Shows the line number along with the matched text.





grep -C 1 line grep_demo.txt : The -C option shows the lines surrounding each match; here, one line before and one line after. Both -n and -C can be helpful in configuration files, where the same name may appear several times and it is the position of a particular instance that makes the difference.
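To see what -n and -C print without relying on the screenshots, here is a small sketch that runs both options against a hypothetical seven-line file (the file content is an assumption made up for this example).

```shell
# Hypothetical sample file: matches on lines 2 and 6 only.
demo=$(mktemp)
printf '%s\n' \
  'alpha' \
  'first line here' \
  'beta' \
  'gamma' \
  'delta' \
  'second line here' \
  'epsilon' > "$demo"

# -n prefixes each matching line with its line number.
numbered=$(grep -n line "$demo")
echo "$numbered"

# -C 1 adds one line of context before and after each match.
# Line 4 (gamma) is not next to any match, so it is left out.
context=$(grep -C 1 line "$demo")
echo "$context"

rm -f "$demo"
```

With -C, grep prints a separator line between groups of context that are not adjacent in the file, which makes it easy to see where one match ends and the next begins.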






Matching with grep is case sensitive, but if you want a case-insensitive search you can use the following command

grep -i this grep_demo.txt - This makes the search case insensitive
grep -v li grep_demo.txt - This shows all the lines that do NOT match the search item. In our example the search item is li
sudo grep -i error /var/log/* - This searches for the word error in all the files of the /var/log folder. But this command only searches the first level of the folder; it does not search the sub-folders inside /var/log.
sudo grep -ir error /var/log - This searches recursively through the /var/log folder, including its sub-folders. But it can be very time and resource intensive if you have a lot of files.
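The difference between searching one level and searching recursively is easier to see on a small folder than on /var/log. The sketch below builds a hypothetical log folder with one sub-folder (all names and contents are made up) and uses -l so that grep prints only the names of the files that match.

```shell
# Hypothetical folder layout standing in for /var/log.
logs=$(mktemp -d)
mkdir -p "$logs/app"
printf 'Error: disk full\nall good\n' > "$logs/system.log"
printf 'ERROR: timeout\n' > "$logs/app/app.log"

# Without -r only the files directly inside the folder are searched.
# -s hides the complaint about the app sub-folder being a directory.
top_level=$(grep -ils error "$logs"/* || true)

# With -r the sub-folder is searched as well.
recursive=$(grep -rils error "$logs" | sort)

# -v inverts the match: lines that do NOT contain "good".
inverted=$(grep -v good "$logs/system.log")

echo "top level: $top_level"
echo "recursive: $recursive"
echo "inverted:  $inverted"
rm -rf "$logs"
```

No sudo is needed here because the files belong to the current user; on the real /var/log many files are readable only by root.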

Regular Expressions

The term regular expression comes from the mathematical roots of computer science. Regular expressions are used to find certain types of information in text, such as dates, email addresses, phone numbers and anything else that fits a particular pattern.

Example: You might see the email address malathi @example.c in an email list and immediately notice that it is not valid, but a program sending mail from this list will still compose a message and try to send it. We then get an error back from the mail server saying that the address is not valid. To prevent this kind of error, we can validate the addresses with a regular expression.
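As a rough illustration of that idea, the sketch below filters a hypothetical address list with grep. The pattern is deliberately simple and is my own assumption, not a complete email validator: it only requires some name characters, an @ sign, a domain, a dot, and at least two letters at the end.

```shell
# Hypothetical address list, including the invalid example from the text.
emails=$(mktemp)
printf '%s\n' 'malathi@example.com' 'malathi @example.c' 'someone@example.org' > "$emails"

# Simplified pattern (assumption): it will reject some valid addresses
# and accept some invalid ones, but it catches the obvious mistakes.
pattern='^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'

valid=$(grep -E "$pattern" "$emails")      # lines that fit the pattern
invalid=$(grep -vE "$pattern" "$emails")   # lines that do not

echo "valid:";   echo "$valid"
echo "invalid:"; echo "$invalid"
rm -f "$emails"
```

The address with the space and the one-letter ending lands in the invalid list, so a mailing script could skip it before the mail server ever sees it.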

Regex Operators

Symbol       Description
.            Represents any single character
?            Represents zero or one of whatever precedes it
+            Represents one or more of whatever precedes it
*            Represents zero or more of whatever precedes it
( )          Used for grouping
{ }          Used for counting
^            Beginning of line
$            End of line
abc          Represents the text abc
[a-z]        Represents any single character in the set a-z
[a-zA-Z]     Represents any single Latin letter (lower or upper case)
[0-9]        Represents any single digit 0-9
  
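Now let's take some examples. The -E option turns on extended regular expressions, so that ?, +, ( ) and { } work without backslashes.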

grep --color=auto -E "t." grep_demo.txt - This matches the character t followed by any one character







grep --color=auto -E ".t" grep_demo.txt - This matches any one character followed by the character t







grep --color=auto -E "t.*" grep_demo.txt - This matches from the first t on a line to the end of that line








grep --color=auto -E "t.*t" grep_demo.txt - This matches text that starts with t and ends with t, with zero or more characters in between







grep --color=auto -E "[t-z]" grep_demo.txt - This matches any single character between t and z








grep --color=auto -E "[t-z]{2}" grep_demo.txt - This matches two consecutive characters that are each between t and z
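Color highlighting does not survive in plain text, so here is a sketch that uses the -o option instead, which prints only the matched parts, one per line. The sample sentence and the small helper function are hypothetical, and LC_ALL=C is set as an assumption to keep the character ranges predictable across locales.

```shell
# Hypothetical one-line sample to match against.
sample='tux wrote text'

# Helper (made up for this example): print only the parts of the
# sample that match the given extended regular expression.
matches() {
  printf '%s\n' "$sample" | LC_ALL=C grep -oE "$1"
}

after_t=$(matches 't.')      # t followed by any one character
greedy=$(matches 't.*t')     # first t through the last t on the line
pairs=$(matches '[t-z]{2}')  # two consecutive characters from t to z

echo "$after_t"   # prints tu, te, te on separate lines
echo "$greedy"    # prints tux wrote text
echo "$pairs"     # prints tu, xt on separate lines
```

Note how t.*t grabs as much as it can: the match runs from the first t to the last t on the line, not to the nearest one.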






That's a quick look. Regular expressions are a huge topic, and they can become a lot more complex and a lot more powerful.


sed - Stream Editor

sed is a tool that allows us to manipulate text on a Linux system. It is very commonly used as one component in a series of piped commands, which is why it is called a stream editor rather than an interactive editor. For example, sed can delete blank lines or insert text at a specified position in a file.
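A minimal sketch of sed inside a pipe, using the blank-line example just mentioned. The input is hypothetical and is produced by printf here; in practice it would be the output of some other command.

```shell
# Hypothetical input containing blank lines, fed to sed through a pipe.
# /^$/ selects lines with nothing between start and end; d deletes them.
cleaned=$(printf 'first\n\nsecond\n\n\nthird\n' | sed '/^$/d')
echo "$cleaned"
# prints:
# first
# second
# third
```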


Appending text to a file using sed

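Appending usually means adding text to the end of something, but sed processes text line by line, so it appends text after each line, or after selected lines, depending on the address you give it. Note that sed prints the result to standard output and leaves the file itself unchanged unless you use the -i option.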

sed "a newtext" grep_demo.txt - This will append the text after each line.










sed "i newtext" grep_demo.txt - This will insert the text before every line.










sed "3i newtext" grep_demo.txt - This will insert the text before the third line.








sed "3a newtext" grep_demo.txt - This will append the text after the third line.
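The sketch below shows the difference between i and a on a hypothetical four-line file. It assumes GNU sed (the version shipped with Cygwin and most Linux distributions), because the one-line form of a and i used in this post is a GNU extension.

```shell
# Hypothetical four-line file.
demo=$(mktemp)
printf 'one\ntwo\nthree\nfour\n' > "$demo"

# 3i puts the new text before line 3, 3a puts it after line 3.
inserted=$(sed '3i newtext' "$demo")
appended=$(sed '3a newtext' "$demo")

echo "$inserted"   # one, two, newtext, three, four
echo "$appended"   # one, two, three, newtext, four

# sed only printed the result; the file still has its original 4 lines.
original_lines=$(wc -l < "$demo")
rm -f "$demo"
```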








sed "1d" grep_demo.txt - This will delete the first line.







sed "/line/d" grep_demo.txt - This will delete every line that contains the word line
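Both ways of deleting, by line number and by pattern, can be compared on a small hypothetical file (the three lines are made up for this sketch).

```shell
# Hypothetical three-line file; lines 1 and 3 contain the word "line".
demo=$(mktemp)
printf 'First line\nsecond row\nThird line\n' > "$demo"

without_first=$(sed '1d' "$demo")        # drop line number 1
without_match=$(sed '/line/d' "$demo")   # drop every line containing "line"

echo "$without_first"   # second row, Third line
echo "$without_match"   # second row
rm -f "$demo"
```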








sed "/line/a ^^^" grep_demo.txt - This will append a new line containing ^^^ after every line that contains the word line









sed "/^[A-Z]/a CAPS" grep_demo.txt - After every line that starts with a capital letter from A to Z, this appends a line with the word CAPS.
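Here is a sketch of pattern-addressed appending on hypothetical input. It assumes GNU sed for the one-line a form, and sets LC_ALL=C as a precaution so that the range A-Z covers only upper-case letters regardless of locale.

```shell
# Hypothetical input: lines 1 and 3 start with a capital letter
# and also contain the word "line".
input='First line
second row
Third line'

# Append ^^^ after every line that contains "line".
underlined=$(printf '%s\n' "$input" | sed '/line/a ^^^')

# Append CAPS after every line that starts with A-Z.
marked=$(printf '%s\n' "$input" | LC_ALL=C sed '/^[A-Z]/a CAPS')

echo "$underlined"
echo "$marked"
```

In both cases the appended text appears on its own line after lines 1 and 3, and the lower-case "second row" line is left alone.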








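In addition to adding lines, we can replace text using sed.
sed "s/lines/rows/" grep_demo.txt - This replaces the first occurrence of the text lines on each line with rows. Note the closing slash; without it sed reports an unterminated s command.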

 


sed "s/[a-m]/_/" grep_demo.txt - We can also use regular expressions inside sed to replace text. This command replaces a character in the range a to m with an underscore, but only the first occurrence on each line.







sed "s/[a-m]/_/g" grep_demo.txt - This replaces all occurrences of characters in the range a to m on every line, using the g flag (which means global).







sed "s/\([aeiou]\)/[\1]/g" grep_demo.txt - This wraps each of the vowels a, e, i, o and u in square brackets. The \( \) group captures the matched vowel and \1 puts it back in the replacement.
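The three substitution forms can be compared side by side on one hypothetical line of text (the sample string is an assumption chosen so that the word lines appears twice).

```shell
# Hypothetical sample line with two occurrences of "lines".
text='many lines, more lines'

# Without g only the first occurrence on the line is replaced.
first_only=$(printf '%s\n' "$text" | sed 's/lines/rows/')
# With g every occurrence is replaced.
every=$(printf '%s\n' "$text" | sed 's/lines/rows/g')
# \( \) captures the vowel and \1 reuses it inside the brackets.
vowels=$(printf '%s\n' "$text" | sed 's/\([aeiou]\)/[\1]/g')

echo "$first_only"   # many rows, more lines
echo "$every"        # many rows, more rows
echo "$vowels"       # m[a]ny l[i]n[e]s, m[o]r[e] l[i]n[e]s
```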








I have covered just the basics of grep, the use of regular expressions in grep, and sed in this session. You can explore more options of sed using the man command.

I hope this session is helpful. Please post your comments if you have any.




