Searching Files
This chapter describes how to search directories and files for keywords and strings by using the grep command.
Searching for Patterns With grep
To search for a particular character string in a file, use the grep command. The basic syntax of the grep command is:
$ grep string file
In this example, string is the word or phrase you want to find, and file is the file to be searched.
Note - A string is one or more characters. A single letter is a string, as is a word or a sentence. Strings can include blank spaces, punctuation, and invisible (control) characters.
For example, to find Edgar Allan Poe's telephone extension, type grep, all or part of his name, and the file containing the information:
$ grep Poe extensions
Edgar Allan Poe x72836
$
Note that more than one line might match the pattern you give.
$ grep Allan extensions
David Allan x76438
Edgar Allan Poe x72836
$ grep Al extensions
Louisa May Alcott x74236
David Allan x76438
Edgar Allan Poe x72836
$
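To try these searches yourself, you can build a small sample file. The sketch below assumes a hypothetical extensions file that holds only the three entries shown above, and it assumes mktemp -d is available (as on Linux, BSD, and macOS) to create a scratch directory.

```shell
# Work in a scratch directory so the sample file does not clutter your own.
cd "$(mktemp -d)" || exit 1

# Hypothetical extensions file built from the entries shown in this chapter.
printf '%s\n' \
  'Louisa May Alcott x74236' \
  'David Allan x76438' \
  'Edgar Allan Poe x72836' > extensions

grep Poe extensions     # one matching line
grep Allan extensions   # two matching lines
grep Al extensions      # all three lines contain "Al" (Alcott, Allan)
```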
grep is case sensitive; that is, you must match the pattern with respect to uppercase and lowercase letters:
$ grep allan extensions
$ grep Allan extensions
David Allan x76438
Edgar Allan Poe x72836
$
Note that grep failed in the first try because no entry contains "allan" with a lowercase a.
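The sketch below reproduces the case-sensitive search on a hypothetical two-line extensions file in a scratch directory (mktemp -d assumed available). It also prints grep's exit status, which is 1 when no line matches, and shows the standard -i option, which tells grep to ignore case.

```shell
# Scratch directory (assumes mktemp -d is available).
cd "$(mktemp -d)" || exit 1

# Hypothetical sample: two entries taken from the examples above.
printf '%s\n' 'David Allan x76438' 'Edgar Allan Poe x72836' > extensions

# Lowercase "allan" appears nowhere, so grep prints nothing and exits with status 1.
grep allan extensions || echo "no match (exit status $?)"

# Matching the capital A finds both entries.
grep Allan extensions

# The -i option ignores the difference between uppercase and lowercase.
grep -i allan extensions
```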
grep as a Filter
You can use the grep command as a filter with other commands, enabling you to filter out unnecessary information from the command output. To use grep as a filter, you must pipe the output of the command through grep. The symbol for pipe is "|".
The following example displays the files that end in ".ps" and were last modified in September.
$ ls -l *.ps | grep Sep
The first part of this command line produces a list of files ending in .ps.
$ ls -l *.ps
-rw-r--r-- 1 user2 users 833233 Jun 29 16:22 buttons.ps
-rw-r--r-- 1 user2 users 39245 Sep 27 09:38 changes.ps
-rw-r--r-- 1 user2 users 608368 Mar 2 2000 clock.ps
-rw-r--r-- 1 user2 users 827114 Sep 13 16:49 commands.ps
$
The second part of the command line pipes that list through grep, looking for the pattern Sep.
| grep Sep
The search provides the following results.
$ ls -l *.ps | grep Sep
-rw-r--r-- 1 user2 users 39245 Sep 27 09:38 changes.ps
-rw-r--r-- 1 user2 users 827114 Sep 13 16:49 commands.ps
$
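Because the dates on your own files will differ, the sketch below feeds grep a fixed copy of the listing shown above instead of running ls. The listing function is a hypothetical stand-in for ls -l *.ps; the pipeline has the same shape as the command in the example.

```shell
# Hypothetical stand-in for "ls -l *.ps": prints the four listing lines shown above.
listing() {
  printf '%s\n' \
    '-rw-r--r-- 1 user2 users 833233 Jun 29 16:22 buttons.ps' \
    '-rw-r--r-- 1 user2 users 39245 Sep 27 09:38 changes.ps' \
    '-rw-r--r-- 1 user2 users 608368 Mar 2 2000 clock.ps' \
    '-rw-r--r-- 1 user2 users 827114 Sep 13 16:49 commands.ps'
}

# Filter the listing through grep, as in "ls -l *.ps | grep Sep".
listing | grep Sep
```

Keep in mind that grep matches Sep anywhere on the line, so a file whose name contains "Sep" would also be listed, whatever its date.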
grep With Multiword Strings
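To find a pattern that is more than one word long, enclose the string in single or double quotation marks. Without the quotation marks, the shell passes each word to grep as a separate argument, and grep treats every word after the first as a file name.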
$ grep "Louisa May" extensions
Louisa May Alcott x74236
$
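The sketch below contrasts the quoted and unquoted forms on a hypothetical two-line extensions file in a scratch directory (mktemp -d assumed available). Unquoted, the shell splits the phrase, so grep receives Louisa as the pattern and May as an extra file name.

```shell
# Scratch directory (assumes mktemp -d is available).
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Louisa May Alcott x74236' 'David Allan x76438' > extensions

# Quoted: "Louisa May" reaches grep as a single pattern.
grep "Louisa May" extensions

# Unquoted: grep sees the pattern Louisa and two file names, May and extensions.
# It reports that May does not exist, prefixes the match from extensions with
# the file name, and exits with a status greater than 1 because of the error.
grep Louisa May extensions || echo "exit status $?"
```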
The grep command can also search for a string in a group of files. When you search more than one file, grep prints the name of the file that contains each match, followed by a colon, and then the matching line.
$ grep ar *
actors:Humphrey Bogart
alaska:Alaska is the largest state in the United States.
wilde:book. Books are well written or badly written.
$
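The sketch below recreates this search with hypothetical one-line files in a scratch directory (mktemp -d assumed available). The actors, alaska, and wilde lines come from the example; the tutors file is invented so that one file has no match. Searching a single file shows that the file-name prefix appears only when grep searches more than one file.

```shell
# Scratch directory (assumes mktemp -d is available).
cd "$(mktemp -d)" || exit 1

# Hypothetical sample files.
printf '%s\n' 'Humphrey Bogart' > actors
printf '%s\n' 'Alaska is the largest state in the United States.' > alaska
printf '%s\n' 'book. Books are well written or badly written.' > wilde
printf '%s\n' 'Nothing to see.' > tutors   # contains no "ar"

grep ar *        # several files: each match is prefixed with its file name
grep ar actors   # one file: no prefix
```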
Searching for Lines Without a Certain String
To search for all the lines of a file that do not contain a certain string, use the -v option to grep. The following example shows how to search through all the files in the current directory for lines that do not contain the letter e.
$ ls
actors alaska hinterland tutors wilde
$ grep -v e *
actors:Mon Mar 14 10:00 PST 1936
wilde:That is all.
$
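The sketch below reproduces the -v search with hypothetical file contents in a scratch directory (mktemp -d assumed available). Only the two output lines come from the example; the other lines are invented so that each file also contains lines with an e, which -v suppresses.

```shell
# Scratch directory (assumes mktemp -d is available).
cd "$(mktemp -d)" || exit 1

# Hypothetical sample files.
printf '%s\n' 'Mon Mar 14 10:00 PST 1936' 'Humphrey Bogart' > actors
printf '%s\n' 'Alaska is the largest state.' > alaska
printf '%s\n' 'Books are well written.' 'That is all.' > wilde

# -v inverts the match: print only lines that do NOT contain a lowercase e.
grep -v e *
```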
Using Regular Expressions With grep
You can also use the grep command to search for text that matches a pattern defined by a regular expression. Regular expressions consist of letters and numbers, in addition to characters that have special meaning to grep. These special characters are called metacharacters. Some of them, such as * and $, also have special meaning to the shell, so when you type a regular expression at the command prompt, surround it with single quotation marks; the shell then passes the expression to grep unchanged. To make grep match a metacharacter (such as . * [ ^ $ or \) literally instead of interpreting it, escape the metacharacter with a backslash (\). See "Searching for Metacharacters" for more information on escaping metacharacters.
A caret (^) metacharacter indicates the beginning of the line. The following command finds any line in the file named list that starts with the letter b.
$ grep '^b' list
A dollar-sign ($) metacharacter indicates the end of the line. The following command displays any line in which b is the last character on the line.
$ grep 'b$' list
The following command displays any line in the file list where b is the only character on the line.
$ grep '^b$' list
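The three anchor examples can be compared side by side on a hypothetical list file in a scratch directory (mktemp -d assumed available). The line "a b c" contains a b but neither starts nor ends with one, so none of the anchored patterns match it.

```shell
# Scratch directory (assumes mktemp -d is available).
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'banana' 'crab' 'b' 'a b c' > list   # hypothetical contents

grep '^b' list    # lines that start with b: banana, b
grep 'b$' list    # lines that end with b: crab, b
grep '^b$' list   # lines that consist of b alone: b
```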
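Within a regular expression, a dot (.) matches any single character. The following command displays every line that contains "an" followed by any one character, such as lines containing "any," "and," "management," or "plan" followed by a space (spaces count as characters, too). A line that ends in "plan" does not match, because no character follows the n.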
$ grep 'an.' list
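The sketch below runs the dot example against a hypothetical list file in a scratch directory (mktemp -d assumed available). It shows that "plan" in the middle of a line matches, because a space follows it, while a line consisting only of "plan" does not.

```shell
# Scratch directory (assumes mktemp -d is available).
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'any' 'and' 'management' 'a plan to go' 'plan' 'an' > list

# "an" followed by exactly one more character of any kind.
# Matches: any, and, management, a plan to go.
# No match: plan and an, where nothing follows the n.
grep 'an.' list
```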
When an asterisk (*) follows a character, grep interprets the asterisk as "zero or more instances of that character." When the asterisk follows a regular expression, grep interprets the asterisk as "zero or more instances of characters matching the pattern."
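Because it includes zero occurrences, the asterisk can produce confusing output. For example, the following command might appear to find all lines containing the letters "qu." Because u* also matches zero u's, however, it actually displays every line that contains the letter q, whether or not a u follows it.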
$ grep 'qu*' list
To find all lines that contain at least one letter "n," type the following command. The pattern nn* means one n followed by zero or more additional n's.
$ grep 'nn*' list
To find all lines that contain the pattern "nn" (two or more consecutive n's), type the following command.
$ grep 'nnn*' list
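The sketch below runs the asterisk examples on a hypothetical list file in a scratch directory (mktemp -d assumed available). Note that 'qu*' also matches "Iraq", which has no u after the q; the pattern 'quu*' (q, then u, then zero or more u's) is one way to require at least one u.

```shell
# Scratch directory (assumes mktemp -d is available).
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'quiet' 'Iraq' 'ninety' 'sun' 'running' 'cat' > list

grep 'qu*' list    # any q at all: quiet, Iraq
grep 'quu*' list   # q followed by at least one u: quiet
grep 'nn*' list    # at least one n: ninety, sun, running
grep 'nnn*' list   # at least two consecutive n's: running
```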
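The pattern .* matches zero or more occurrences of any character, so the following command displays every line in list, including empty lines. Quote the pattern so that the shell does not expand .* into the names of hidden files before grep sees it.
$ grep '.*' list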
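The sketch below checks that '.*' matches every line of a hypothetical three-line list file, including an empty line, in a scratch directory (mktemp -d assumed available). For comparison, a single dot requires at least one character, so it skips the empty line.

```shell
# Scratch directory (assumes mktemp -d is available).
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'first' '' 'third' > list   # the second line is empty

# Quoted so the shell passes .* to grep instead of expanding it to file names.
grep -c '.*' list   # counts all three lines
grep -c '.' list    # counts only the two non-empty lines
```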
Searching for Metacharacters
To use the grep command to search for a metacharacter such as . * [ ^ $ or \ as an ordinary character, precede it with a backslash (\). The backslash tells grep to ignore (escape) the special meaning of the character that follows it.
For example, the following expression matches lines that start with a period, and is useful when searching for nroff or troff formatting requests (which begin with a period).
$ grep '^\.' file
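The sketch below contrasts the escaped and unescaped forms on a hypothetical troff source file, doc.ms, in a scratch directory (mktemp -d assumed available). Without the backslash, ^. means "any character at the start of a line" and matches every non-empty line.

```shell
# Scratch directory (assumes mktemp -d is available).
cd "$(mktemp -d)" || exit 1

# Hypothetical troff source: two formatting requests and two text lines.
printf '%s\n' '.TH GREP 1' '.SH NAME' 'grep searches files.' 'Version 2.0' > doc.ms

grep '^\.' doc.ms   # literal period at the start: the two requests
grep '^.' doc.ms    # any first character: all four lines
```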
Table 4-1 lists common search pattern elements you can use with grep.
Table 4-1 grep Search Pattern Elements
Character | Matches |
---|---|
^ | The beginning of a text line |
$ | The end of a text line |
. | Any single character |
[...] | Any single character in the bracketed list or range |
[^...] | Any single character not in the list or range |
* | Zero or more occurrences of the preceding character or regular expression |
.* | Zero or more occurrences of any single character |
\ | Escapes the special meaning of the next character |
Note that you can also use these search characters in vi text editor searches.
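Table 4-1 also lists bracket expressions, which the earlier examples do not demonstrate. The sketch below tries them on a hypothetical list file in a scratch directory (mktemp -d assumed available). Each bracket expression matches exactly one character, which is why "ct" matches none of the patterns.

```shell
# Scratch directory (assumes mktemp -d is available).
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'cat' 'cot' 'cut' 'c9t' 'ct' > list

grep 'c[ao]t' list    # a or o between c and t: cat, cot
grep 'c[^ao]t' list   # any one character except a or o: cut, c9t
grep 'c[0-9]t' list   # one digit: c9t
```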