Recursively Search and Replace Terms in Multiple Files with grep, xargs and sed

Posted on Jan 13, 2009 in Environment, Linux, Ubuntu

I recently offered to update some simple information on a website for a friend – normally an easy enough task. Unfortunately, even though the original developer of the site generated it with PHP, they didn’t utilise a database, or even combine common text (such as the header and footer of an HTML template) into manageable files. As a result, instead of simply changing text in a single file, which would then be adopted site-wide, I was faced with (potentially) changing over 100 individual PHP files in a text editor! Not a pleasant task – if done manually…

Thankfully, Linux has dozens of command line utilities which can aid in just such a horribly monotonous task. In this case, I used grep, xargs and sed to undertake the task. I’ve written briefly before about grep, and how it can be used to recursively search files – extending the same concept, and throwing xargs and sed into the equation, it’s possible to recursively search files, and replace terms within those files. The command structure is as follows:

grep -lr -e '<searchterm>' * | xargs sed -i 's/<searchterm>/<targetterm>/g'

In short, the command tells Linux to find all the files containing <searchterm> and replace every occurrence with <targetterm>. grep runs first and, because of the -l option, prints only the names of the files that contain <searchterm>. Those names are piped to xargs, which appends them as arguments to the sed command and runs it. sed then replaces all instances of <searchterm> with <targetterm> in each file it has been given.
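Because sed -i rewrites files straight away, it can be worth previewing the result first. The following is only a sketch, assuming GNU grep and sed; the temporary directory, the file names and the terms foo and bar are made up for illustration:

```shell
# Work in a throwaway directory so nothing real is touched (hypothetical sample files).
demo=$(mktemp -d)
mkdir -p "$demo/sub"
printf 'foo one\nno match\n' > "$demo/a.txt"
printf 'foo two foo\n' > "$demo/sub/b.txt"
printf 'nothing here\n' > "$demo/c.txt"

# Step 1: list the files that contain the search term. Nothing is changed.
matches=$(grep -lr -e 'foo' "$demo" | sort)
echo "$matches"

# Step 2: run sed WITHOUT -i, so the files on disk stay untouched.
# -n together with the p flag prints only the lines where a substitution happened.
preview=$(grep -lr -e 'foo' "$demo" | sort | xargs sed -n 's/foo/bar/gp')
echo "$preview"
```

Once the preview looks right, swapping -n and the p flag for -i applies the same substitution to the files themselves.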

A practical use can be changing all instances of “apples” to “oranges” in every html file contained in public_html and its sub-directories:

cd /home/username/public_html
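# --include keeps the recursive search to *.html files; a bare *.html would be expanded by the shell and only match the current directory
grep -lr --include='*.html' -e 'apples' . | xargs sed -i 's/apples/oranges/g'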

To break it down further:

  1. grep is asked to list anything containing “apples”. The -r option tells grep to do this recursively, through all sub-directories, and the -l option tells grep to only list the names of the files containing “apples” (when searching several files, the default behaviour of grep is to output the filename together with every line containing “apples”). The -e option explicitly marks what follows as the search pattern. grep treats the pattern as a regular expression with or without -e, but -e ensures that a pattern beginning with a hyphen is not mistaken for an option (options passed to commands conventionally begin with a hyphen). Finally, the *.html pattern restricts the search to HTML files; this could of course be changed to *.php or *.txt, for example. Be aware that a bare *.html on the command line is expanded by the shell and only matches names in the current directory, so reaching HTML files in sub-directories needs GNU grep's --include='*.html' option together with a starting directory such as . (the current directory).
  2. Each matching filename is piped | through to xargs. xargs reads the names arriving on its standard input and turns them into arguments for another command, in this case sed. grep and xargs run at the same time, connected by the pipe, so grep keeps searching while the names are handed over. For example, grep finds one or more instances of “apples” in fruit.html, and simply outputs the file name fruit.html (not the contents of the file itself, or even the lines containing “apples”). “fruit.html” is then piped through to xargs.
  3. xargs takes the output fed to it and passes it to sed as arguments – in this case, the filename, “fruit.html”. sed is a stream editor, a command for filtering and altering text. sed is asked to look at fruit.html (sent to it from grep, through the pipe and via xargs), find any occurrences of “apples” and replace them with “oranges”. The -i option tells sed to do this in-place, i.e. to modify fruit.html itself rather than print the altered text and leave the file as it was. Everything within the single quotes is the expression sed works with. First, s is the substitute command, which replaces “apples” with “oranges”. The / is the delimiter, which separates the search term from the target term, and separates those in turn from the s command before them and the g flag after them. Finally, the g flag tells sed to make the change globally: every instance of <searchterm> on each line is changed, as opposed to only the first instance on each line (which is the default behaviour of sed).
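  4. There is no loop back to grep for each file. xargs normally collects many filenames and passes them to a single sed invocation, starting a further invocation only if the list grows too long for one command line. grep carries on listing files until it has searched everything, and sed edits each file it is handed, until all instances in all files are changed.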
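The steps above can be tried safely on a throwaway copy. This is a rough sketch rather than a definitive recipe: it assumes GNU grep, sort, xargs and sed, and the directory and file names are invented for the demo. It also adds two things the original command does not have: --include, so that HTML files in sub-directories are picked up, and NUL-separated file names (grep -Z with xargs -0), so that names containing spaces survive the trip through the pipe.

```shell
# Throwaway sample tree (hypothetical file names) so the demo is safe to run.
site=$(mktemp -d)
mkdir -p "$site/recipes"
printf 'apples and more apples\n' > "$site/fruit.html"
printf 'baked apples\n' > "$site/recipes/apple pie.html"   # note the space in the name
printf 'apples\n' > "$site/notes.txt"                      # not HTML, should be left alone

# Show the command line xargs builds: echo prints it instead of running it
# (the shell's quotes are not shown in echo's output).
# sort -z only makes the order of the NUL-separated names predictable for the demo.
built=$(grep -lrZ --include='*.html' -e 'apples' "$site" | sort -z \
        | xargs -0 echo sed -i 's/apples/oranges/g')
echo "$built"

# Now run it for real: both HTML files go to one sed invocation.
grep -lrZ --include='*.html' -e 'apples' "$site" | xargs -0 sed -i 's/apples/oranges/g'

cat "$site/fruit.html" "$site/recipes/apple pie.html" "$site/notes.txt"
```

With GNU xargs, adding -r stops sed from being run at all when grep finds no matching files.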

What makes this combination of commands so powerful is that it is not limited to changing words or phrases: regular expressions can be employed to search for and replace intricate patterns across multiple files.
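As an illustration of a pattern-based replacement, the sketch below turns <b>…</b> tags into <strong>…</strong> tags. It assumes GNU grep and sed (the -E option enables extended regular expressions in both), and the sample pages are invented for the example:

```shell
# Hypothetical sample pages in a throwaway directory.
pages=$(mktemp -d)
printf '<p><b>Fresh</b> fruit and <b>veg</b></p>\n' > "$pages/index.html"
printf '<p>No bold text here</p>\n' > "$pages/about.html"

# grep -E lists the files containing a <b>...</b> pair.
# sed -E captures the text between the tags with ( ) and reuses it as \1.
# Using | instead of / as the sed delimiter avoids escaping the / in the closing tags.
grep -lrE -e '<b>[^<]*</b>' "$pages" | xargs sed -E -i 's|<b>([^<]*)</b>|<strong>\1</strong>|g'

cat "$pages/index.html"
```

The pattern [^<]* only matches text with no nested tags inside the <b> element, which keeps the example simple; real-world markup may need a more careful expression.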

With this incredibly versatile set of commands in my toolbox, it took a matter of seconds to complete my task, instead of potentially dozens of minutes.

Finally, this isn’t the only way to achieve this result; numerous other commands may be strung together to the same effect, but as a frequent user of grep, xargs and sed, this seemed the sensible method :)
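One such alternative, shown here purely as a sketch and not as the method used above, is to let find choose the files and hand them to sed directly. It assumes GNU sed (for -i) and uses invented sample files:

```shell
# Hypothetical sample tree in a throwaway directory.
docs=$(mktemp -d)
mkdir -p "$docs/old"
printf 'apples\n' > "$docs/a.html"
printf 'green apples\n' > "$docs/old/b.html"
printf 'apples\n' > "$docs/c.php"

# find selects regular files by name, in all sub-directories;
# -exec ... {} + passes them to sed in batches, much as xargs does.
find "$docs" -type f -name '*.html' -exec sed -i 's/apples/oranges/g' {} +

cat "$docs/a.html" "$docs/old/b.html" "$docs/c.php"
```

One difference from the grep version: find selects files by name only, so sed rewrites every .html file, including those that contain no match (their contents stay the same, but they are still written out again).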

Linux: turning monotonous tasks into tasks. (OK, not a great slogan, but I was never really interested in marketing…)
