Multi-file search and replace
By John Fitzgibbon
Last updated: Thursday, March 30, 2006, 04:11 PM PST
In a conversation with a friend who manages websites, I mentioned that I was
using perl scripts to do search-and-replace across multiple text files in a
directory tree. This led to an email exchange explaining how to get this
working. I thought others might find the information useful, but I didn't
particularly feel like writing a formal "how-to", so I decided to reproduce
our emails pretty much "as-is".
Note that although my friend was using a Windows machine, one of the beauties of
using perl is that this (completely free) solution will work just as well on
Linux, BSD, Unix or OS X boxes.
Anyway, here goes...
-------------------------------------------------------------------------------
Hey Fitz
I was looking on your site for that search-and-replace script you mentioned
to me... I have a few websites that I'll need to make site-wide changes to,
and being able to use something like that would be a big help.
I couldn't find it up on your site though, did I miss it?
Cliff
-------------------------------------------------------------------------------
Cliff,
You may live to regret asking me this question, as you're about to enter the
murky world of Unix (GNU, to be more precise) and perl, which people tend
to love or loathe. I haven't put the stuff up on my site because it's not quite
fit for general consumption. That said, I've attached some scripts that should
get you up and running.
Here's your mission, should you choose to accept it (hopefully this isn't as
daunting as it first looks):
1) Go to: http://www.cygwin.com
2) Follow the "Install Now" link. Download and run the setup program.
3) The default setup instructions should work fine. Pick any of the ftp servers
when it asks where to download the packages from. The download will take a
while, since this is essentially a whole new O/S you're installing on top of Windows.
4) Unzip the attached script files and save them in the "bin" directory of the
Cygwin installation (e.g. C:\cygwin\bin, if you installed in C:\cygwin).
5) The cygwin installation should put a cygwin icon on the desktop.
Double-clicking it should bring you to a funny looking command prompt. Welcome
to the Unix command-line!
6) Type runperl.sh and hit enter. This will give you a description of the
parameters you need to enter to use my runperl.sh script.
Basically you need to tell it the folder that holds the original files you want
to process, the new folder to hold the modified files, the type of file to process
(e.g. "*.html" for all html files), and the search/replace script you want to
run. Make sure the new folder exists before you run the script. All
sub-folders of the main folder are searched for files to process, and the folder
structure is recreated automatically within the new folder.
If the search/replace script to run is saved in cygwin's "bin" folder, you do
not need to specify the folder name (just the script name). If you create a
folder to hold your own scripts, you will need to specify the full folder/script
name when running runperl.sh. For example, if you saved "myscript.pl" in a
"scripts" folder, you might type something like:
runperl.sh "c:/sites/x" "c:/new/x" "*.html" "c:/scripts/myscript.pl"
As with most Unix commands, runperl.sh doesn't give any feedback. It just does
the job, then comes back with a new command prompt.
Now all you need to do is start writing search/replace scripts.
"samplescript.pl" is a fully working (very simple) example script. I've added
comments to it to try to explain what's going on, and how it can be extended.
All the scripts are plain text files, so feel free to poke around in them. My
convention (and a fairly popular one) is to name shell scripts with a ".sh"
extension and perl scripts with a ".pl" extension. Shell scripts are basically
lists of operating system (cygwin) commands to execute. Perl is its own
self-contained scripting language, with a heavy emphasis on pattern matching and
search/replace functionality.
Some tips on using cygwin's command line:
- Type exit to quit.
- You can right-click to paste text.
- You can highlight text and press enter to copy.
- If you've used DOS batch files, note that cygwin uses "/" to separate
folders, not "\".
- Click on the cygwin icon (top left corner of the window) and select Properties,
then Layout, to change the window size to something bigger than 80 x 24
characters.
- You'll find a user guide online if you really want to get into cygwin.
- You can create your own shell scripts in cygwin's bin directory to save typing
stuff over and over. A simple script to execute a single command might look like
this:
#!/bin/sh
runperl.sh "c:/sites/x" "c:/new/x" "*.html" "c:/scripts/myscript.pl"
The "#!/bin/sh" tells cygwin this is a shell script, and the rest is just
interpreted as commands to run.
Simple search/replace is pretty easy to do. With some of the advanced features
you can do some pretty amazing transformations (but you can also make some
pretty royal cock-ups). I'm no expert on perl myself, but feel free to run
questions by me.
Hope all this doesn't hurt your head too much,
Fitz.
-------------------------------------------------------------------------------
Hey Fitz --
Thanks so much for writing back and your extremely detailed info. It does
seem like a somewhat large job, with the Unix installation and everything, but
I'm going to look into it. I presume doing that Unix installation won't
affect my regular OS at all, right?
Cliff
-------------------------------------------------------------------------------
> doing that Unix installation won't affect my regular OS at all, right?
I've run cygwin on Windows 98, ME and 2000 without problems. To Windows, it's just
another application.
Fitz.
-------------------------------------------------------------------------------
Hey Fitz --
OK, I'm working on it and so far so good. Download, install, set up,
etc. Now I tried my first test but this is what came back:
/usr/bin/runperl.sh: check_create_folder.pl: not found
/usr/bin/runperl.sh: run_perl_script.pl: not found
Those 2 files are indeed in the bin directory as you specified, so I'm not
sure why I'm getting this error. Did I miss a step or something?
Thanks for your help!!
Cliff
-------------------------------------------------------------------------------
Cliff,
Looks like perl is not installed (you can tell by checking for perl.exe in the
bin directory). Reading the cygwin FAQs, it seems the setup was changed recently
to install only the minimum number of components, so I'm guessing perl is no
longer installed by default.
If it is missing, to install perl:
- run the cygwin setup program again.
- after selecting an ftp site, you will get to the "Select Packages" screen.
- expand the "Interpreters" list.
- click on the "Skip" text beside perl.
- the "Skip" should change to the version number to install.
- continue the install
This should install the package.
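A quicker check than hunting for perl.exe is to ask the command line itself. A
small sketch using the standard "command -v" lookup; the function name
check_perl is made up for the example:

```sh
#!/bin/sh
# Sketch: report whether a perl interpreter can be found on the PATH.
# Returns 0 if perl is available, 1 otherwise.
check_perl() {
    if command -v perl >/dev/null 2>&1; then
        echo "perl found at: $(command -v perl)"
    else
        echo "perl not found; run the cygwin setup program again and add the perl package" >&2
        return 1
    fi
}
```

Typing check_perl at the prompt after loading the function prints where perl
lives, or the reminder if it is missing.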
The setup program may tell you that a newer version of itself is available. If
it does, you may want to download the setup program again, though you can try
continuing with the old one.
Let me know if this solves the problem.
Fitz.
-------------------------------------------------------------------------------
IT WORKS!! Whoo-hoo!! Does a great job too - very clever and will be a
huge help. THANK YOU!!
Cliff
-------------------------------------------------------------------------------
Hey Fitz,
I've used the search script a few times now and it works perfectly. But
now I'm trying it and it's not working at all. What I need to do is go
through a rather large site and replace a Flash image that is found on
every page with a new animated gif. The code for the Flash is kind of
long, and I'm wondering if that's why it's not working? Here's the
original code I want to replace:
Big Long Ugly Text Here
I went through the above and added an escape backslash before each
quotation mark, forward-slash, period, etc., even ones I wasn't sure about,
like you said in the script. The script processes as it should, there are
no errors, and the files are created in the replacement destination
directory. But when you open those pages, the replacement didn't happen -
the original text is still there untouched.
Can you do a search and replace for something so long? Or does it have to
be short? Is there some special way to designate that it continues onto
another line? Any thoughts or other reasons why you think this wouldn't work?
FYI I'm attaching the script I used.
Thanks again!!!!!
Cliff
-------------------------------------------------------------------------------
Cliff,
The text to replace spans 2 lines. The perl script works line-by-line through
each file, so it never makes the connection between the text on one line and the
next.
Assuming the line break is always in the same place, you can just put 2
substitute commands in the script, one beginning with:
s/Text on Line 1...
and the other beginning with:
s/Text on Line 2...
If you want to consolidate everything onto one line, just put all the new text
in one or the other of the substitute commands:
s/Text on Line 1//gi;
s/Text on Line 2/NEW STUFF HERE/gi;
Fitz
-------------------------------------------------------------------------------