Wednesday, January 19, 2011

Using Emacs to fix a column of text in Excel

I had occasion to rejoice at Emac's regular expression features yesterday. I was converting a Word table to an Excel sheet, and I had a column with numbered paragraphs. The numbering arrived in Excel as hardcoded text, and I wanted to get rid of it, since I didn't need the numbers in Exel.

The text in the cells was regular, but not amenable to a simple search, because it looked like this
1. Alpha
2. Bravo

The fix, in Emacs, was really quick.*

1. I copied the column and pasted it into Emacs, where it became a series of lines of plain text.
2. I copied the number-and spaces prefix on one line to get a string with the correct number of spaces.
3. I did a replace-regexp command, replacing ".. " (2 periods, 3 spaces) with nothing. Since the period is a single-character wildcard in a regular expression, that meant 'find strings consisting of any two characters followed by three spaces'.  This converted my lines to the following:


4. I copied the modified lines and pasted them over the column. All better.

Actually, I had to do a little more cleanup than this, because there were some two-digit numbers, but you get the idea.

There are several wins here. It's smart of Excel to so nicely accept pasted input in the form of lines. This comes in handy for me all the time. And Emacs' ability to do replaces on regular expressions is extremely powerful.

*Now that I think of it, it would probably have been even faster to fix this in Word before pasting the table into Excel. In Word, I could have changed the numbered paragraphs to non-numbered ones.

No comments:

Post a Comment