Thursday, July 22, 2010

Word count of a list of files; XML wordcounts

I'm really way too proud of the following command line:

cat file1 file2 | wc -w

which simply reads the two files in order and pipes the combined text to the wc command, producing a wordcount for both files together. Nothing too amazing here, but I didn't know it would work until the day I needed a wordcount of two files and tried it. I didn't know it would be so simple. I hadn't realized you could give cat a list of files and it would spray them, one after another, to the output stream for wc to pick up.
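Here is a minimal sketch of that pipe, run against two throwaway files (the names file1 and file2 and their contents are placeholders). The last command shows a related option: wc also accepts several file names directly, in which case it prints a count for each file plus a "total" line.

```shell
# Make two small placeholder files to count
printf 'one two three\n' > file1
printf 'four five\n' > file2

# cat writes file1 and then file2 to stdout; wc -w counts the words in that one stream
cat file1 file2 | wc -w

# Alternative: let wc open the files itself to get per-file counts and a total
wc -w file1 file2
```

The pipe gives a single number for the whole stream. The `wc -w file1 file2` form is handy when you also want to see each file's share.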

See, I'm working on a story which I'm trying to keep under 10k words. But every day I'm writing 500 new words. Clearly these goals are in conflict.

So one thing I do is not delete anything as I go. I've set up a section called To Be Deleted in the document, and when I revise a paragraph, I throw the old version there. When the day's writing is done, I delete that whole section.

But then I kept generating lots of notes and ideas as I worked out ways to add new material to the document. I wanted to include these notes in my daily wordcount, but I also wanted to know how many words of actual prose I was adding, so I could keep an eye on whether I was crossing that 10k line. So I put the notes in a separate file.
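One way to script that daily check, offered only as a sketch: a hypothetical shell function, count_words, that reports the prose count on its own, the prose-plus-notes count using the same cat | wc -w pipe, and a warning once the prose passes a limit that defaults to the 10k target. The function name and the file names in the usage line below are made up for illustration.

```shell
# count_words STORY NOTES [LIMIT]
# Hypothetical helper: prints the word count of STORY alone, the count of
# STORY and NOTES combined, and a warning if STORY exceeds LIMIT words
# (LIMIT defaults to 10000).
count_words() {
  story=$1
  notes=$2
  limit=${3:-10000}

  # Words in the prose file alone; tr strips the padding some wc versions add
  story_words=$(wc -w < "$story" | tr -d ' ')
  # Words in prose and notes together, via the cat | wc pipe
  total_words=$(cat "$story" "$notes" | wc -w | tr -d ' ')

  echo "prose: $story_words"
  echo "prose + notes: $total_words"

  # Only the prose counts against the word budget
  if [ "$story_words" -gt "$limit" ]; then
    echo "over the $limit-word limit by $((story_words - limit))"
  fi
}
```

Usage would look like `count_words story.xml notes.xml`, or `count_words story.xml notes.xml 8000` for a tighter budget.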

I've been writing this story in XML, and my XML editor doesn't have a wordcount feature. But that's one of the good things about XML: since it's plain text, you can go to the command line for alternative solutions. For wordcounts, I've been using the Unix program wc through Cygwin, which provides a Unix-like command-line environment on Windows.
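One caveat worth knowing: wc knows nothing about XML. A tag that stands apart from the text, like `<chapter id="1">`, gets counted as two words, so the count creeps upward as markup grows. The sketch below is a hypothetical helper, xml_words, that uses sed to replace anything between < and > with a space before counting. It assumes every tag opens and closes on the same line and does no special handling of comments, CDATA sections, or entities such as `&amp;`, so treat its number as an estimate rather than an exact prose count.

```shell
# xml_words FILE...
# Hypothetical helper: rough word count of XML files with the markup removed.
# Assumes each tag starts and ends on the same line.
xml_words() {
  # Swap each <...> tag for a space so words on either side don't run
  # together, then count what's left and strip wc's padding
  cat "$@" | sed -e 's/<[^>]*>/ /g' | wc -w | tr -d ' '
}
```

Because it passes all its arguments to cat, it also gives the combined count, e.g. `xml_words story.xml notes.xml`.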

And since I've added Cygwin's bin folder (cygwin\bin) to my PATH, I can run Cygwin commands in any DOS console. I just keep a console window handy, and I can get a wordcount on the main file, or on the combination of files, anytime I want.
