Nerd-Author Fun v2: Text Analysis

I promised this week that my blog post would be about some of my C# coding, which also happens to dovetail beautifully with my writing. I’ve taken my earlier work and begun the super-charging process. That being said, this is just the beginning. In the future I plan to make it available, far more powerful and with a few of the bugs ironed out.

The general premise behind the program is that it can load your story from a text file, and then allow you to analyse it. At the moment it is sans-UI – which means it doesn’t have pretty user windows, checkboxes and other controls. I’m calling it Text Analysis Command Line (TAC). As it’s a command line program you have to type commands in to operate it.

So what can it do?

Like any good program it contains help – typing ‘?’ will give you a list and basic description of the available commands; typing ‘<command> ?’ will give you detailed options on that particular command.

[Image: wordcount-1]

When you see a pipe symbol ‘|’ it means or. Square brackets (‘[‘, ‘]’) mean optional.

Most commands can either display output on the screen or save the results to a file. With a single greater-than symbol (‘>’) the file will be saved only if it doesn’t already exist. The double form ‘>>’ will save the file, overwriting an existing file if necessary.
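For the programmers reading along, here is a rough Python sketch of that save-or-overwrite behaviour. It is not TAC's actual C# code, and the function name `save_results` and its `overwrite` parameter are my own assumptions for illustration.

```python
def save_results(path, text, overwrite=False):
    """Write text to path.

    overwrite=False mimics the single '>' behaviour: refuse to touch an
    existing file. overwrite=True mimics '>>': replace whatever is there.
    Returns True if the file was written, False if it was left alone.
    """
    # Mode "x" creates a new file and raises FileExistsError if one exists.
    mode = "w" if overwrite else "x"
    try:
        with open(path, mode, encoding="utf-8") as handle:
            handle.write(text)
    except FileExistsError:
        return False
    return True
```

Calling `save_results("counts.txt", data)` twice would leave the first file untouched, while passing `overwrite=True` replaces it.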

Below is a description of all of the currently available commands. The results are based on processing Vengeance Will Come, my scifi/fantasy adventure (available now):

wordcount. You can display the frequency of every word used. Earlier in the year I bought Scrivener (left). For the most part it’s a great program but I was disappointed there was no way to export (or even easily query) word count data. The image of TAC (right) shows a snippet of both the textual version (default) and the ‘basic mode’ (using -b option). The basic mode is valuable if opening the file in Excel to do pretty graphs.
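If you want a feel for what a word-frequency count involves, here is a minimal Python sketch. It is only an approximation of the idea, not how TAC does it: the word pattern, the lower-casing and the comma-separated "basic" layout are all assumptions on my part.

```python
import re
from collections import Counter

# Assumed definition of a "word": letters, optionally joined by an apostrophe
# (straight or curly), so "didn’t" counts as one word.
WORD_PATTERN = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)*")


def word_frequencies(text):
    """Count how often each word appears, ignoring case."""
    return Counter(m.group(0).lower() for m in WORD_PATTERN.finditer(text))


def basic_mode(frequencies):
    """Render 'word,count' lines, most frequent first, for a spreadsheet."""
    ordered = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
    return "\n".join(f"{word},{count}" for word, count in ordered)
```

The output of `basic_mode` can be saved with a `.csv` extension and opened directly in a spreadsheet.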

wordcount can also provide the number of words which begin with given letters (-f) or the length of words (-l).

[Image: wordcount-word length]

Just in case you’re curious, the longest word, at 20 characters, is ‘uncharacteristically’. The three 16-character words are ‘conspiratorially’, ‘incontrovertible’ and ‘responsibilities’.
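The first-letter and word-length breakdowns can be sketched in a few lines of Python. This is a guess at the approach rather than TAC's code, and I am assuming the counts are taken over distinct words (each word counted once, however often it appears).

```python
import re
from collections import Counter


def distinct_words(text):
    """Return the set of distinct lower-cased words (letters only)."""
    return {word.lower() for word in re.findall(r"[A-Za-z]+", text)}


def count_by_first_letter(words):
    """How many distinct words start with each letter."""
    return Counter(word[0] for word in words)


def count_by_length(words):
    """How many distinct words there are of each length."""
    return Counter(len(word) for word in words)


def longest_words(words):
    """All words sharing the maximum length, in alphabetical order."""
    longest = max((len(word) for word in words), default=0)
    return sorted(word for word in words if len(word) == longest)
```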

For the purpose of completeness, I’ll briefly mention the data command. At the moment it’s limited, a means to interrogate the data. In order to do all of this (and future) processing I painstakingly categorise every character of text into a type. Using the -expseg option outputs this information.
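To give an idea of what categorising every character might look like, here is a small Python sketch. The four type names and the (position, type, text) layout are hypothetical; TAC's real categories may well be finer-grained.

```python
from itertools import groupby


def classify(char):
    """Assign a single character to one of four assumed types."""
    if char.isalpha():
        return "letter"
    if char.isdigit():
        return "digit"
    if char.isspace():
        return "whitespace"
    return "punctuation"


def segments(text):
    """Group consecutive characters of the same type into segments.

    Returns a list of (start_position, type, text) tuples, where
    start_position is a zero-based character offset.
    """
    result = []
    position = 0
    for kind, group in groupby(text, key=classify):
        chunk = "".join(group)
        result.append((position, kind, chunk))
        position += len(chunk)
    return result
```

Once the text is split this way, commands such as counting words or finding punctuation become simple filters over the segment list.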

[Image: datacmd-2, output of the -expseg option]

At this stage the only other two data options are -sen (output sentence) and -block (output block). For example, outputting the sentence at segment 128 gives:

At first light they invade my mind, besieging it to the point of exhaustion.

And -block (output block) at 128:

“I wish that I didn’t know the future; that I couldn’t see the prophecies unfold before me. At first light they invade my mind, besieging it to the point of exhaustion. Even in my fitful sleep they haunt me as wild animals stalk the scent of blood, turning what little rest I get into an extension of my waking nightmare. I cannot escape.
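Pulling out the sentence or block around a given spot could be sketched like this in Python. Note the assumptions: TAC addresses text by segment number, whereas this sketch uses a plain character offset, treats a block as one line of the file, and uses a deliberately naive rule for where sentences end.

```python
import re

# Naive sentence rule (an assumption): a run of text followed by any
# number of '.', '!' or '?' characters, never crossing a line break.
SENTENCE_PATTERN = re.compile(r"[^.!?\n]+[.!?]*")


def block_at(text, offset):
    """Return the line (paragraph) containing the character at offset."""
    start = text.rfind("\n", 0, offset) + 1
    end = text.find("\n", offset)
    if end == -1:
        end = len(text)
    return text[start:end].strip()


def sentence_at(text, offset):
    """Return the sentence containing the character at offset, or ''."""
    for match in SENTENCE_PATTERN.finditer(text):
        if match.start() <= offset < match.end():
            return match.group(0).strip()
    return ""
```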

The find command is powerful and will be leveraged heavily in future updates.

[Image: find-1]

Unsurprisingly, find locates the occurrences of a specified word. Importantly, the before and after options control how much context is displayed around the word (e.g. do you want to see the 10 words preceding it, or only 5?).
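Here is a minimal Python sketch of a word search with adjustable context. It is an illustration under my own assumptions (whitespace-separated tokens, case-insensitive matching, and the hypothetical names `find_word`, `before` and `after`), not TAC's implementation.

```python
import re

# Punctuation stripped from each token before comparing it to the target.
EDGE_PUNCTUATION = ".,;:!?\"'“”‘’()"


def find_word(text, target, before=5, after=5):
    """Return each occurrence of target with surrounding words as context."""
    tokens = re.findall(r"\S+", text)
    wanted = target.lower()
    hits = []
    for index, token in enumerate(tokens):
        if token.strip(EDGE_PUNCTUATION).lower() == wanted:
            start = max(0, index - before)
            hits.append(" ".join(tokens[start:index + after + 1]))
    return hits
```

Raising `before` or `after` widens the window on either side of each hit.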

find can also locate every instance of a specified type of punctuation. Want to know how often I use exclamation marks? Typing ‘find -p’ brings up a list of punctuation options from which a selection can be made.

[Image: find-2]

The answer is of course 45 (as displayed in the screenshot). However, now I know exactly where they are (and in what context).

[Image: find-3.PNG]
The first use of ! occurs at the 660th character in my novel.

I’m a big believer in not over-using the exclamation mark, so a tool like this would let me easily see how often I’ve used it in a given book (and calculate the amount of text between each usage). More importantly, it can also let me track down when I’ve used a ” instead of a “ which seems to happen no matter how careful I am.
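Both checks are easy to sketch in Python. Again, this is not TAC's code: the function names are hypothetical, the positions are zero-based character offsets, and the rule for a "wrong" quote (a closing ” at the very start of the text or straight after whitespace) is a simple heuristic I am assuming.

```python
def find_punctuation(text, mark):
    """Return the zero-based offset of every occurrence of mark."""
    positions = []
    start = text.find(mark)
    while start != -1:
        positions.append(start)
        start = text.find(mark, start + 1)
    return positions


def gaps_between(positions):
    """Number of characters from each occurrence to the next one."""
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


def suspicious_closing_quotes(text):
    """Offsets where a closing quote sits where an opening one is expected."""
    return [
        index
        for index, char in enumerate(text)
        if char == "”" and (index == 0 or text[index - 1].isspace())
    ]
```

A long list from `find_punctuation(text, "!")` with small numbers in `gaps_between` would be a hint that the exclamation marks are bunching up.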

This brings me to the end of the tour of TAC v0.0.1. I hope you liked it.


Well, I didn’t see that coming…

Just a week ago I wrote that I didn’t want to spend any longer on my novel Vengeance Will Come. As I discussed, it had been sitting idle for months.

And then I began to read it…

…and I fell in love with it all over again (if I can use the term loosely).

But the months of “resting time” (as they say in cookbooks) have made me aware of some of its flaws…

So now I’m going to do (another) final revision of it. And this one – I promise – will be the last revision that I will initiate. (You may have noticed that I left enough room in that statement for a parade to pass through…). A final revision and then I plan on releasing it *somehow* as an e-book.

I may be late to the party but I have started to use Scrivener, and although it isn’t entirely intuitive to me, I am starting to like it. I am very appreciative of the generous try-before-you-buy trial of 30 days of actual use. Sure, it doesn’t have everything I would want but it’s a pretty good product. I’m 99% sure I’ll be a customer before the end of my trial period. I’m also keen to try out their mind-mapping product Scapple.