Nerd-Author Fun v2: Text Analysis

I promised this week that my blog post would be about some of my C# coding, which also happens to dovetail beautifully with my writing. I’ve taken my earlier work and begun the super-charging process. That said, this is just the beginning: in the future I plan to make it available, far more powerful and with a few of the bugs ironed out.

The general premise behind the program is that it can load your story from a text file, and then allow you to analyse it. At the moment it is sans-UI – which means it doesn’t have pretty user windows, checkboxes and other controls. I’m calling it Text Analysis Command Line (TAC). As it’s a command line program you have to type commands in to operate it.

So what can it do?

Like any good program it contains help – typing ‘?’ will give you a list and basic description of the available commands; typing ‘<command> ?’ will give you detailed options on that particular command.
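For the technically curious, the skeleton of a program like this is just a read-and-dispatch loop. Here is a minimal C# sketch – not the actual TAC code, and the command list and help text are made up – just to give you the idea:

```csharp
using System;

class Program
{
    static void Main()
    {
        Console.WriteLine("TAC - type ? for help, exit to quit");
        while (true)
        {
            Console.Write("> ");
            string input = Console.ReadLine()?.Trim() ?? "";

            if (input == "exit")
                break;

            if (input == "?")
            {
                // Top-level help: list every command with a one-line description.
                Console.WriteLine("wordcount  word frequency statistics");
                Console.WriteLine("data       interrogate the parsed text");
                Console.WriteLine("find       locate words or punctuation");
            }
            else if (input.EndsWith(" ?"))
            {
                // '<command> ?' prints the detailed options for that command.
                Console.WriteLine($"(detailed help for '{input.Substring(0, input.Length - 2)}' goes here)");
            }
            else
            {
                Console.WriteLine("Unknown command - type ? for help");
            }
        }
    }
}
```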

[Image: wordcount-1]

When you see a pipe symbol (‘|’) it means ‘or’. Square brackets (‘[’ and ‘]’) mean optional.

Most commands can either display output on the screen or save the results to a file. A single greater-than symbol (‘>’) saves to the named file only if it doesn’t already exist; the double form (‘>>’) saves the file, overwriting an existing one if necessary.
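In code, that saving rule might look something like the snippet below – an illustration of the convention rather than the actual TAC implementation:

```csharp
using System;
using System.IO;

static class Output
{
    // '>'  : save only if the file does not already exist.
    // '>>' : save, overwriting an existing file if necessary.
    public static void Save(string path, string contents, bool overwrite)
    {
        if (!overwrite && File.Exists(path))
        {
            Console.WriteLine($"{path} already exists - use >> to overwrite it.");
            return;
        }
        File.WriteAllText(path, contents);
    }
}
```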

Below is a description of all of the currently available commands. The results are based on processing Vengeance Will Come, my sci-fi/fantasy adventure (available now):

wordcount. You can display the frequency of every word used. Earlier in the year I bought Scrivener (left). For the most part it’s a great program but I was disappointed there was no way to export (or even easily query) word count data. The image of TAC (right) shows a snippet of both the textual version (default) and the ‘basic mode’ (using -b option). The basic mode is valuable if opening the file in Excel to do pretty graphs.

wordcount can also provide the number of words which begin with given letters (-f) or the length of words (-l).

[Image: wordcount – word length]

Just in case you’re curious, the longest word at 20 characters is ‘uncharacteristically’. The three 16-character words are: ‘conspiratorially’, ‘incontrovertible’ and ‘responsibilities’.
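If you’re curious how a frequency count like that might be built, here is a rough C# sketch using LINQ. It isn’t the TAC implementation – the tokenisation in particular is very naive – but the idea is the same:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

class WordCountSketch
{
    static void Main(string[] args)
    {
        string text = File.ReadAllText(args[0]);

        // Very naive tokenisation: lower-case letters and apostrophes only.
        var words = Regex.Matches(text.ToLowerInvariant(), @"[a-z']+")
                         .Cast<Match>()
                         .Select(m => m.Value)
                         .ToList();

        // Frequency of every word, most common first ('-b' style: word<TAB>count).
        var frequencies = words.GroupBy(w => w)
                               .OrderByDescending(g => g.Count())
                               .ThenBy(g => g.Key);

        foreach (var group in frequencies)
            Console.WriteLine($"{group.Key}\t{group.Count()}");

        // '-l' style: number of word occurrences of each length.
        foreach (var byLength in words.GroupBy(w => w.Length).OrderByDescending(g => g.Key))
            Console.WriteLine($"{byLength.Key} letters: {byLength.Count()} occurrences");
    }
}
```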

For the purpose of completeness, I’ll briefly mention the data command. At the moment it’s limited: a means to interrogate the parsed data. In order to do all of this (and future) processing I painstakingly categorise every character of text into a type. Using the -expseg option outputs this information.

[Image: datacmd-2 – output of the -expseg option]
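To give a flavour of what that categorisation involves, here is a toy sketch; the real list of types in TAC is longer and a lot fussier than this:

```csharp
// Toy sketch of classifying each character of text into a type.
enum CharType { Letter, Digit, Whitespace, SentenceEnd, Punctuation, Other }

static class Classifier
{
    public static CharType Classify(char c)
    {
        if (char.IsLetter(c)) return CharType.Letter;
        if (char.IsDigit(c)) return CharType.Digit;
        if (char.IsWhiteSpace(c)) return CharType.Whitespace;
        if (c == '.' || c == '!' || c == '?') return CharType.SentenceEnd;
        if (char.IsPunctuation(c)) return CharType.Punctuation;
        return CharType.Other;
    }
}
```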

 

At this stage the only other data commands are -sen (output sentence) and -block (output block). For example, the sentence at segment 128 is:

At first light they invade my mind, besieging it to the point of exhaustion.

And the block at segment 128:

“I wish that I didn’t know the future; that I couldn’t see the prophecies unfold before me. At first light they invade my mind, besieging it to the point of exhaustion. Even in my fitful sleep they haunt me as wild animals stalk the scent of blood, turning what little rest I get into an extension of my waking nightmare. I cannot escape.

The find command is powerful and will be leveraged heavily in future updates.

[Image: find-1]

Unsurprisingly, find locates the occurrences of a specified word. Importantly, the before and after options let you display the word with a variable amount of context (do you want to see the 10 words preceding it, or only 5?).
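Showing that context is cheap once the text has been split into words. A simplified sketch of the idea (not the real find code):

```csharp
using System;

static class FindSketch
{
    // Print every occurrence of 'target' with 'before' words of leading
    // context and 'after' words of trailing context.
    public static void Find(string[] words, string target, int before, int after)
    {
        for (int i = 0; i < words.Length; i++)
        {
            if (!string.Equals(words[i], target, StringComparison.OrdinalIgnoreCase))
                continue;

            int start = Math.Max(0, i - before);
            int end = Math.Min(words.Length - 1, i + after);
            Console.WriteLine($"word {i}: {string.Join(" ", words, start, end - start + 1)}");
        }
    }
}
```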

find can also locate every instance of a specified type of punctuation. Want to know how often I use exclamation marks? Typing ‘find -p’ brings up a list of punctuation options from which a selection can be made.

[Image: find-2]

The answer is of course 45 (as displayed on the screenshot). However, now I know exactly where they are (and in what context).

[Image: find-3]
The first use of ! occurs at the 660th character in my novel.

I’m a big believer in not over-using the exclamation mark, so a tool like this lets me easily see how often I’ve used it in a given book (and calculate the amount of text between each usage). More importantly, it can also help me track down where I’ve used a closing ” instead of an opening “ – which seems to happen no matter how careful I am.
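Catching those reversed quotation marks is just a scan over the text. Here is a toy sketch which assumes quotes are never nested – real dialogue that runs across paragraphs without a closing quote would need a smarter rule:

```csharp
using System;

static class QuoteCheck
{
    // Flag any curly quote that points the wrong way, assuming quotes never nest.
    public static void Check(string text)
    {
        bool insideQuote = false;
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '“')
            {
                if (insideQuote) Console.WriteLine($"Unexpected “ at character {i}");
                insideQuote = true;
            }
            else if (text[i] == '”')
            {
                if (!insideQuote) Console.WriteLine($"Unexpected ” at character {i}");
                insideQuote = false;
            }
        }
    }
}
```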

This brings me to the end of the tour of TAC v0.0.1. I hope you liked it.


Writing and Coding Update

Coding: Character Point-of-View Chart

I want to learn how to program in C# to add that arrow to my professional quiver. You never know when you need another arrow.

In light of that goal, and also to aid in my writing, I’m going to build a small application (“Perspective”) to generate my Character Point-of-View (POV) charts.

The charts display, by chapter and scene, which character has the point of view. I first described them in Examining Character Balance and shared the Excel file which I use. However, the spreadsheet does so much that it is complex, and I could understand people being scared off by it. Besides, it’s a great excuse to do some C# and get side-benefits from it.

[Image: draft Cal POV chart]

It is important to note this will be an iterative development: the first version won’t look anything like the final product. I’m not quite ready to share my code, but I will share it – and the application – in the future.

[Image: v0.1 screenshot]

I’m using Windows Forms (I think this is a slightly older technology, but I thought it was a good place to start). The form doesn’t do much, and data entry is simplistic: character names will be separated by commas, and a chapter will be ended by a semi-colon.
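Parsing that format is the easy part. A sketch of the idea – the character names in the comment are made up, and the real form will read from a text box rather than a string:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PovParser
{
    // Turns "Alice,Bob,Alice;Bob,Alice;" into a list of chapters,
    // each chapter being the list of POV characters for its scenes.
    public static List<List<string>> Parse(string input)
    {
        return input
            .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(chapter => chapter
                .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(name => name.Trim())
                .Where(name => name.Length > 0)
                .ToList())
            .ToList();
    }
}
```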

I’ll be putting formatting options on the form so you can control what it looks like. Here are the terms I’m using at the moment.

[Image: style design terms]

I’ve also got a few experimental ideas which I’m keen to include. I think they could really add value to the chart.

Writing

One of my goals in this revision was to reduce the amount of head-hopping. So how am I going? I’m glad you asked, because here are some outputs from my Perspective application that demonstrate the progress so far.

Original manuscript. Without the benefit (yet) of labels, I’ll explain it. The chart below shows the first four chapters.

  • Chapter 1 = 6 scenes
  • Chapter 2 = 5 scenes
  • Chapter 3 = 8 scenes
  • Chapter 4 = 8 scenes

[Image: VWC – old version]

Revised manuscript. It’s a bit hard to see the difference because the image comes out a different size… *scratches head*

  • Chapter 1 = 3 scenes
  • Chapter 2 = 4 scenes
  • Chapter 3 = 5 scenes
  • Chapter 4 = 6 scenes

[Image: VWC – new version]

With fewer scenes there is less head-hopping, which should mean less fragmentation for the reader. I’ve also expanded the word count (in those four chapters) by 2,000 words.

Who wrote this?

Oh… me. 🙂

I’ve been doing a lot of chopping and changing during my editing which I think is vastly improving the structure of the story.

Part of my pathology is that I love visualisation, so I wrote a little program to help you visualise the first 12 chapters of the story which I have edited so far. The original version is on the left and the revised version is on the right. Each row represents a chapter and each box a scene (regardless of scene word count).

And because you can never have enough visualisation, here is some more – this time colouring each scene by the point of view (each character having their own colour).

You can see that I have reduced the number of point-of-view changes. Next time I procrastinate I’ll do some more 🙂
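For the programmers among you: drawing a chart like this is just nested loops over chapters and scenes. Below is a stripped-down Windows Forms sketch with assumed class and field names – not my actual Perspective code:

```csharp
using System.Collections.Generic;
using System.Drawing;
using System.Windows.Forms;

class PovChartForm : Form
{
    // Each inner list is a chapter; each string is the POV character of one scene.
    private readonly List<List<string>> chapters;
    private readonly Dictionary<string, Brush> colours;

    public PovChartForm(List<List<string>> chapters, Dictionary<string, Brush> colours)
    {
        this.chapters = chapters;
        this.colours = colours;
        DoubleBuffered = true;
        Paint += (sender, e) => DrawChart(e.Graphics);
    }

    private void DrawChart(Graphics g)
    {
        const int boxSize = 24, gap = 4;
        for (int row = 0; row < chapters.Count; row++)          // one row per chapter
        {
            for (int col = 0; col < chapters[row].Count; col++) // one box per scene
            {
                var rect = new Rectangle(gap + col * (boxSize + gap),
                                         gap + row * (boxSize + gap),
                                         boxSize, boxSize);
                g.FillRectangle(colours[chapters[row][col]], rect);
                g.DrawRectangle(Pens.Black, rect);
            }
        }
    }
}
```

Perspective will do a lot more than this (labels, styling options, the experimental ideas I mentioned), but the core drawing really is that small.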