Nerd-Author Fun v2: Text Analysis

I promised that this week’s blog post would be about some of my C# coding, which also happens to dovetail beautifully with my writing. I’ve taken my earlier work and begun super-charging it. That said, this is just the beginning: in the future I plan to make it available, far more powerful and with a few of the bugs ironed out.

The general premise is that the program loads your story from a text file and then lets you analyse it. At the moment it is sans-UI – which means it doesn’t have pretty windows, checkboxes and other controls. I’m calling it Text Analysis Command Line (TAC). As it’s a command-line program, you operate it by typing commands.

So what can it do?

Like any good program it contains help – typing ‘?’ will give you a list and basic description of the available commands; typing ‘<command> ?’ will give you detailed options on that particular command.


In the help text, a pipe symbol (‘|’) means “or”, and square brackets (‘[’, ‘]’) mark something as optional.

Most commands can either display output on the screen or save the results to a file. With a single greater-than symbol (‘>’) the file is saved, unless it already exists. The double symbol (‘>>’) saves the file, overwriting any existing one.
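Here is a minimal Java sketch of the two save modes. It assumes exactly the rules above: ‘>’ never touches an existing file, and ‘>>’ overwrites one. TAC itself is written in C#, and the OutputTarget class and its write method are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical helper: applies the '>' / '>>' rules described above to some output lines.
final class OutputTarget {
    /** Returns true if the file was written, false if '>' refused to touch an existing file. */
    static boolean write(String operator, Path file, List<String> lines) throws IOException {
        switch (operator) {
            case ">":
                try {
                    // CREATE_NEW throws FileAlreadyExistsException rather than touching an existing file.
                    Files.write(file, lines, StandardCharsets.UTF_8,
                            StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
                    return true;
                } catch (FileAlreadyExistsException e) {
                    return false;
                }
            case ">>":
                // CREATE + TRUNCATE_EXISTING: make the file, or overwrite an existing one.
                Files.write(file, lines, StandardCharsets.UTF_8,
                        StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING,
                        StandardOpenOption.WRITE);
                return true;
            default:
                throw new IllegalArgumentException("Unknown operator: " + operator);
        }
    }
}
```

With CREATE_NEW, the existence check and the file creation happen as one atomic step, so nothing can sneak in between them. Note that this ‘>>’ behaviour differs from Unix shells, where ‘>>’ appends.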

Below is a description of all of the currently available commands. The results are based on processing Vengeance Will Come, my scifi/fantasy adventure (available now):

wordcount. This displays the frequency of every word used. Earlier in the year I bought Scrivener (left). For the most part it’s a great program, but I was disappointed that there was no way to export (or even easily query) word count data. The image of TAC (right) shows a snippet of both the textual version (the default) and the ‘basic mode’ (the -b option). Basic mode is handy if you want to open the file in Excel and make pretty graphs.
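Counting word frequencies comes down to splitting the text into words and tallying them in a map. The Java sketch below assumes a word is a run of letters, optionally joined by an apostrophe, and counts words case-insensitively. TAC’s actual (C#) rules aren’t described here, and WordFrequency, count, sorted and basic are hypothetical names. The basic method produces comma-separated word,count lines in the spirit of the -b option, which Excel opens as a CSV file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WordFrequency {
    // Assumption: a word is letters, optionally joined by ' or \u2019 (don't, Keeshar's).
    private static final Pattern WORD = Pattern.compile("\\p{L}+(?:['\u2019]\\p{L}+)*");

    /** Tallies every word, ignoring case. */
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            counts.merge(m.group().toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    /** Entries sorted by descending frequency, ties broken alphabetically. */
    static List<Map.Entry<String, Integer>> sorted(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries;
    }

    /** "Basic mode": one comma-separated line per word, easy to chart in Excel. */
    static List<String> basic(Map<String, Integer> counts) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : sorted(counts)) {
            lines.add(e.getKey() + "," + e.getValue());
        }
        return lines;
    }
}
```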

wordcount can also report the number of words that begin with given letters (-f) or the lengths of words (-l).
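The -f and -l views are the same tally keyed differently: by a word’s first letter, or by its length. Here is a hedged Java sketch. It assumes a word is a run of letters, optionally joined by an apostrophe, and does not count apostrophes towards length. It counts every occurrence rather than distinct words, and WordShape and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WordShape {
    // Assumption: same notion of "word" as letters joined by optional apostrophes.
    private static final Pattern WORD = Pattern.compile("\\p{L}+(?:['\u2019]\\p{L}+)*");

    /** -f style: how many word occurrences start with each (upper-cased) letter. */
    static Map<Character, Integer> byFirstLetter(String text) {
        Map<Character, Integer> counts = new TreeMap<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            counts.merge(Character.toUpperCase(m.group().charAt(0)), 1, Integer::sum);
        }
        return counts;
    }

    /** -l style: how many word occurrences have each length, apostrophes excluded. */
    static Map<Integer, Integer> byLength(String text) {
        Map<Integer, Integer> counts = new TreeMap<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            int letters = m.group().replaceAll("['\u2019]", "").length();
            counts.merge(letters, 1, Integer::sum);
        }
        return counts;
    }
}
```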

In case you’re curious, the longest word, at 20 characters, is ‘uncharacteristically’. The three 16-character words are ‘conspiratorially’, ‘incontrovertible’ and ‘responsibilities’.

For completeness, I’ll briefly mention the data command. At the moment it’s limited: it’s simply a means to interrogate the data. To do all of this (and future) processing, I painstakingly categorise every character of text into a type. The -expseg option outputs this information.

-expseg option
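Categorising every character into a type can be sketched as a classifier plus a pass that groups consecutive characters of the same type into numbered segments. The type names, the Segment record and the run-grouping rule below are assumptions for illustration, not TAC’s actual categories.

```java
import java.util.ArrayList;
import java.util.List;

final class Segmenter {
    // Hypothetical character categories.
    enum CharType { LETTER, DIGIT, SPACE, NEWLINE, PUNCTUATION }

    /** A run of consecutive characters sharing one type; end is exclusive. */
    record Segment(int index, int start, int end, CharType type, String text) {}

    static CharType classify(char c) {
        if (c == '\n' || c == '\r') return CharType.NEWLINE;
        if (Character.isWhitespace(c)) return CharType.SPACE;
        if (Character.isLetter(c)) return CharType.LETTER;
        if (Character.isDigit(c)) return CharType.DIGIT;
        return CharType.PUNCTUATION; // everything else: quotes, dashes, !, ?, ...
    }

    /** Splits text into numbered segments wherever the character type changes. */
    static List<Segment> segment(String text) {
        List<Segment> segments = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= text.length(); i++) {
            // Close the current run at the end of the text or when the type changes.
            if (i == text.length() || classify(text.charAt(i)) != classify(text.charAt(start))) {
                segments.add(new Segment(segments.size(), start, i,
                        classify(text.charAt(start)), text.substring(start, i)));
                start = i;
            }
        }
        return segments;
    }
}
```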

 

At this stage there are only two other data options. The first is -sen (output sentence); for example, the sentence at segment 128 is:

At first light they invade my mind, besieging it to the point of exhaustion.

The second is -block (output block), again at segment 128:

“I wish that I didn’t know the future; that I couldn’t see the prophecies unfold before me. At first light they invade my mind, besieging it to the point of exhaustion. Even in my fitful sleep they haunt me as wild animals stalk the scent of blood, turning what little rest I get into an extension of my waking nightmare. I cannot escape.
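Finding the sentence or block around a position can be done by scanning outwards to the nearest boundaries. This Java sketch makes three assumptions:

- sentences end with ‘.’, ‘!’ or ‘?’, plus any closing quotes;
- a block is the text between line breaks;
- the position is given as a character offset rather than TAC’s segment number.

ContextLookup is a hypothetical name, and ellipses or abbreviations such as “Mr.” would need extra rules.

```java
final class ContextLookup {
    private static boolean endsSentence(char c) {
        return c == '.' || c == '!' || c == '?';
    }

    private static boolean isClosingQuote(char c) {
        return c == '"' || c == '\u201D' || c == '\u2019';
    }

    /** -sen style: the sentence containing the character at offset. */
    static String sentenceAt(String text, int offset) {
        int start = offset;
        // Walk back to just after the previous terminator or line break.
        while (start > 0 && !endsSentence(text.charAt(start - 1)) && text.charAt(start - 1) != '\n') {
            start--;
        }
        int end = offset;
        // Walk forward to the next terminator or line break.
        while (end < text.length() && !endsSentence(text.charAt(end)) && text.charAt(end) != '\n') {
            end++;
        }
        // Include the terminator itself, plus any closing quotes after it.
        if (end < text.length() && endsSentence(text.charAt(end))) end++;
        while (end < text.length() && isClosingQuote(text.charAt(end))) end++;
        // Drop whitespace and closing quotes left over from the previous sentence.
        while (start < end && (Character.isWhitespace(text.charAt(start)) || isClosingQuote(text.charAt(start)))) {
            start++;
        }
        return text.substring(start, end);
    }

    /** -block style: the text between the line breaks surrounding offset. */
    static String blockAt(String text, int offset) {
        int start = text.lastIndexOf('\n', offset - 1) + 1;
        int end = text.indexOf('\n', offset);
        return text.substring(start, end == -1 ? text.length() : end);
    }
}
```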

The find command is powerful and will be leveraged heavily in future updates.


Unsurprisingly, find locates the occurrences of a specified word. Importantly, the before and after options let you display each occurrence with a variable amount of context (want to see the 10 words preceding it, or only 5?).
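The before/after context is essentially a keyword-in-context listing. Here is a minimal Java sketch. It assumes case-insensitive whole-word matching and rebuilds the context from words alone, dropping punctuation. Finder.find and its before/after parameters are hypothetical stand-ins for TAC’s options.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Finder {
    // Assumption: a word is letters, optionally joined by an apostrophe.
    private static final Pattern WORD = Pattern.compile("\\p{L}+(?:['\u2019]\\p{L}+)*");

    /** Each match of target, bracketed, with up to `before` words preceding and `after` words following. */
    static List<String> find(String text, String target, int before, int after) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) words.add(m.group());

        List<String> hits = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            if (!words.get(i).equalsIgnoreCase(target)) continue;
            int from = Math.max(0, i - before);
            int to = Math.min(words.size(), i + after + 1);
            String left = String.join(" ", words.subList(from, i));
            String right = String.join(" ", words.subList(i + 1, to));
            hits.add((left + " [" + words.get(i) + "] " + right).strip());
        }
        return hits;
    }
}
```

For example, `Finder.find(text, "cried", 2, 1)` on “She cried. Then he cried too, and they cried together.” returns “She [cried] Then”, “Then he [cried] too” and “and they [cried] together”.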

find can also locate every instance of a specified type of punctuation. Want to know how often I use exclamation marks? Typing ‘find -p’ brings up a list of punctuation options from which a selection can be made.


The answer, of course, is 45 (as shown in the screenshot). More usefully, I now know exactly where they are (and in what context).

The first use of ! occurs at the 660th character in my novel.

I’m a big believer in not over-using the exclamation mark, so a tool like this lets me easily see how often I’ve used it in a given book (and calculate the amount of text between each usage). More importantly, it also lets me track down where I’ve used a closing quote (”) instead of an opening one (“), which seems to happen no matter how careful I am.
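Both checks are simple scans over the text. The Java sketch below does three things:

- records the zero-based offsets of a punctuation mark;
- measures the number of characters between consecutive uses;
- flags curly double quotes that appear to face the wrong way.

The quote rule is an assumption: an opening quote should follow whitespace, a line start or an opening bracket. It will misfire on, say, a quote directly after a dash. PunctuationCheck is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class PunctuationCheck {
    private static final char OPEN = '\u201C';  // “
    private static final char CLOSE = '\u201D'; // ”

    /** Zero-based character offsets of every occurrence of mark. */
    static List<Integer> positions(String text, char mark) {
        List<Integer> found = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == mark) found.add(i);
        }
        return found;
    }

    /** Number of characters strictly between consecutive occurrences. */
    static List<Integer> gaps(List<Integer> positions) {
        List<Integer> result = new ArrayList<>();
        for (int i = 1; i < positions.size(); i++) {
            result.add(positions.get(i) - positions.get(i - 1) - 1);
        }
        return result;
    }

    /** Offsets of curly double quotes whose direction doesn't match the heuristic. */
    static List<Integer> suspiciousQuotes(String text) {
        List<Integer> suspects = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != OPEN && c != CLOSE) continue;
            char prev = i == 0 ? '\n' : text.charAt(i - 1);
            boolean shouldOpen = Character.isWhitespace(prev) || prev == '(' || prev == '[';
            if (shouldOpen != (c == OPEN)) suspects.add(i);
        }
        return suspects;
    }
}
```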

This brings me to the end of the tour of TAC v0.0.1. I hope you liked it.


Writing and Coding Update

Coding: Character Point-of-View Chart

I want to learn how to program in C# to add that arrow to my professional quiver. You never know when you need another arrow.

In light of that goal, and also to aid my writing, I’m going to build a small application (“Perspective”) to generate my Character Point-of-View (POV) charts.

The charts display, by chapter and scene, which character has the point of view. I first described them in Examining Character Balance and shared the Excel file I use. However, the spreadsheet does so much that it’s complex, and I can understand people being scared off by it. Besides, it’s a great excuse to do some C# and get side benefits from it.

Draft Cal POV

It is important to note that this will be an iterative development: the first version won’t look anything like the final product. I’m not quite ready to share my code or the application, but I will in the future.

v0.1 screenshot

I’m using Windows Forms. (I think it’s a slightly older technology, but it seemed a good place to start.) The form doesn’t do much yet, and data entry is simplistic: character names are separated by commas, and a semicolon ends a chapter.
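The entry format can be parsed by splitting on semicolons into chapters and on commas into scenes. Perspective is a C# Windows Forms application, so this is only a Java sketch of the parsing step. PovParser is a hypothetical name. Its povChanges count is also a hypothetical addition: a rough proxy for head-hopping that measures how often the POV character changes between consecutive scenes in a chapter.

```java
import java.util.ArrayList;
import java.util.List;

final class PovParser {
    /** Each inner list is one chapter; each entry is the POV character of one scene. */
    static List<List<String>> parse(String input) {
        List<List<String>> chapters = new ArrayList<>();
        for (String chapter : input.split(";")) {      // a semicolon ends a chapter
            List<String> scenes = new ArrayList<>();
            for (String name : chapter.split(",")) {   // commas separate scenes' POV names
                String trimmed = name.strip();
                if (!trimmed.isEmpty()) scenes.add(trimmed);
            }
            if (!scenes.isEmpty()) chapters.add(scenes);
        }
        return chapters;
    }

    /** How many times the POV changes from one scene to the next, summed over chapters. */
    static int povChanges(List<List<String>> chapters) {
        int changes = 0;
        for (List<String> scenes : chapters) {
            for (int i = 1; i < scenes.size(); i++) {
                if (!scenes.get(i).equals(scenes.get(i - 1))) changes++;
            }
        }
        return changes;
    }
}
```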

I’ll be putting formatting options on the form so you can control what it looks like. Here are the terms I’m using at the moment.

Style design

I’ve also got a few experimental ideas I’m keen to include. I think they could really add value to the chart.

Writing

One of my goals in this revision was to reduce the amount of head-hopping. So how am I going so far? I’m glad you asked, because here are some outputs from my Perspective application that demonstrate the progress.

Original manuscript. The chart doesn’t have labels yet, so I’ll explain it: below are the first 4 chapters.

  • Chapter 1 = 6 scenes
  • Chapter 2 = 5 scenes
  • Chapter 3 = 8 scenes
  • Chapter 4 = 8 scenes

VWC - Old Version

Revised manuscript. It’s a bit hard to see the difference because the image comes out a different size… *scratches head*

  • Chapter 1 = 3 scenes
  • Chapter 2 = 4 scenes
  • Chapter 3 = 5 scenes
  • Chapter 4 = 6 scenes

VWC - New Version

With fewer scenes there is less head-hopping, which should result in less fragmentation for the reader. I’ve also increased the word count of those four chapters by 2,000 words.

Laying out a Story Seed

The title of this post is a play on words. First I’m going to talk about my programming and why I’m so keen on layout management; then I’ll share the idea for a story seed, just to whet your appetite or get your writing juices flowing.

Programming: Why do I care so much about layout?

Each time I start my computer for a writing session I follow the same steps:

  1. Open Word on right hand monitor, align to left (50% width).
  2. Open Excel on left monitor, full size.
  3. Open OneNote on left monitor, full size.

When I’m programming I do things a little differently:

  1. Open Eclipse on left monitor, full size.
  2. Open Windows Explorer and navigate to folder structure, left align.
  3. Open SQLiteStudio on right monitor, full size.
  4. Open Firefox, right monitor, right aligned. Load Trac.

At least Windows 10 now remembers which monitor an application was last on, but that is far from customised to how I prefer to work. To maximise my productivity, I’d ideally tell Windows what I’m going to be working on as I log in, and it would know what to load and where to place it.

You can’t do this with Windows yet, but at least my own application allows that level of control.

Even within writing, the layout I want depends on which project I’m working on. If I’m plotting one project and editing another, chances are each will benefit from a different view.

My intention is that saving a project will also save its current (project-specific) layout. The named layouts are really intended as quick-use templates.


The layout functionality is done now (except for a few rough edges I’ll smooth later). Using a layout you can:

  • position and size the application window
  • position, size and name all windows on the screen
  • save the panels, and their names, on each of the windows
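Saving a layout amounts to recording each window’s name, bounds and panels, then reading them back. Here is a hedged Java sketch that uses java.util.Properties as the file format; that format is an assumption, since the application’s real format isn’t described. LayoutStore and WindowLayout are hypothetical names, and the application window itself could be stored the same way.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

final class LayoutStore {
    /** Bounds of one internal window plus the names of the panels it holds. */
    record WindowLayout(String name, int x, int y, int width, int height, List<String> panels) {}

    static void save(String layoutName, List<WindowLayout> windows, Writer out) throws IOException {
        Properties p = new Properties();
        p.setProperty("layout.name", layoutName);
        p.setProperty("window.count", Integer.toString(windows.size()));
        for (int i = 0; i < windows.size(); i++) {
            WindowLayout w = windows.get(i);
            String key = "window." + i + ".";
            p.setProperty(key + "name", w.name());
            p.setProperty(key + "bounds", w.x() + "," + w.y() + "," + w.width() + "," + w.height());
            // Assumption: panel names never contain '|'.
            p.setProperty(key + "panels", String.join("|", w.panels()));
        }
        p.store(out, "Layout: " + layoutName);
    }

    static List<WindowLayout> load(Reader in) throws IOException {
        Properties p = new Properties();
        p.load(in);
        int count = Integer.parseInt(p.getProperty("window.count", "0"));
        List<WindowLayout> windows = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String key = "window." + i + ".";
            String[] b = p.getProperty(key + "bounds").split(",");
            String panels = p.getProperty(key + "panels", "");
            windows.add(new WindowLayout(p.getProperty(key + "name"),
                    Integer.parseInt(b[0]), Integer.parseInt(b[1]),
                    Integer.parseInt(b[2]), Integer.parseInt(b[3]),
                    panels.isEmpty() ? List.of() : List.of(panels.split("\\|"))));
        }
        return windows;
    }
}
```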

Writing Seed: Lifetime Magic

I’ve been toying with a fraction of an idea for a while.

Normally magic systems revolve around a select few, who by ancestry or knowledge can wield powers. Often they incur a cost for doing so, and need to recharge their abilities or rest between efforts.

What if the following were true:

  1. The majority of the population has an innate ability to wield magic.
  2. The limits of magic are not well understood, though evidence suggests the environment and objects can be temporarily manipulated. (Objects or persons cannot be imbued with a lasting magical effect.)
  3. The quantity of magic a person has is born into them. There is no known way to measure, extend or replenish spent magic. Once gone, it is believed to be gone for good.

Using these three foundations, what could happen in such a society?

  • the inability to measure magical capacity would mean it isn’t a significant part of the power structure. However, those known to have used all their magic would form an underclass. The lowest in the social strata would be those few born without magic.
  • people would likely hoard their magic, wanting to save it for life-and-death situations, and often for selfish purposes.
  • the poor would be forced to use their magic to survive, pushing them further down the social ladder.
  • people would try to bluff about, or conceal, running out of magic.
  • with the cost of experimentation so high, understanding of magic would be limited. Unscrupulous researchers might resort to devious schemes to trick, manipulate or even harm others in an attempt to gain more magic.
  • there would be fads and self-help gurus who posited various means of increasing one’s capacity.
  • magic would run out unexpectedly, causing potential mayhem or embarrassment.

At first I had no story to go along with this, but in the last few days one has begun to unfold in my mind. I may do a short story to explore this idea further in the future.

The Danger of Boring Bits

I must begin by saying every reader is different. What I find fascinating you might consider yawn-worthy, and vice versa. Grammar and punctuation are largely objective, but the quality of a story is subjective: beauty (or ugliness) is in the eye of the beholder.

Last weekend I was reading a novel that I felt sure I’d be blogging about by name, encouraging you all to run out and buy it. It’s been a while since I’ve read anything I found so engaging.

I was staying up late to read and reaching for the book within minutes of my eyelids opening. All other pursuits and activities were put on hold as I read, eager to discover what happened next. After investing half the weekend reading, I’d made significant progress.

And then the character moved to a different situation, and my interest began to wane. I slogged through increasing boredom, knowing the situation would have to change soon. Surely? Multiple chapters later I was still stuck in the same place. I started to skip pages, then whole chapters, and still I was stuck in the swamp of boredom.

As I closed the book for the last time on Sunday evening, I knew the swamp was coming to an end. The character was about to change setting, drawing this section to a close.

The only problem is I’m not sure I care any more. Even though the story before this point was great, I’ve lost interest. The book will probably return to its former glory, but what if it doesn’t? The way I feel now, I may never finish it.

Perhaps the fault is my own. Maybe in those skipped pages and chapters I missed some crucial element that would have made the boredom worthwhile. But I doubt it.

I feel as though I was knocked out of the story. Boring bits cost the goodwill of the reader, and if the cost is too high the book gets put down. Chances are I’ll be more hesitant to pick up a book by the same author again. Realising just how detrimental boring bits are has made me even more wary of writing them.


For the next month I don’t plan to do much writing, if any, apart from blogging. I have some programming to do: I’m involved in running a men’s group at my church, and to help it run more smoothly I need to develop some software.

I suspect it will be a considerable amount of work; hopefully I can get it done within a month. Then I’ll be back to writing (which I’m already looking forward to).

What’s been happening?

I took an unplanned sabbatical from blogging.

So what have I been up to?

My first novel, Vengeance Will Come, has been completed (and submitted). You might notice I’ve gone from 29 chapters down to 22. I suspect it needs expansion around the four-fifths point, but I’ll leave that until I can get an expert opinion.


I’ve also been doing some work on my writing application. The development wasn’t writing-related; it was on the application infrastructure. I’ve added:

  • a debug output panel (to help me in development)
  • ability to close a panel (will prompt for save)
  • ability to name, save and load a layout (size and position of the application, and all windows within the application).


And I’ve done some chores and procrastination: gaming and TV shows.

Nerd-Author Fun

I’ve spent a few days goofing off from writing. Well, kind of… it was writing-related.

I wrote a Java program that can load and process my novel. Having done the loading work will let me add useful tools in the future, but for now I’ve just done some basic word-frequency analysis. Sounds like nerd fun? It was.

First, technical stuff and then some results:

Technical stuff

Loading it into the program turned out to be more difficult than I expected. Part of the difficulty was how I defined things on the page. When I was younger I’d have told you that any gap between blocks of text marks a paragraph. In my mind, at least, the concept of a paragraph is stretched out of shape by the frequent carriage returns of dialogue.

Is this a paragraph? Two? Three? I’m so confused…

I’m sure there’s a technical term (which I’m happy to be told), but I didn’t want to research it. So I solved the problem like any fiction author: I just made words up.

Henceforth, for all time until I find a better name, they shall be known as minor blocks (green) and major blocks (blue). The term paragraph may now be discontinued.


(I suspect I’m already in the process of changing my mind…)
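Here is one way to turn minor and major blocks into code. Because the diagram isn’t reproduced here, it rests on two assumptions. A minor block is a single line ending in a carriage return. A major block is a run of minor blocks separated from the next run by a blank line. Blocks.split is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class Blocks {
    /** Returns major blocks, each a list of its minor blocks (non-blank lines). */
    static List<List<String>> split(String text) {
        List<List<String>> major = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String line : text.split("\\R", -1)) {   // \R matches any line break
            if (line.isBlank()) {
                // A blank line closes the current major block, if any.
                if (!current.isEmpty()) {
                    major.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(line.strip());             // each non-blank line is a minor block
            }
        }
        if (!current.isEmpty()) major.add(current);
        return major;
    }
}
```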

Results

Before you peruse the results, you might wonder what possible good a function like this is. (Admittedly, at the moment there is too much information.) The tool could be used in the following ways:

  1. Some words are so peculiar or powerful that they should be used only once in a story, and this tool helps locate them. For example: gruesome (0) or horror (4). Wow, there’s a lot of cry (10) / crying (5) going on. I really need to check that… Point proven.
  2. There are also some words that mean nothing and should be replaced with more descriptive terms, like interesting (3).
  3. It could help expose word-use problems. For example, when my characters want to swear they say “frak”. If I find a “frack” or a “fak” then I know I’ve made a mistake.
  4. Nerdy pleasure (hey, it’s valid for me)
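Point 3 can be automated: compare every counted word against the intended spelling and flag those exactly one edit away. This Java sketch uses Levenshtein edit distance, a technique chosen here and not necessarily what the program uses. It will also flag legitimate words such as “freak”, so treat the output as a list to review rather than a list of errors. NearMiss is a hypothetical name.

```java
import java.util.Map;
import java.util.TreeMap;

final class NearMiss {
    /** Levenshtein distance: minimum insertions, deletions and substitutions to turn a into b. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Words (with their counts) exactly one edit away from the intended spelling. */
    static Map<String, Integer> suspects(Map<String, Integer> counts, String intended) {
        Map<String, Integer> found = new TreeMap<>();
        counts.forEach((word, n) -> {
            if (distance(word, intended) == 1) found.put(word, n);
        });
        return found;
    }
}
```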

When considering these results please note the following caveats:

  • Not all bugs have been ironed out; give me a 5% margin for error.
  • Contractions are included (so “don’t” and “do not” is counted as 2 words)
  • There are no exclusions yet (“a”, “is”, etc. are included)

For a novel of slightly over 86K words, I was surprised by the results.

  • 8,443 unique words
  • The 10 most frequent words (the, to, and, a, of, he, you, was, his, I) account for 18,624 words.
  • Most frequent word per first letter: unsurprisingly, many are character names. (A = and; B = be; C = could; D = Danyel; E = even; F = for; G = get; H = he; I = I; J = Jessica; K = Keeshar; L = like; M = Menas; N = not; O = of; P = people; Q = Queen; R = Regent; S = said; T = the; U = up; V = very; W = was; X = Xu; Y = you; Z = Zekkari).
  • Everything above 15 characters long was a processing error 🙂

Words starting with letter

Length of words

 

Nobody talked me down…

I’ve had a mini break from writing (dangerous, I know). But it’s been a time of enjoyment and productivity (albeit in other areas), so I don’t regret it.

Firstly, since no one talked me down, I’ve been doing some coding in Java. It’s not a writing program yet, but rather the framework to support one (about 75% complete, to pull a number from the sky). And while I’m making up numbers, let’s also say it’s a thousand percent under budget. (Speaking of budgets: the Australian budget is out tonight, and here’s an excellent article on the immorality of spending the next generation(s)’ money.) But I digress…

For my framework I’ve gone with what’s called an internal frame application, because it gives the user maximum flexibility. You can stretch the application over multiple monitors and position and size any number of internal windows to your preferences.


Each window can then have any number of panels added to it. (For example a writing panel, a character attributes panel, a todo panel…)
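In Java Swing, this kind of internal frame application is usually built from a JDesktopPane holding JInternalFrame windows. The sketch below is not the author’s framework. The window titles, the choice of a JTabbedPane to hold each window’s panels, and the placeholder JLabel contents are all assumptions, and WritingDesktop is a hypothetical name.

```java
import java.util.List;
import javax.swing.JDesktopPane;
import javax.swing.JFrame;
import javax.swing.JInternalFrame;
import javax.swing.JLabel;
import javax.swing.JTabbedPane;
import javax.swing.SwingUtilities;

final class WritingDesktop {
    /** One resizable, closable, maximisable, iconifiable internal window; its panels become tabs. */
    static JInternalFrame window(String title, List<String> panelNames, int x, int y, int w, int h) {
        JInternalFrame frame = new JInternalFrame(title, true, true, true, true);
        JTabbedPane panels = new JTabbedPane();
        for (String name : panelNames) {
            panels.addTab(name, new JLabel(name + " panel goes here")); // placeholder content
        }
        frame.add(panels); // goes into the internal frame's content pane
        frame.setBounds(x, y, w, h);
        frame.setVisible(true);
        return frame;
    }

    /** The desktop area that hosts any number of internal windows. */
    static JDesktopPane desktop() {
        JDesktopPane desktop = new JDesktopPane();
        desktop.add(window("Draft", List.of("Writing"), 0, 0, 600, 500));
        desktop.add(window("Notes", List.of("Character attributes", "Todo"), 610, 0, 400, 500));
        return desktop;
    }

    public static void main(String[] args) {
        // Swing components should be created on the event dispatch thread.
        SwingUtilities.invokeLater(() -> {
            JFrame app = new JFrame("Writing framework");
            app.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            app.setContentPane(desktop());
            app.setSize(1100, 600);
            app.setVisible(true);
        });
    }
}
```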

On other matters, I’ve also been enjoying more time in the kitchen, having fun preparing a few more meals. (This gives me enjoyment and my beautiful wife a break: wins all round.)

But now that I have some feedback from my beta readers, it’s time to get back to writing and Vengeance Will Come. In my next few posts I plan to write about how I work through those beta-reader comments.