Saturday, June 30, 2012

Index making: is there (free) software to help with this?

I'm talking about the kind of index you find in the back of a book.

It seems to me that, at heart, such an index is made up of data points that each carry three pieces of information: time (the position in the work, which for a book is a page number), topic, and whether coverage of the topic is beginning or ending.

For example, this entry from The Official Star Trek Cooking Manual, under pie:
custard-fruit, 25-26, 91-93
contains four datapoints:
(25, custard fruit pie, start)
(26, custard fruit pie, stop)
(91, custard fruit pie, start)
(93, custard fruit pie, stop)
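A minimal Python sketch of that idea, assuming a triad is just (page, topic, "start"/"stop") and that the starts and stops for one topic never nest. It pairs each start with the next stop and rebuilds the locator string from the entry above. The names `Triad`, `ranges_for`, and `format_locator` are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Triad:
    """One raw data point: where, what, and whether coverage starts or stops."""
    page: int
    topic: str
    kind: str  # "start" or "stop"


def ranges_for(triads: List[Triad], topic: str) -> List[Tuple[int, int]]:
    """Pair each 'start' for a topic with the next 'stop', in page order."""
    spans: List[Tuple[int, int]] = []
    open_at = None
    # Sort by page; on a tie, "start" comes before "stop" so one-page mentions pair up.
    mine = sorted((t for t in triads if t.topic == topic),
                  key=lambda t: (t.page, t.kind != "start"))
    for t in mine:
        if t.kind == "start" and open_at is None:
            open_at = t.page
        elif t.kind == "stop" and open_at is not None:
            spans.append((open_at, t.page))
            open_at = None
    return spans


def format_locator(spans: List[Tuple[int, int]]) -> str:
    """Render spans the way a printed index does: '25-26, 91-93'."""
    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in spans)


pie = [
    Triad(25, "custard fruit pie", "start"),
    Triad(26, "custard fruit pie", "stop"),
    Triad(91, "custard fruit pie", "start"),
    Triad(93, "custard fruit pie", "stop"),
]
print("custard-fruit,", format_locator(ranges_for(pie, "custard fruit pie")))
# prints: custard-fruit, 25-26, 91-93
```

A start with no matching stop is silently dropped here; a real tool would want to flag it.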

Now, it is somewhat more complicated than that, because the sorting also has to know that this entry should be listed both under pie and under custard.  Plus, I have no idea why it's hyphenated in the index but nowhere else.  (And I don't know what it's like, having never had any.)
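One way to handle the "both under pie and under custard" problem is to keep the triads keyed by a single topic name and attach a separate, hand-made list of headings to each topic. The sketch below assumes that approach; `HEADINGS`, `file_under`, and `render` are hypothetical names, and the locator string is taken as already computed from the triads.

```python
from collections import defaultdict

# Hypothetical, hand-made mapping: each recorded topic -> every
# (main heading, subheading) it should be filed under.
HEADINGS = {
    "custard fruit pie": [("pie", "custard-fruit"), ("custard", "fruit pie")],
}


def file_under(locators):
    """Copy each topic's locator string to every heading listed for it.

    locators: dict of topic -> already formatted locator, e.g. "25-26, 91-93".
    Raises KeyError for a topic that has no entry in HEADINGS.
    """
    index = defaultdict(dict)
    for topic, loc in locators.items():
        for main, sub in HEADINGS[topic]:
            index[main][sub] = loc
    return index


def render(index):
    """Alphabetize main headings and subheadings, ignoring case."""
    lines = []
    for main in sorted(index, key=str.lower):
        lines.append(main)
        for sub in sorted(index[main], key=str.lower):
            lines.append(f"  {sub}, {index[main][sub]}")
    return "\n".join(lines)


print(render(file_under({"custard fruit pie": "25-26, 91-93"})))
```

The point of the separate mapping is that the triads are recorded once, while the decision about where an entry gets filed can be changed later without touching them.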

But at its heart, an index seems to be about those little triads of information.

So one could gather all of the information for an index by going through a work and recording such triads as they come up.  But that wouldn't actually be an index; it would just be the raw material from which an index could be made.

What I'm wondering is whether there's anything that would do the work of turning such triads into an index for me.  Obviously it's nothing that can't be done by hand, but sorting the information while it is being gathered would seem to slow things down, and sorting it afterward would seem to be a massive headache.
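Sorting afterward is the part a short script handles well. This sketch assumes the raw triads are typed in reading order, one per line as `page, topic, start|stop`, and produces one alphabetized line per topic. The second topic and its pages are invented for illustration, and `build_index` is a hypothetical name.

```python
from collections import defaultdict


def build_index(log: str) -> str:
    """Turn a reading-order log of 'page, topic, start|stop' lines into index text."""
    events = defaultdict(list)  # topic -> [(page, kind), ...]
    for line in log.strip().splitlines():
        # First field is the page, last is start/stop; anything between is the topic.
        page, *topic, kind = (part.strip() for part in line.split(","))
        events[", ".join(topic)].append((int(page), kind))
    out = []
    for topic in sorted(events, key=str.lower):
        spans, open_at = [], None
        # Sort by page; "start" before "stop" on the same page.
        for page, kind in sorted(events[topic], key=lambda e: (e[0], e[1] != "start")):
            if kind == "start" and open_at is None:
                open_at = page
            elif kind == "stop" and open_at is not None:
                spans.append(f"{open_at}" if open_at == page else f"{open_at}-{page}")
                open_at = None
        out.append(f"{topic}, {', '.join(spans)}")
    return "\n".join(out)


log = """
25, custard fruit pie, start
25, plomeek soup, start
26, custard fruit pie, stop
27, plomeek soup, stop
91, custard fruit pie, start
93, custard fruit pie, stop
"""
print(build_index(log))
# custard fruit pie, 25-26, 91-93
# plomeek soup, 25-27
```

Because nothing is sorted until `build_index` runs, the triads can be jotted down as fast as they come up, interleaved however the text happens to interleave them.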


If there is something to help with that, that in itself would be quite nice, but I also wonder whether it could be used to do more.

For example, I'm not particularly interested in indexing cookbooks.  They tend to come pre-indexed.  Works of fiction, on the other hand, seem far more interesting to index.  And in that case it's sometimes not enough to know when a character is on the page or the screen.

One might want to know all of the scenes where Character X and Character Y appear together.  Or, maybe, to look at the scenes in which Character X appears but Character Y is absent.  Pretty sure the triads of information previously discussed are still all of the raw info needed to create an index of X ∧ Y, or one of X ∧ ¬Y, or X ∨ Y or even some more complicated logical expression.  Not sure if there's anything that's actually intended to do such a thing though.
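A sketch of those logical combinations, assuming page-level granularity: expand each character's spans into a set of pages, use Python's set operators (`&` for ∧, `-` for ∧¬, `|` for ∨), and collapse the result back into spans. The spans for X and Y are made up. With timestamps or finer positions than whole pages, you would want real interval arithmetic instead of page sets.

```python
def pages(spans):
    """Expand inclusive (start, stop) spans into a set of page numbers."""
    return {p for a, b in spans for p in range(a, b + 1)}


def spans(page_set):
    """Collapse a set of page numbers back into sorted inclusive spans."""
    out = []
    for p in sorted(page_set):
        if out and p == out[-1][1] + 1:
            out[-1] = (out[-1][0], p)  # extend the current run
        else:
            out.append((p, p))  # start a new run
    return out


x = pages([(10, 20), (40, 45)])  # hypothetical: pages covering Character X
y = pages([(15, 25), (44, 50)])  # hypothetical: pages covering Character Y

print(spans(x & y))  # X ∧ Y:  [(15, 20), (44, 45)]
print(spans(x - y))  # X ∧ ¬Y: [(10, 14), (40, 43)]
print(spans(x | y))  # X ∨ Y:  [(10, 25), (40, 50)]
```

More complicated expressions just chain the same operators, e.g. `(x & y) - z`.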

Also note that the span where a character appears is not the same as the scene they appear in, and certainly not the same as the chapter or episode they appear in.  It would be nice to be able to expand to those as well, so that one could, if it seemed useful, switch from an index showing when a character is covered to an index showing the scenes in which the character is covered, which may start earlier and end later.  (Maybe the character shows up four pages into a scene, for example.)  Or to one showing the chapters in which the character is covered.

I don't think that requires much more on the information side.  You just need to know where each scene, chapter, or whatever begins and ends, which takes the same kind of triads: (1, Chapter 1, start).
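Widening a character's spans to the scenes or chapters that contain them could look like the following, assuming the scene/chapter triads have already been paired into inclusive (first page, last page) spans. A unit counts if it overlaps any appearance at all. The function name `widen` and all the page numbers are hypothetical.

```python
def widen(appearances, units):
    """Return every unit (scene, chapter, ...) that overlaps any appearance span.

    appearances: inclusive (first_page, last_page) spans for one character.
    units: dict of unit name -> inclusive (first_page, last_page) span.
    Results come back in page order as (name, span) pairs.
    """
    return [
        (name, (u0, u1))
        for name, (u0, u1) in sorted(units.items(), key=lambda kv: kv[1])
        # Two inclusive spans overlap when each starts before the other ends.
        if any(a0 <= u1 and u0 <= a1 for a0, a1 in appearances)
    ]


# Hypothetical page spans.
scenes = {"Scene 1": (1, 9), "Scene 2": (10, 18), "Scene 3": (19, 30)}
chapters = {"Chapter 1": (1, 18), "Chapter 2": (19, 40)}
x = [(14, 16)]  # Character X is covered on pages 14-16 only

print(widen(x, scenes))    # [('Scene 2', (10, 18))]
print(widen(x, chapters))  # [('Chapter 1', (1, 18))]
```

The character enters four pages into Scene 2, but the scene-level index reports the whole of pages 10-18, which is the "start earlier and end later" effect described above.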

But it would require more on the implementation side, and given that I don't even know whether anything does the most basic things I want, I have no idea whether anything does that.


So, is there anything out there that does any of what I'm talking about?  If there is such stuff, is any of it free?

If there isn't such stuff (free or otherwise) how are indexes made?  I hope it's not entirely by hand.


  1. Do you use LaTeX or LyX at all? (The latter is more or less a GUI for the former.) They solve all sorts of typographical and layout problems, most certainly including indexing, primarily by enforcing the use of named styles. They're mostly in the Unix world, but they've been ported to other OSes.

  2. Here's a link with some information about indexing software:

    It looks like the answer is: none of these programs are free, and most of the work *still* has to be done by hand.

  3. Writing an index for a book is a creative act, like writing the rest of the book. I believe it is best done by an author who is sick of the book, but not yet sick to death of the book. The index is like a second table of contents, this one written from the inside out. I started by running TextSTAT, which breaks down the book by word frequency and such. It allowed me to re-see my words after I was sick of seeing them in the usual order.

    Failing that, a well-trained professional indexer can bring out insights that the author is too close to the material to see.

    A computer-generated index would be useless to anyone, much like a computer-generated blurb for the back cover.
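Comment 1's LaTeX suggestion maps onto the triads almost directly: makeindex marks the start of a page range with `\index{...|(}`, the end with `\index{...|)}`, and separates heading levels with `!`. A start/stop pair therefore becomes two commands placed in the source at those points (with `\usepackage{makeidx}` and `\makeindex` in the preamble and `\printindex` where the index should appear), and LaTeX works out the page numbers itself. The helper below is a hypothetical sketch that only builds the command strings; it does not escape makeindex's special characters (`!`, `@`, `|`, `"`).

```python
def index_command(heading_path, kind):
    """Return the makeindex command that opens or closes a page range.

    heading_path: heading levels, e.g. ["pie", "custard-fruit"], joined with "!".
    kind: "start" opens the range with |( and "stop" closes it with |).
    """
    marker = {"start": "|(", "stop": "|)"}[kind]
    return "\\index{" + "!".join(heading_path) + marker + "}"


# Filing under both pie and custard means one command per heading at each spot.
print(index_command(["pie", "custard-fruit"], "start"))  # \index{pie!custard-fruit|(}
print(index_command(["custard", "fruit pie"], "start"))  # \index{custard!fruit pie|(}
print(index_command(["pie", "custard-fruit"], "stop"))   # \index{pie!custard-fruit|)}
print(index_command(["custard", "fruit pie"], "stop"))   # \index{custard!fruit pie|)}
```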