Text Analysis with Voyant

http://docs.voyant-tools.org/start/

Voyant is an online tool (although the fancier Voyant 2.0 requires a download and a Java installation) that lets you see and map how often words appear in a given document. Thankfully, you can upload .txt files into Voyant, which spares you from copying the entire Declaration of Independence in one frustrating scroll. You can also load multiple documents at once, which allows wonderful compare-and-contrast exercises between the corpus (all of the works together) and the individual documents themselves. The five main tools are Cirrus, Reader, Trends, Summary, and Contexts.

Cirrus is basically a fancy word for “word cloud”: it gives you a very visual and colorful representation of which words appear most often in a given document. Reader, as its title suggests, lets you read through a whole document, or a whole corpus if you’re feeling really ambitious. However, it is most useful for hovering over certain words to see how many times they appear in that particular document. Trends appeals to my math-loving side because it graphs how often a selected word (chosen from the Cirrus tool) appears across the corpus. And if you want to get really fancy with Trends, you can graph how often a word appears within just one document, so you can spot unusual spikes in one text even if there aren’t any across the whole corpus.

Summary is also self-explanatory: it provides the numbers for everything. There’s nothing visual about Summary; it’s all about words, words, words (and numbers, I suppose). If you don’t know every document’s length or vocabulary density, the Summary tool will figure it out in a pinch. However, Summary’s most useful category is Distinctive Words, which shows you which words are especially characteristic of one document compared with the rest of the corpus, so you don’t have to trawl through Cirrus or the other frequency tools to find the gigantic spikes.
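
Voyant does all of this counting for you, but it can help to see what the numbers actually mean. Here is a minimal Python sketch, not Voyant’s actual code, of a few of the measures described above: raw word counts (the kind of numbers Cirrus draws from), per-document relative frequencies (roughly what a Trends line plots), vocabulary density (unique words divided by total words), and a TF-IDF score as one common way to rank distinctive words. The tokenizer, the function names, and the two made-up sample texts are my own assumptions; Voyant’s tokenization, stopword handling, and exact formulas may differ.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes (a simplification;
    # Voyant's own tokenizer is more sophisticated).
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(text):
    # Raw count of each word, the kind of numbers a word cloud is drawn from.
    return Counter(tokenize(text))

def relative_frequencies(corpus, word):
    # One value per document: occurrences of `word` divided by document length.
    result = {}
    for name, text in corpus.items():
        words = tokenize(text)
        result[name] = words.count(word) / len(words) if words else 0.0
    return result

def vocabulary_density(text):
    # Unique words divided by total words; closer to 1 means less repetition.
    words = tokenize(text)
    return len(set(words)) / len(words) if words else 0.0

def distinctive_words(corpus, doc_name, top_n=5):
    # TF-IDF: high for words frequent in this document but rare elsewhere.
    counts = {name: word_frequencies(text) for name, text in corpus.items()}
    doc_counts = counts[doc_name]
    total = sum(doc_counts.values())
    scores = {}
    for word, count in doc_counts.items():
        docs_with_word = sum(1 for c in counts.values() if word in c)
        idf = math.log(len(corpus) / docs_with_word)
        scores[word] = (count / total) * idf
    # Highest score first; ties broken alphabetically.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    # Words found in every document score 0 and are left out.
    return [word for word, score in ranked[:top_n] if score > 0]

# Two made-up sample texts, for illustration only.
corpus = {
    "house_doc": "The house shall meet and the house shall vote",
    "field_doc": "The field was quiet and the farmer shall rest",
}

print(word_frequencies(corpus["house_doc"]).most_common(3))
print(relative_frequencies(corpus, "house"))
print(round(vocabulary_density(corpus["house_doc"]), 3))
print(distinctive_words(corpus, "house_doc"))
```

TF-IDF gives a word a score of zero when it appears in every document, which is why shared words like “the” and “shall” drop out and only “house”, “meet”, and “vote” remain for the first sample text.
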
Finally, the Contexts tool appeals to my writing-loving side, as it shows you the words surrounding every occurrence of the selected term. For instance, Cirrus or Trends couldn’t tell the difference between the nice Mrs. Burns and a fire that burns the whole town down. You could check through the Reader, but frankly, no one wants to do that. Contexts shows you, “Okay, this use of the word means the town was razed.” It stops you from jumping to crazy conclusions, which I am all for.
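
The idea behind Contexts is usually called keyword in context (KWIC). As a minimal sketch under my own assumptions (the function name, the three-word window, and the made-up sentence are all hypothetical, and this is not how Voyant implements it), the Python below finds every occurrence of a term regardless of capitalization and returns the words on either side:

```python
import re

def keyword_in_context(text, keyword, window=3):
    """Return (left, match, right) tuples for each occurrence of keyword.

    Matching is case-insensitive, but the original capitalization is kept
    so you can still tell "Burns" (a name) from "burns" (a verb).
    """
    # Split into words; punctuation such as the period in "Mrs." is dropped.
    tokens = re.findall(r"[A-Za-z']+", text)
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits

# A made-up sentence, for illustration only.
sample = "Mrs. Burns waved as the fire burns the whole town down"

for left, match, right in keyword_in_context(sample, "burns"):
    print(f"{left:>20}  [{match}]  {right}")
```

Because the matching ignores case, “Burns” and “burns” count as the same word, just as a simple word count would treat them, but the surrounding words make it obvious which one is the neighbor and which one is the fire.
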

In conclusion, I find Voyant to be a very useful tool, and one that is easy to lose yourself in. However, a lot of Voyant isn’t very intuitive, especially features like getting less frequent words to appear on the Trends graph, which involves selecting the word in various other places like a complicated game of leapfrog. Exporting the information was also the bane of my existence for this exercise: exporting the Trends graph for “house” in Virginia didn’t always export that graph, and sometimes exported the graph for the whole corpus instead, which I hadn’t even selected! Oh well. Look at your work before you export, kids.
