A Topic Analysis of "The Nagasaki Shipping List and Advertiser"
Tomonari Masada (Sep. 4, 2012)
( "The Nagasaki Shipping List and Advertiser" is available at
Strip non-alphabetic and non-numeric characters from the head and the tail of each word token.
Remove stop words and words of length one.
Run collapsed Gibbs sampling for Latent Dirichlet Allocation with 100 topics.
Optimize topic Dirichlet hyperparameters by Minka's method.
Count the number of topic assignments for each pair of topic and word.
Visualize the counts with a D3 bubble chart.
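The preprocessing and sampling steps above could be sketched roughly as follows. This is a minimal illustration, not the code used for the analysis: the function names, the tiny stop-word list, the lowercasing, the ASCII-only notion of "alphabetic", and the fixed symmetric hyperparameters alpha and beta are all assumptions. In particular, the hyperparameter optimization by Minka's method is omitted here.

```python
import random
import re

# Hypothetical stop-word list; the list used in the analysis is not given.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "at", "for", "by", "is"}

def clean_tokens(text, stop_words=STOP_WORDS):
    """Strip non-alphanumeric characters from both ends of each token,
    then drop stop words and tokens of length one."""
    tokens = []
    for raw in text.split():
        # Lowercasing is an assumption made so stop words match.
        word = re.sub(r"^[^A-Za-z0-9]+|[^A-Za-z0-9]+$", "", raw).lower()
        if len(word) > 1 and word not in stop_words:
            tokens.append(word)
    return tokens

def gibbs_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA with fixed symmetric priors.
    Returns one dict per topic mapping word -> number of assignments."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [dict() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] = topic_word[k].get(w, 0) + 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t].get(w, 0) + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] = topic_word[k].get(w, 0) + 1
                topic_total[k] += 1
    # Keep only the (topic, word) pairs that have at least one assignment.
    return [{w: c for w, c in tw.items() if c > 0} for tw in topic_word]
```

With a cleaned corpus in `docs` (one token list per document), `gibbs_lda(docs, num_topics=100)` would return the per-topic word counts that the last two steps count and visualize.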
Different colors of circles correspond to different topics.
Words of the same color are often used together to express the same topic.
The same word may be used to express more than one topic.
A larger circle means that the word is used more often to express the topic indicated by the circle's color.
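To make the mapping from counts to circles concrete, the sketch below flattens per-topic word counts into one record per (topic, word) pair, where the topic index would pick the circle color and the count would set its size. The record field names, the `bubble_records` helper, the `top_n` cutoff, and the example counts are all hypothetical; the data format actually fed to the chart is not described here.

```python
import json

def bubble_records(topic_word_counts, top_n=10):
    """Flatten per-topic word counts into one record per (topic, word) pair.
    `topic` would choose the circle color and `value` its size."""
    records = []
    for topic, counts in enumerate(topic_word_counts):
        # Most frequent words first; ties are broken alphabetically.
        top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
        for word, count in top:
            records.append({"topic": topic, "word": word, "value": count})
    return records

# Hypothetical counts for two topics; "tea" is assigned to both.
example = [{"ship": 12, "cargo": 7, "tea": 3}, {"tea": 9, "silk": 8}]
print(json.dumps({"children": bubble_records(example, top_n=2)}, indent=2))
```

Because a word such as "tea" can carry counts under several topics, it can appear as several circles of different colors and sizes.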