Or was it a plain text file? Another question: although MALLET is unconcerned with word meanings, focusing instead on patterns of word usage, does it overcome the problem of text that predates standardized spelling, punctuation, and grammar? Could it handle texts authored by numerous people over time, each with their own particular idiosyncrasies?

Tracking them over time was a matter of naming the .txt files by their date, such as 18070225. Big data can overcome a lot of problems. This has particular potential for clustering different authors together. It probably all depends on just how much the idiosyncrasies vary from author to author.
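Because each file name encodes the entry's date, one way to follow a topic over time is to parse the names back into dates and average the topic's share per year. The sketch below assumes the YYYYMMDD naming described above and that per-document topic proportions have already been loaded into a Python dict (for example, parsed from the file MALLET writes with `--output-doc-topics`; that parsing step is omitted). The helper names `parse_entry_date`, `yearly_means`, and `trend_slope` are hypothetical, the toy proportions are made up, and a least-squares slope is only one simple way to put a number on a "broad increase over time."

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def parse_entry_date(filename):
    """Turn a date-named file such as '18070225.txt' into a datetime (YYYYMMDD assumed)."""
    stem = filename.rsplit(".", 1)[0]
    return datetime.strptime(stem, "%Y%m%d")


def yearly_means(doc_topics, topic):
    """Average one topic's proportion across all entries written in each year."""
    by_year = defaultdict(list)
    for filename, proportions in doc_topics.items():
        by_year[parse_entry_date(filename).year].append(proportions[topic])
    return {year: mean(values) for year, values in sorted(by_year.items())}


def trend_slope(series):
    """Ordinary least-squares slope of a {year: value} series (change per year)."""
    if len(series) < 2:
        raise ValueError("need at least two years to fit a trend")
    years, values = list(series), list(series.values())
    y_bar, v_bar = mean(years), mean(values)
    num = sum((y - y_bar) * (v - v_bar) for y, v in zip(years, values))
    den = sum((y - y_bar) ** 2 for y in years)
    return num / den


# Made-up proportions for a hypothetical HOUSEWORK topic, keyed by file name.
doc_topics = {
    "17850101.txt": {"housework": 0.10},
    "17850615.txt": {"housework": 0.20},
    "17900301.txt": {"housework": 0.25},
    "17950710.txt": {"housework": 0.35},
}
series = yearly_means(doc_topics, "housework")
print(series)
print(trend_slope(series))  # roughly 0.02 per year
```

Averaging per year before fitting keeps years with many entries from dominating the trend; with real data you might prefer monthly bins or a moving average instead.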

In theory, you could also reverse-geocode diaries (or newspapers) to determine, based on their content, where they were from. Since the locations of newspapers are known, they would make an interesting way to test this idea.

It would be interesting, for example, to see whether Martha shows less EMOTION around DEATH as she gets older. Thanks for the feedback. I really like the idea of reverse-geocoding, especially if you had a known-location training corpus for the program to work with.
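With a known-location training corpus, one simple (and by no means the only) way to reverse-geocode a text by content is a nearest-centroid classifier: average the topic vectors of texts from each known place, then assign an unplaced text to the place whose average it most resembles. This is a minimal sketch under stated assumptions: documents are already represented as topic-proportion vectors of equal length, similarity is cosine similarity, and the place labels, topic order, and numbers below are all hypothetical.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Element-wise mean of equal-length topic vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train(labeled):
    """labeled: (location, topic_vector) pairs from texts of known origin."""
    groups = defaultdict(list)
    for location, vector in labeled:
        groups[location].append(vector)
    return {location: centroid(vectors) for location, vectors in groups.items()}


def guess_location(centroids, vector):
    """Return the known location whose centroid is most similar to vector."""
    return max(centroids, key=lambda location: cosine(centroids[location], vector))


# Hypothetical topics, in order: SHIPPING, FARMING, HOUSEWORK.
training = [
    ("port_town", [0.7, 0.1, 0.2]),
    ("port_town", [0.6, 0.2, 0.2]),
    ("farm_town", [0.1, 0.7, 0.2]),
    ("farm_town", [0.2, 0.6, 0.2]),
]
centroids = train(training)
print(guess_location(centroids, [0.5, 0.2, 0.3]))  # port_town
```

A word-level classifier such as naive Bayes would be a natural alternative; the centroid approach is shown only because it reuses the topic proportions MALLET already produces.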

Mixed results so far, but it was interesting to see one topic I was having trouble identifying move in almost exactly the opposite direction (coefficient of -0.
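The comment doesn't say which coefficient was computed. Assuming it is a Pearson correlation between two topics' yearly proportions, a value near -1 is what "moving almost exactly opposite" would look like. The sketch below computes it from scratch; the two series are made-up numbers for illustration, not results from the diary.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)


# Hypothetical yearly proportions for two topics over four years.
topic_a = [0.10, 0.15, 0.20, 0.25]
topic_b = [0.30, 0.24, 0.21, 0.12]
print(pearson(topic_a, topic_b))  # strongly negative, about -0.98
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient.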

While most of your paper was a bit over my non-quanty humanities head, it was interesting to see the intersection of topic modeling and geographic analysis. Thanks for introducing MALLET; I found your analysis of the product very interesting.

Thanks for introducing the MALLET topic modeling tools. This is exactly the type of research that got me interested in statistical text analysis. For a corpus like this diary, it should work well even with substantial variation.

Reblogged this on Austen, Morgan and Me and commented: Detailed blog post exploring the use of MALLET to topic model a diary.

Two topics appear to deal largely with HOUSEWORK.

EMOTION: Like the housework topic, EMOTION shows a broad increase over time.

Cameron Blevins says: April 1, 2010 at 8:56 am
Jason, all good questions. I look forward to more cool stuff from this. Please send along my appreciation to the MALLET developers.

