I saw this question a few days ago, but just saw you haven't gotten any answers yet. I know there are some tools out there, but I don't personally have any experience with any of them.
What does your source data look like? What are your requirements for parsing/cleansing?
A relatively easy and free way of doing word cloud analysis in Tableau is to use a tool like Notepad++ and do a quick find of spaces and punctuation and replace them with newlines. That gets a single word on each line. Use that as the data source. Number of Records gives you the word count.
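If you would rather script that step than do it by hand in an editor, here is a minimal sketch in Python. The function name and the exact pattern are my own assumptions, not necessarily what anyone in this thread used. It replaces every run of spaces and punctuation with a single newline, keeping apostrophes so contractions stay in one piece.

```python
import re

def one_word_per_line(text: str) -> str:
    """Replace each run of spaces/punctuation with one newline."""
    # [^\w']+ matches one or more characters that are not a letter,
    # digit, underscore, or apostrophe.
    return re.sub(r"[^\w']+", "\n", text).strip("\n")

sample = "It was the best of times, it was the worst of times..."
print(one_word_per_line(sample))
```

Save the output as a text file and connect Tableau to it as a one-column data source.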
I used a find and replace with regular expressions:
It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way...
which I then took into Tableau (I calculated the lowercase of each word, filtered out some common words, etc.).
Attachment: Words.twbx.zip (59.1 KB)
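For anyone who wants to do the lowercasing and common-word filtering before the data reaches Tableau, here is a rough Python sketch. The stop-word list is a small hypothetical one for illustration, and `word_counts` is just a name I picked; this is not what the attached workbook does internally.

```python
import re
from collections import Counter

# Hypothetical stop-word list; extend it to suit your own text.
STOP_WORDS = {"the", "a", "of", "it", "was", "we", "were", "had", "all", "to"}

def word_counts(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into words, drop stop words, and count."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

sample = ("It was the best of times, it was the worst of times, "
          "it was the age of wisdom, it was the age of foolishness")

# Print the three most frequent remaining words with their counts.
for word, count in word_counts(sample).most_common(3):
    print(word, count)
```

The counts here play the same role as Number of Records in the Tableau approach.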
There are also tools in Excel to create word frequencies. I have yet to find one, though, that removes common words such as "the", "a", and "very". I would guess that you want to end up with a list of just nouns and verbs?
Please take a look at the Natural Language Toolkit (NLTK 2.0 documentation).
I would recommend reading this article; it's somewhat related to your initial post.
Cristian, that is an amazing resource - I love the concept of individual word clouds showing verbs, nouns, and unique words between the candidates.
You get more options from this page: CST's online-værktøjer (CST's online tools). My favorite is the POS tagger.
Some time ago I used another free (Danish) language analysis tool, but can't find it.
I am not sure if it was from the same university as the link you (and I) shared.
P.S. Danish is my second language and English is my third. (Faroese is my first language.)
But if you switch the analysis language to English (assuming that's what you want), many of the options disappear.
Yes, several options disappear when choosing English (engelsk).
Another issue is that the language switches back to Danish after pressing [Submit my text].
The POS tagging is not perfect either: "switch" is a verb in your sentence, not a noun, which is what I assume NN represents. That said, I still think this is a tool with great potential.
Completely agree - I've been looking for a tool like this for a long time. The downloadable version has the code, but does not seem to have the dictionary.