-
Twitter word frequency
Hi,
I heard somewhere that someone was going to do a Russian word frequency count using Twitter, but I can't find any mention of it anymore.
Does anyone know if this is being done? Or did I dream it!? Seems like a good idea, though; it's probably the best source of transcripts of everyday conversational stuff out there to learn from.
Rich B
-
I don't think it's technically possible without having access to their full database of tweets in Russian.
You would have to follow everybody who tweets in Russian.
-
That's a shame, it probably won't happen then!
-
Someone's doing it for English, Spanish and Portuguese:
I'm working on a 400 million word corpus of English tweets from Twitter, as well as 100-200 million from Spanish and Portuguese.
Mark Davies: Corpus Linguistics, BYU
-
I wonder how he collected that corpus. Did he collect it in one batch over a short period, or in small batches over a long time?
Did he follow the tweets of all English-speaking users or just selected ones? And which users/topics did he follow?
-
I have no idea. My guess is he somehow got access to their database of tweets and just loaded it all at once into some program to filter and analyze it.
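For what it's worth, the filter-and-count step is the easy part once you have the text. Here's a rough sketch in Python, assuming the tweets are already available as plain strings. The sample tweets and the function name `word_frequencies` are made up for illustration, and this says nothing about how Mark Davies actually did it:

```python
import re
from collections import Counter

# Matches runs of letters (Latin, Cyrillic, etc.); digits, punctuation
# and underscores are not counted as part of a word.
WORD_RE = re.compile(r"[^\W\d_]+")

def word_frequencies(tweets):
    """Count lowercase word occurrences across an iterable of tweet strings."""
    counts = Counter()
    for text in tweets:
        # Filter step: drop URLs, @mentions and #hashtags before counting.
        text = re.sub(r"https?://\S+|[@#]\w+", " ", text)
        counts.update(WORD_RE.findall(text.lower()))
    return counts

# Hypothetical sample data, not real tweets.
sample = [
    "Привет, как дела? http://example.com",
    "Как дела у @ivan? Всё хорошо!",
    "Привет всем #утро",
]
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

This doesn't group word forms together (дела/дело would count separately), which matters a lot for Russian, so a real frequency list would need a lemmatizer on top.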