
Neoformix

Discovering and Illustrating Patterns in Data


CES Clustered Word Cloud
Yesterday

Yesterday it was Macworld; today it's the Consumer Electronics Show (CES) in Las Vegas.

[Image: CES 2009 Clustered Word Cloud (cwc_CES2009.png)]
Macworld Clustered Word Cloud
January 7

Macworld has been attracting a lot of attention over the last few days. I took the 10,000 most recent tweets that mention it and created a Clustered Word Cloud from them. The primary themes of the conference do seem to emerge from the cloud.

[Image: Macworld 2009 Clustered Word Cloud (cwc_MacWorld2009.png)]
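The post doesn't say how the words in the cloud were chosen or sized. As a minimal sketch, assuming a plain list of tweet strings, a hypothetical stopword list, and a simple linear mapping from counts to font sizes (none of which come from the original), the first step of a word cloud might look like this:

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be much larger.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "at", "is", "for", "on", "rt"}


def top_words(tweets, n=50):
    """Return the n most frequent non-stopwords across all tweets (case-insensitive)."""
    counts = Counter()
    for text in tweets:
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(n)


def font_sizes(word_counts, min_size=10, max_size=60):
    """Map each (word, count) pair linearly onto a font-size range."""
    if not word_counts:
        return {}
    lo = min(c for _, c in word_counts)
    hi = max(c for _, c in word_counts)
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {w: min_size + (c - lo) * (max_size - min_size) / span
            for w, c in word_counts}


# Made-up tweets, for illustration only.
tweets = [
    "Apple keynote at Macworld",
    "Macworld keynote: new MacBook",
    "Watching the Macworld keynote",
]
words = top_words(tweets, n=3)  # [('keynote', 3), ('macworld', 3), ('apple', 1)]
print(font_sizes(words))  # {'keynote': 60.0, 'macworld': 60.0, 'apple': 10.0}
```

A linear scale is the simplest choice; a square-root or logarithmic scale is a common alternative when a few words dominate the counts.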
World News Clustered Word Cloud
January 7

The graphic below shows a Clustered Word Cloud for the 2008 world news headlines. As in my last post, the data comes from the Toronto Star, so it reflects a Canadian perspective. Several groups of keywords bear this out, including the second largest (in red), which shows that there was a lot of coverage of Canadian soldiers killed or injured in southern Afghanistan. The largest cluster by far (light blue) shows that the US presidential campaign received a lot of coverage. The automated clustering did produce one unusual grouping of 'Korea' with 'Carolina', 'primary', and 'victory'. These words were linked through the frequent use of 'North' and 'South', as in 'North Korea' and 'North Carolina'.

By grouping related words, this technique does a much better job than the Streamgraph representation of summarizing the most-covered international events. However, it does so at the cost of any attempt to show how those events were distributed over time. Perhaps some combination of these two ideas would be fruitful.

[Image: 2008 World News Clustered Word Cloud (cwc_WorldNews2008.png)]
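The clustering algorithm behind these clouds isn't described, so the following is only one plausible approach, not the author's method: count how often pairs of words appear in the same headline, link pairs that co-occur at least a threshold number of times, and take connected components of that graph (single-linkage clustering) with a union-find structure. The headlines, vocabulary, and threshold are made up for illustration. Because links are transitive, this kind of clustering reproduces the chaining effect described above: 'korea' and 'carolina' never share a headline, yet both end up in the same group through 'north'.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(headlines, vocab):
    """Count how often each pair of vocabulary words shares a headline."""
    pairs = Counter()
    for text in headlines:
        words = sorted(set(text.lower().split()) & vocab)
        for a, b in combinations(words, 2):  # each unordered pair once per headline
            pairs[(a, b)] += 1
    return pairs


def cluster(vocab, pairs, threshold=2):
    """Single-linkage clusters: connected components of the thresholded graph."""
    parent = {w: w for w in vocab}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving keeps trees shallow
            w = parent[w]
        return w

    for (a, b), n in pairs.items():
        if n >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for w in vocab:
        groups.setdefault(find(w), set()).add(w)
    return list(groups.values())


# Made-up headlines, for illustration only.
headlines = [
    "North Korea nuclear talks",
    "North Korea talks stall",
    "Obama wins North Carolina primary",
    "North Carolina primary victory",
    "Soldiers killed in Afghanistan",
    "Canadian soldiers killed near Kandahar",
]
vocab = {"north", "korea", "carolina", "primary", "victory",
         "talks", "soldiers", "killed", "afghanistan"}

for group in cluster(vocab, cooccurrence(headlines, vocab)):
    print(sorted(group))
```

Raising the threshold or using a stricter linkage rule (for example, requiring every pair in a group to co-occur) would break chains like this one, at the cost of smaller clusters.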
World News Streamgraph
January 6

Now that 2008 is over, I've been thinking about looking at some datasets for the year. One that I have started to explore is a set of world news headlines from my local paper, the Toronto Star. I used some great information I found here that shows how to use Google Reader to get the latest RSS entries from any feed. The dataset includes 1,311 stories, and I looked at both the title and summary text for this analysis.
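The Google Reader step isn't reproduced here. Assuming the entries have already been saved as a list of dicts with hypothetical `date`, `title`, and `summary` keys, a minimal sketch of combining the title and summary text and tallying words per month might be:

```python
import re
from collections import Counter, defaultdict
from datetime import date


def monthly_counts(stories):
    """Tally words from each story's title and summary, keyed by (year, month).

    Original capitalization is kept so capitalized and non-capitalized words
    can be separated later.
    """
    by_month = defaultdict(Counter)
    for story in stories:
        text = story["title"] + " " + story["summary"]
        d = story["date"]
        by_month[(d.year, d.month)].update(re.findall(r"[A-Za-z']+", text))
    return by_month


# Two made-up stories, for illustration only.
stories = [
    {"date": date(2008, 3, 14), "title": "Protests in Tibet",
     "summary": "Unrest spreads across Tibet"},
    {"date": date(2008, 5, 3), "title": "Cyclone hits Burma",
     "summary": "Aid agencies respond to the cyclone"},
]
counts = monthly_counts(stories)
print(counts[(2008, 3)].most_common(1))  # [('Tibet', 2)]
```

Monthly buckets are an assumption; weekly buckets would give a finer-grained timeline at the cost of noisier counts.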

The image shows two Streamgraphs. The top one, in red, shows the most common capitalized words and when they appeared during the year. The blue Streamgraph shows the most popular non-capitalized words over the same period. The graphic seems to do a reasonable job of showing the primary international news events of the year:

  • Obama throughout most of the year with coverage peaking at election time
  • Wall between Gaza and Egypt in early 2008
  • Tibet in March
  • NATO, Mugabe in March/April
  • China, Burma, cyclone, quake, aid around May
  • Georgia, Russia, Hurricane Gustav in August
  • India, Mumbai, and Pakistan in late November
  • Gaza and Israel again at the end of the year
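How the Streamgraphs were computed isn't described. As a rough sketch, assuming per-month word Counters like those a simple tally would produce, the hypothetical helpers below split words by whether their first letter is uppercase, pick the most frequent words overall, and stack them around a symmetric baseline (each month's layers centered on zero, as in ThemeRiver). Streamgraphs as described by Byron and Wattenberg use a baseline chosen to reduce "wiggle", so treat this as a simplified stand-in. The capitalization test is crude: it also catches ordinary words at the start of a headline.

```python
from collections import Counter


def split_by_case(counts):
    """Split a word Counter into (capitalized, non-capitalized) Counters."""
    caps = Counter({w: n for w, n in counts.items() if w[:1].isupper()})
    lower = Counter({w: n for w, n in counts.items() if not w[:1].isupper()})
    return caps, lower


def build_series(monthly, n=10):
    """Choose the n most frequent words overall; return {word: [count per month]}."""
    total = sum(monthly, Counter())
    return {w: [m[w] for m in monthly] for w, _ in total.most_common(n)}


def stream_layers(series):
    """Stack layers around a symmetric baseline: g0 = -1/2 * (sum of layer heights).

    Returns {word: [(bottom, top) for each time step]}.
    """
    if not series:
        return {}
    steps = len(next(iter(series.values())))
    layers = {w: [] for w in series}
    for t in range(steps):
        y = -sum(counts[t] for counts in series.values()) / 2
        for w, counts in series.items():
            layers[w].append((y, y + counts[t]))
            y += counts[t]
    return layers


# Two made-up months of word counts, for illustration only.
monthly = [
    Counter({"Obama": 3, "Tibet": 2, "election": 1}),
    Counter({"Obama": 5, "vote": 2}),
]
capitalized = [split_by_case(m)[0] for m in monthly]
layers = stream_layers(build_series(capitalized, n=2))
print(layers["Obama"])  # [(-2.5, 0.5), (-2.5, 2.5)]
```

Each (bottom, top) pair gives the vertical extent of one word's band in one month; drawing smooth curves through those points yields the flowing layers of a Streamgraph.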

Thank You and Happy New Year!
December 31, 2008

Thank you all for your attention to Neoformix during 2008. This weblog primarily showcases my own work, and it is gratifying to see how many people are interested. I am excited about the possibilities of the coming year. Best wishes to all of you in 2009!

Sincerely,

Jeff Clark
