Media Clouds

August 17, 2009 · Posted in Stuff 

It’s time. In my last post I mentioned Danny’s discovery of Wordle, and after playing around with it for a few minutes I started wondering what would happen if you fed a whole news site to it. Would it be possible to quantify how much attention a story gets? Better yet, would it be possible to quantify the language used by different media sites if they all ran similar stories and you could compare the coverage? What types of stories does a news site prefer over its competitor?

Wordle may not be able to answer these questions, but perhaps it will provide a starting point. Wordle’s function is to absorb whatever text you throw at it, determine what words appear the most, and then create something that is at once pleasing to the eye, and full of useful information. Words that get repeated are made proportionally bigger, and since we’re visual creatures, the results may speak louder than a simple word count.
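For anyone who wants to poke at the numbers behind the pictures, here is a minimal sketch of the counting-and-scaling idea described above. It is not Wordle’s actual algorithm: the stopword list, the size range, and the function names word_counts and font_sizes are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is", "at"}

def word_counts(text):
    """Count how often each non-stopword appears, ignoring case."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def font_sizes(counts, min_size=10, max_size=72):
    """Scale each word's size in proportion to the most frequent word."""
    if not counts:
        return {}
    top = max(counts.values())
    return {
        word: max(min_size, round(max_size * n / top))
        for word, n in counts.items()
    }

sample = "Obama speaks. Obama visits the court. Court hears Obama."
counts = word_counts(sample)
sizes = font_sizes(counts)
print(counts.most_common(2))
print(sizes)
```

Pasting in several days of front pages at once, as I did, just means joining the text together before counting.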

Of course, the news doesn’t stand still, so trying to find answers from only a single day of stories would be inaccurate. Using five days’ worth of material would be better, and though it would probably be better still to take a whole year’s worth of samples (slightly difficult with the constant 24-hour news cycle), five days seemed like a sane way to start before committing to a schedule of daily copy-and-pastes. The following are tag clouds generated from five days’ worth of news site front pages: July 13th, 14th, 16th, 21st, and 23rd.

http://news.bbc.co.uk/

http://www.cnn.com/

http://www.foxnews.com/

http://www.theglobeandmail.com/

http://news.google.com/

http://www.thestar.com/

http://www.torontosun.com/

It may seem redundant to caption these, what with CNN’s tag cloud proclaiming itself so loudly, but look again at Google’s news aggregator and guessing becomes a bit more difficult: we’ve got words like washington suspiciously close to post, or new, york, and times swirling about, and links to a half-dozen other sources. It makes sense, since it’s an aggregator, but good luck guessing whose cloud it is without the caption.

There are plenty of things to be interested in. I tried to match each tag cloud’s colour scheme to that of its parent website, and it turns out that black and white and red all over is popular, with the occasional hint of blue. Here I was thinking that the Toronto Sun had nothing in common with the Globe and Mail, but there they are agreeing on something. Also of interest is that the BBC’s website displays more international sections on its front page than any of the other sites. Whereas many websites have a single world section, the BBC’s website and tag cloud list Africa, Americas, Asia-Pacific, Europe, Middle East, and South Asia at the very front.

As for favoured stories? Obama, Sotomayor, and Jackson get plenty of attention on CNN’s and Fox’s front pages (despite the last being three weeks deceased at the time), yet these three barely register on the Canadian sites’ tag clouds. Is this because the stories weren’t covered as much by the Canadian sites, or was it because the Canadian sites varied their stories enough to keep any single one from rising to the top? Or do CNN and Fox vary their coverage just as often, but also put more stories on their front pages? Another puzzle is the Globe and Mail’s tag cloud, which at first glance looks mostly like titles and headers with little story content. Is this another case of wide variety in content, or another case of little content to begin with?
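One rough way to start untangling those questions is to compare each word’s share of a site’s total words rather than its raw count, so that a front page with more text doesn’t win just by being longer. The sketch below assumes you already have per-site word counts; the site names and numbers are invented placeholders, not figures from the clouds above, and relative_share and compare_term are hypothetical helper names.

```python
from collections import Counter

def relative_share(counts):
    """Turn raw word counts into each word's fraction of all words."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}

def compare_term(term, site_counts):
    """Return the term's share of total words on each site."""
    return {
        site: relative_share(counts).get(term, 0.0)
        for site, counts in site_counts.items()
    }

# Invented placeholder counts, purely for illustration.
site_counts = {
    "site_a": Counter({"obama": 5, "weather": 5}),
    "site_b": Counter({"obama": 5, "sudoku": 45}),
}

shares = compare_term("obama", site_counts)
print(shares)
```

Both made-up sites mention the word five times, but it takes up half of one front page and a tenth of the other, which is the kind of difference a raw count would hide.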

Well, get analysing!

Comments

4 Responses to “Media Clouds”

  1. Leora on August 17th, 2009 9:18 pm

    GAH! This is awesome! I find it bizarre how prominent “sudoku” is displayed in the Globe and Mail Wordle. “Sunshine” for the Toronto Sun surely must be a reference to the Sunshine Girl.

    Interesting analysis, Byron! I am going to have to share this.

  2. Leora on August 17th, 2009 9:34 pm

    Also, I agree; the reason all those terms like “washington” and “post” etc. are coming up on Google is because it is an aggregator. There are so many other stories/blogs that are just reporting “The Washington Post said….” and then linking to the story, that it skews that data.

    I’m actually quite surprised, but in retrospect, the results make sense. I would have thought that Google would actually paint a more accurate picture of subjects being covered in the news, because it is a quasi-objective aggregator, as opposed to a source of infotainment, replete with ads and self-promotion. WOW. My Macbook did not recognize “infotainment” as being red underline-worthy, while it did with “aggregator”.

    I’m going to go sob now.

    -Twin

  3. Danny Fekete on August 22nd, 2009 2:07 am

    Hoya, Byron. I love looking at this gallery.

    I’ve posted a response and some proposals for a bigger project at Philomathy.org, in case the trackback doesn’t go through.

  4. [...] the visceral impact without wholly discarding the medium.  His write-up and gallery are available here.  I really, really like the decision to match the colour schemes of the papers to the Wordle [...]
