Andrew Duffy

Journalist, community manager, blogger, and (very) amateur Python hacker.

On YouTube, social ≠ popular

I came across a Google research paper last week that I think some of you might find interesting. In the study, researchers analyse sharing and its relationship to a video’s popularity. The whole paper is worth a read, but I found the discussion of the ‘socialness’ of popular videos the most interesting.

I’ll post the extract below (Section 6.1 if you’re interested) but the key takeaways from the discussion are:

1) Not all popular videos are highly social.

2) Most videos become popular on YouTube through search and related videos (not through sharing/referrals).

3) Viral videos rarely make it into YouTube discovery mechanisms such as search/related videos.

3.1) The data suggests the way YouTube computes related videos does not apply well to viral videos.

Here’s the full extract:

Previous sections of this paper have focused on the full spectrum of YouTube videos. This section focuses on popular videos, which we define to be the top 1% of videos in terms of views. We find that not all popular videos are highly social. The majority of videos become popular through related videos and search. 

Figure 11a shows the distribution of the percentage of social views among popular videos in the first 30 days. Note that the distribution is bimodal. That is, it has two peaks, showing that most videos are either viral (peak around 90%) or non-viral (peak around 10%). The peak at 10% is much higher than the one at 90%. If we consider viral those videos with at least 60% social views, 23% of the videos in this plot are viral. Figure 11b shows the distribution of the percentage of views from YouTube search and related videos. This distribution is still bimodal, but it is much more uniform than the previous one: 37% of the videos have at least 60% of their views coming from YouTube search and related videos.

The bimodal distribution (Fig. 11a, b) means that videos have many views that originate either from YouTube or from external websites/sharing. This pattern can be explained by the fact that viral videos do not seem to make it very often into the YouTube discovery mechanisms such as related videos or YouTube search.

[Figure 11 from the paper: distributions of social views and of search/related views among popular videos]

We have a couple of hypotheses to explain this. Related videos rely almost exclusively on co-visitation data over a certain period of time. But most viral videos accumulate their views in a short period of time, and their viewers are often casual YouTube users (Ulges et al. 2011). These factors may prevent viral videos from making it into the related list of any other video. On the other hand, videos that make it into the related video list of other videos have a stable source of views; even if it decays, it is sustained for a longer period of time. These hypotheses also suggest that the way we compute related videos today does not apply very well to viral videos.
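The paper’s 60% cut-off is easy to play with in code. Here’s a quick Python sketch of the viral/non-viral classification it describes — the numbers are made up by me for illustration, not taken from the paper:

```python
# Classify a video as viral vs non-viral using the paper's rule:
# "viral" means at least 60% of first-30-day views came from
# social sources (sharing/referrals).

def classify(social_views, total_views, threshold=0.6):
    """Return 'viral' or 'non-viral' based on the share of social views."""
    share = social_views / total_views
    return "viral" if share >= threshold else "non-viral"

# Hypothetical videos, invented for illustration
videos = {
    "cat_compilation": (90_000, 100_000),  # ~90% social: the viral peak
    "howto_tutorial":  (8_000, 100_000),   # ~10% social: the non-viral peak
}

for name, (social, total) in videos.items():
    print(name, classify(social, total))
```

The bimodal shape in Figure 11a just means most videos land near one end or the other of that `share` value, rather than in the middle.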

If you want to read more, check out the full paper.

Is there a classy way to content farm?


It’s no secret that content farms are crap-holes, but love ’em or hate ’em, they can still teach us a thing or two. Let me explain:

Journalists and media pundits like to think that only quality content sells. They think that anything whipped up by a drone (human or otherwise) can’t draw an audience and can’t make money. For the most part that’s true, but I think this approach has the nasty side effect of shutting down some interesting entrepreneurship.

A case in point is Forbes. They’ve been experimenting with machine-writing for some time, and there’s definitely potential to ramp up that approach in other financial news outlets. Whether it’s aggregating earnings forecasts or writing up press releases, most of that stuff is bland and formulaic, and it’s not hard to envisage an algorithm putting it together automatically.
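To make that concrete, here’s a toy sketch of what template-driven earnings copy could look like. Every company name and figure is invented, and real machine-writing systems (like whatever Forbes runs) are far more sophisticated — but the basic principle of filling formulaic prose from structured data is the same:

```python
# Toy example of templated financial copy: fill a boilerplate
# earnings paragraph from structured data.

TEMPLATE = (
    "{company} reported quarterly earnings of ${eps:.2f} per share, "
    "{direction} analysts' consensus estimate of ${estimate:.2f}. "
    "Revenue came in at ${revenue_m}m for the quarter."
)

def write_earnings_brief(company, eps, estimate, revenue_m):
    # Pick the verb based on how earnings compare with the estimate
    if eps > estimate:
        direction = "beating"
    elif eps < estimate:
        direction = "missing"
    else:
        direction = "matching"
    return TEMPLATE.format(company=company, eps=eps,
                           direction=direction, estimate=estimate,
                           revenue_m=revenue_m)

# Invented figures for illustration
print(write_earnings_brief("Acme Corp", 1.32, 1.25, 480))
```

Swap the template for a press-release summary or an aggregated forecast and you’ve covered a surprising amount of the bland stuff.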

In that scenario journos could spend more time on quality stories without shirking the boring stuff. It would mean more content, more pageviews and more quality. It’s something I’ll be experimenting with over the next few months, and if you’re interested in what I’ll be doing you can get in touch.

Image:  Patrick Haney/Flickr

Why 2013 is a watershed year for community management

Earlier this week I read a great discussion on the problem of online abuse and what publishers are doing to clean up their comments and social chatter. While the article itself was a great read, for me what was even more exciting was the fact that it had made it online in the first place.

Online comments and their quality, or lack thereof, are a perennial topic of debate in media circles, but for me the quality and direction of this article represented a turning point for the topic. This change did not happen overnight, of course, but it still represents a tectonic shift in the mainstream approach to community management. Twelve months ago the idea that you could clean up even the dirtiest threads (and that this was a worthwhile pursuit) would have been almost laughable at many traditional outlets. But study after study (not to mention common sense and good ol’ fashioned experience) has made it clear that discussion/debate is a valuable resource, and cleaning it up is an achievable outcome.

All of this is happening at the same time as several new media heavyweights roll out some exciting initiatives. The most obvious are Nick Denton’s latest tweaks to the Kinja system, which Matthew Ingram has labelled 2013’s most disruptive move in online media. But Denton is not alone. Quartz has also been rethinking how reader comments work, and broader but no less impressive changes have been afoot across a larger swathe of Atlantic Media titles.

There are dozens more I could add to this list, and I think it’s plausible that the culmination of these efforts will number the days of troll-infested communities. There is still much work to be done, and it is by no means a problem that can be completely eradicated. Nevertheless, community management has reached a crossroads, and all publishers need to make sure they’re part of this new wave.

A few thoughts after making the front page of Buzzfeed


Around a year ago I opened an account on Buzzfeed with the goal of shaping a post that would make the front page, and I succeeded on my first attempt after spending an hour browsing cat videos on YouTube.

It’s hard to draw any wide-reaching conclusions from that post, but I still think it hints at a few truths about Buzzfeed and its audience… truths that Jonah Peretti has spent a lot of time and effort obscuring.

Truth 1: Buzzfeed likes to build up mystery and hype surrounding how it writes for its audience, but the cold hard truth is that filtering through a ‘cat videos’ search on YouTube (my exact keyphrase) is sometimes enough to garner 20K+ views and make it onto the homepage.

Truth 2: Data science and research have helped define broad categories that work on Buzzfeed, but there’s no algorithm that dictates its content. Buzzfeed has not ‘cracked the code’ for making viral content… it relies on guesswork and creativity (within well-researched boundaries) to produce its content. Sometimes it works and sometimes it doesn’t, and this is only slightly different to how traditional newsrooms have worked since time immemorial.

I’m not saying it’s always this easy, and compared to the staff posts, my cat-post metrics are modest to say the least. But I think Peretti’s rhetoric about the science behind Buzzfeed needs to be recognised for what it is, which is nothing more than a marketing pitch. What worked about my post was not that it had the backing of Buzzfeed’s viral magic, but that it was fresh content that hit the buttons of a niche audience (luck also plays a big part). It’s not easy but it’s not rocket science, and it’s not something that Buzzfeed does better than anyone else.

Image: Scott Beale/ Laughing Squid

Mapping Big Brother’s Twitter chatter


I’ve been working on data mining Twitter for a while now, and while it’s taken a fair amount of blood, sweat, and tears… the results have been worth the effort. Above is a snapshot of the interactions between ~2,000 Twitter users over seven days. It maps out a week of discussion on the #BBAU hashtag (from 2012) and you can explore the full dynamic map online here.

It’s amazing how much more sense online communities make when you visualise them like this. It’s also fascinating to see how users cluster around influential accounts to form micro-communities within the broader picture. It goes without saying that this kind of visualisation has great potential, and aside from that it looks pretty damn cool.
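If you’re curious about the mechanics, the heart of a map like this is just a directed “who mentions whom” graph built from the tweets. Here’s a stripped-down Python sketch with fake usernames and tweets of my own invention — the real pipeline adds layout and colour-coding on top of this:

```python
from collections import Counter, defaultdict

# Fake sample tweets for illustration; the real map covered
# a week of #BBAU chatter from roughly 2,000 users.
tweets = [
    ("alice", "@bob great episode tonight #BBAU"),
    ("carol", "@bob agreed! #BBAU"),
    ("bob",   "@alice thanks! #BBAU"),
]

edges = Counter()          # (source, target) -> number of mentions
graph = defaultdict(set)   # adjacency list: who each user mentions

for author, text in tweets:
    for word in text.split():
        if word.startswith("@"):
            target = word.strip("@!,.").lower()
            edges[(author, target)] += 1
            graph[author].add(target)

# Users with the most incoming mentions sit at the cluster centres
in_degree = Counter(target for (_, target) in edges.elements())
print(in_degree.most_common(1))  # [('bob', 2)]
```

Feed those weighted edges into a graph-layout tool and the micro-communities fall out of the structure almost by themselves.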

As always, if you want to learn more about how I did this, get in touch. The layout and colour-coding in particular take a little bit of time to explain/understand.

Mapping the overlap between mines and native title

I’ve put together an interesting map showing the overlap between native title claims and mining operations. The map is based on data from Geoscience Australia, and it shows active mines, deposits, and historical operations on one layer, and native title claims on the other (you can click on points of interest to get more info).

It’s interesting to note the extent of active native title in Australia. Also of interest are the claims that extend into the ocean above Cape York.

With its multiple layers and datasets, this visualisation is starting to test the limits of what Google Maps can achieve. Pretty cool stuff. If you want to know more about how I did it, let me know.