PhD Project: Decoding Danish YouTube Habits - Analyzing Danish People’s YouTube Watch Histories

Almost 50% of Danes use YouTube regularly. However, we know very little about what content they watch on the American platform, and even less about how this content influences their thinking. David Wegmann is trying to change that. In his project ‘Online video content at scale – an analysis of influence on YouTube’, the PhD student at DATALAB is developing new methods for analyzing large amounts of video data. The goal of this research is to better understand the role of YouTube and other video platforms in our democratic processes and society. An unexpected benefit in this endeavor: his background in biology.

What is your PhD project about?

“It’s about making sense of what Danish people have watched on YouTube. My project, which is part of DATALAB’s social media influence project, is about developing a way to understand how YouTube influences people’s opinions. I do this by looking at the YouTube watch histories that 1000 Danes donated to the social media influence project.

“Since it’s pretty much impossible for one person to sit down and watch all the YouTube videos that 1000 individuals have watched, I’m developing a method to make sense of this data computationally. YouTube is such a widespread social media platform, which makes it important to figure out what effect watching YouTube can have on people’s opinions.”

 

How did you get interested in this subject?

“I’ve always been a science nerd. I went from wanting to be a journalist who reported on something science-related – which is why I ended up studying Biology and Linguistics – to being a scientist investigating journalism and media consumption.

“I’m generally interested in all types of media. YouTube videos are particularly interesting to me because, if you think about it, it’s striking that so many people choose to watch YouTube videos, often made by amateurs, when they could be watching, for example, a professionally produced TV show.”

 

What are you working on at the moment?

“I’m currently developing an algorithmic pipeline for downloading and analyzing YouTube videos. I’m essentially figuring out a way to make sense of the data. Because of GDPR legislation, every EU citizen is able to retrieve their YouTube data, including their watch histories, making studies like this possible. The difficult part right now is downloading all the YouTube videos themselves for analysis, because YouTube is very strict when it comes to downloading the video files from their website.”
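As a rough illustration of the first step in such a pipeline, here is a minimal Python sketch that counts how often each video appears in a donated watch history. It is not the project's actual code. It assumes the history arrives as a JSON list of records whose `titleUrl` field holds the video link; that layout, the field names, the function names and the sample records are all assumptions made up for this example.

```python
import json
from collections import Counter
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the `v` query parameter of a YouTube watch URL, or None."""
    query = parse_qs(urlparse(url).query)
    ids = query.get("v")
    return ids[0] if ids else None


def video_counts(records):
    """Count how often each video ID appears in a list of history records.

    Records without a usable URL (for example removed videos) are skipped.
    """
    counts = Counter()
    for record in records:
        video_id = extract_video_id(record.get("titleUrl", ""))
        if video_id:
            counts[video_id] += 1
    return counts


# Made-up sample in the assumed export layout; not real data.
sample = json.loads("""
[
  {"title": "Watched Example A",
   "titleUrl": "https://www.youtube.com/watch?v=aaaaaaaaaaa",
   "time": "2023-01-01T10:00:00Z"},
  {"title": "Watched Example A",
   "titleUrl": "https://www.youtube.com/watch?v=aaaaaaaaaaa",
   "time": "2023-01-02T10:00:00Z"},
  {"title": "Watched a video that has been removed",
   "time": "2023-01-03T10:00:00Z"}
]
""")

counts = video_counts(sample)
```

A count like this only says that a video shows up in the history. It cannot say whether the video was actually watched or merely clicked on.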

 

What findings from your project do you find the most interesting or surprising so far?

“One thing I’ve found is that having access to a person’s YouTube data paints a surprisingly incomplete picture of that person. The data has many holes, and it is unclear which videos have really been watched and which were just clicked on. We always talk about YouTube, which is part of Google, as this surveillance giant that knows everything about you. This is not to say that YouTube doesn’t know a lot about the people who use its platform, but that on a bigger scale, this knowledge probably comes more from our overall behavior on the platform than from the specific list of videos we watch. I found that quite eye-opening.”

 

What do you look forward to working with in the future?

“The aim is to develop a method for large-scale video analysis that is applicable not only to YouTube, but to other video platforms as well. Recently I’ve thought about this in terms of my background in Biology. In biology, taxonomies of species used to be created by looking at their phenotype – their outward appearance. Now they are based on a molecular analysis of a species’ genotype, its genetic make-up.

“At the moment, the only way we have of analyzing videos is to look at their phenotype, which in this context means a person sitting down, watching one video at a time, and analyzing it. What I want to do is help develop a way to look at the genotype of videos, essentially looking at the information stored in a video, i.e. the genes. By feeding this information to a computer, the computer can then identify large patterns invisible to humans, thereby creating its own kind of taxonomy of videos.”