Easily Solved by Humans: An Interview with Phylo Co-Creator Jérôme Waldispühl

Several months ago I started playing an online game called Phylo. It is both simple and challenging, and I can feel virtuous as I play, since (according to its website) the gameplay is helping scientists “decipher our DNA and identify new genes.” Recently it occurred to me that, in addition to being a fun game with scientific value, it might be worth looking at Phylo from a science communication angle.

After all, Phylo’s creators had to make the game engaging in order to get people to play it. The creators also had to raise public awareness about the game. And aren’t games a great way to get people interested in science?

Phylo was created by Jérôme Waldispühl and Mathieu Blanchette of McGill University in Montreal, Quebec. I recently had the opportunity to ask Waldispühl about the genesis of Phylo, what he and Blanchette did to spread the word about the game and any tips he has for future science-game designers.

His answers weren’t always what I expected (spreading the word was apparently a piece of cake), but it’s an interesting story.

Communication Breakdown: Both you and Mathieu Blanchette are on faculty in McGill’s School of Computer Science. Can you explain your interest in genetics and microbiology? What drew you to doing research at the nexus of computer science and life science?

Jérôme Waldispühl: I was originally trained in mathematics and algorithmics. For my Ph.D. (at École Polytechnique in France), I wanted to work on interdisciplinary research and use my background to advance knowledge in other research fields. Bioinformatics was still new but was getting increasingly popular at that time, and I decided to work in this area.

CB: When did you first get the idea to create Phylo? What led to that idea?

Waldispühl: During my postdoc at MIT, I used to talk about games with a purpose and human-based computation with my officemate, Luis Sarmenta (who now works at Nokia and is currently a Phylo collaborator). We were inspired by the work of Luis von Ahn, who developed (among other things) the ESP Game and reCAPTCHA. My main area of research is bioinformatics, so naturally we had the idea to develop something related to biology. But a few months after we started working on it, Foldit came out and we gave up on the idea. When I arrived at McGill in September 2009, I realized that the same idea could be applied to sequence alignments and discussed it with Mathieu, who is a specialist in comparative genomics. In the summer of 2010, we hired two McGill undergraduate students, Alex Kawrykow and Gary Roumanis, to implement the first interface, and we officially released it on November 29, 2010. A new version of the game, implemented by another McGill undergraduate, Alfred Kam, was released on December 3, 2012.

CB: How would you explain the game to someone who hadn’t played it?

Waldispühl: Phylo is a casual web game that looks like a classic puzzle game such as Tetris. By moving puzzle pieces of various colors horizontally, players have to align the colors vertically. This gives you a bonus. [However,] the sequences have different lengths, so you need to introduce some gaps and sometimes accept color mismatches. These give you a penalty. The game is to find the best trade-off: maximum bonuses and minimum penalties.
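To make that trade-off concrete, here is a minimal Python sketch of how such a score could be computed for a finished puzzle. It assumes a scheme common in sequence alignment, a "sum-of-pairs" score in which every pair of rows is compared column by column. The values (+1 per match, -1 per mismatch, -4 to open a gap, -1 to extend one) are illustrative assumptions, not Phylo's published scoring rules.

```python
from itertools import combinations

# Hypothetical scoring values, chosen for illustration only.
MATCH = 1
MISMATCH = -1
GAP_OPEN = -4
GAP_EXTEND = -1


def pair_score(a: str, b: str) -> int:
    """Score two gapped rows of equal length ('-' marks a gap)."""
    score = 0
    in_gap = False  # simplification: a gap in either row continues the current run
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # gap facing gap: ignored
        if x == "-" or y == "-":
            score += GAP_EXTEND if in_gap else GAP_OPEN  # opening costs more than extending
            in_gap = True
        else:
            score += MATCH if x == y else MISMATCH  # bonus for a color match, penalty otherwise
            in_gap = False
    return score


def alignment_score(rows: list[str]) -> int:
    """Sum-of-pairs score: add up the pair score of every pair of rows."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows must have the same length")
    return sum(pair_score(a, b) for a, b in combinations(rows, 2))


print(alignment_score(["ACGT", "AC-T"]))    # -1: three matches, one gap opened
print(alignment_score(["ACGTA", "A--TA"]))  # -2: three matches, a gap opened then extended
```

Charging more to open a gap than to extend it nudges players toward a few long gaps rather than many scattered ones, which is a standard choice in alignment scoring.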

CB: I’ve read the “about” page on the Phylo site, but it’s probably fairly confusing to many readers. For example, most people don’t know what the word “heuristics” means. (I had to look it up.) How would you explain its scientific value to people who aren’t biologists or computer scientists?

Waldispühl: The alignment of genomes is important to identify regions that have been conserved during evolution between species. If some regions are identical across all species, there's probably a reason for that. One possibility is that these regions are functionally important, and thus that a mutation in these regions may cause a metabolic disorder. Some genes have been identified as being involved in various diseases. Here, we try to improve their alignments and identify conserved regions in these genes.
Mathieu Blanchette (left) and Jérôme Waldispühl
Our goal is to compute accurate genome alignments. Finding the best sequence alignment is a computationally hard problem, which means it very quickly becomes intractable as the size of the problem grows. It is relatively fast to align two or three small sequences, but if you add more sequences, or use longer ones, the time needed to solve the problem increases so much that it becomes intractable. Interesting problems fall into this category of big problems requiring a lot of computational time.
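A back-of-the-envelope count shows how fast this blows up. The textbook exact method, dynamic programming, fills a table with one cell for every combination of positions in the sequences, and each cell considers every non-empty subset of sequences that could advance by one base. The sketch below illustrates that textbook method only, not the programs Phylo or the UCSC Genome Browser actually use. It estimates the work for k sequences of length n as (n + 1)^k × (2^k - 1).

```python
def dp_work(num_sequences: int, length: int) -> int:
    """Rough operation count for exact multiple alignment by dynamic programming.

    The table has (length + 1) ** num_sequences cells, and each cell looks back
    at 2 ** num_sequences - 1 neighbours (every non-empty subset of sequences
    can advance by one base).
    """
    cells = (length + 1) ** num_sequences
    moves_per_cell = 2 ** num_sequences - 1
    return cells * moves_per_cell


for k in (2, 3, 5, 8):
    print(f"{k} sequences of length 100: about {dp_work(k, 100):.1e} operations")
```

For sequences of length 100, the estimate grows from about 3 × 10^4 operations for two sequences to about 3 × 10^18 for eight, which is why exact methods are impractical for real genome alignments.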

To overcome these limitations, computers resort to heuristics, but the problem is that the solution returned is most likely not optimal.

CB: I’m sorry to interrupt, but can you explain what heuristics are?

Waldispühl: Heuristics are rules that enable us to solve our problem more quickly. For instance, when finding the best solution requires us to check many cases, we can set up a rule that selects only a few cases to examine. These cases have a better chance than the others of containing the best solution. Thus you save time by making fewer checks. The drawback is that your rule may not guarantee that the best solution belongs to the selected cases. So in the end you're going faster but may lose accuracy.
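One classic heuristic in sequence search is "seeding": only consider placements where a short stretch of the query matches the reference exactly. The toy Python sketch below is a simplified, ungapped illustration of that idea; the function names and the seed length of 3 are assumptions for illustration, not taken from Phylo. It compares an exhaustive search over every offset with a seeded search that checks only a few candidate offsets.

```python
from collections import defaultdict
from typing import Optional


def matches_at(ref: str, query: str, offset: int) -> int:
    """Count positions where the query, placed at this offset, agrees with the reference."""
    return sum(1 for i, base in enumerate(query) if ref[offset + i] == base)


def exhaustive_best_offset(ref: str, query: str) -> tuple[int, int]:
    """Check every possible offset: always finds the best one, but checks them all."""
    offsets = range(len(ref) - len(query) + 1)
    best = max(offsets, key=lambda o: matches_at(ref, query, o))
    return best, len(offsets)


def build_seed_index(ref: str, k: int) -> dict[str, list[int]]:
    """Map every length-k substring of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(ref) - k + 1):
        index[ref[pos:pos + k]].append(pos)
    return index


def seeded_best_offset(ref: str, query: str, k: int = 3) -> tuple[Optional[int], int]:
    """Heuristic: only check offsets where the query's first k bases occur exactly."""
    last = len(ref) - len(query)
    candidates = [o for o in build_seed_index(ref, k).get(query[:k], []) if o <= last]
    if not candidates:
        return None, 0  # the rule skipped every placement, including the best one
    best = max(candidates, key=lambda o: matches_at(ref, query, o))
    return best, len(candidates)


ref = "TTACGTTTACGATTTT"
print(exhaustive_best_offset(ref, "ACGA"))  # (8, 13)
print(seeded_best_offset(ref, "ACGA"))      # (8, 2)
print(seeded_best_offset(ref, "AGGA"))      # (None, 0)
```

For ACGA, both searches find offset 8, but the seeded one checks 2 placements instead of 13. For AGGA, whose first three bases never occur exactly in the reference, the seeded search finds nothing, while the exhaustive search still places it at offset 8: faster, but less accurate.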

CB: Thanks. Now, back to explaining what Phylo does?

Waldispühl: With Phylo, we aim to improve alignments that have been pre-computed by algorithms. It works because the sequence alignment problem can be presented as a color-matching problem that can be "easily" solved by humans. This is because the human brain has evolved to process visual patterns very quickly, and is currently still faster at this than any computer.

However, genomes are huge (millions of bases), so we cannot ask a single user to align them. Instead, we use the following procedure:

– Genomes are first aligned by the best computer programs (we took those available at the UCSC Genome Browser).
– We identify and extract from these giant genome alignments the regions with the lowest quality, and transform them into puzzles.
– We ask web users to play our puzzles on our website.
– The best solutions are collected and re-inserted in the original genome alignment to provide a better alignment.
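The four steps above can be sketched as a toy model in Python. The assumptions are: the alignment is a list of equal-length strings with '-' for gaps; a column is scored with a simple hypothetical rule (+1 for each matching pair of bases, -1 for each mismatching or base-versus-gap pair); puzzles are fixed-width, non-overlapping windows; and a player's solution is accepted only if it keeps each row's bases unchanged and scores better. The function names are hypothetical, and this is not Phylo's actual code.

```python
from itertools import combinations


def column_score(column: str) -> int:
    """Hypothetical column score: +1 per matching pair, -1 per mismatch or base-vs-gap pair."""
    score = 0
    for x, y in combinations(column, 2):
        if x == "-" and y == "-":
            continue  # two gaps facing each other are ignored
        score += 1 if x == y else -1
    return score


def segment_score(rows: list[str]) -> int:
    """Sum of column scores over a block of aligned rows."""
    return sum(column_score("".join(col)) for col in zip(*rows))


def extract_puzzle(rows: list[str], start: int, width: int) -> list[str]:
    """Cut one window of columns out of the alignment to serve as a puzzle."""
    return [row[start:start + width] for row in rows]


def worst_windows(rows: list[str], width: int, count: int) -> list[int]:
    """Step 2: start columns of the lowest-scoring non-overlapping windows."""
    starts = range(0, len(rows[0]) - width + 1, width)
    ranked = sorted(starts, key=lambda s: segment_score(extract_puzzle(rows, s, width)))
    return ranked[:count]


def reinsert(rows: list[str], start: int, width: int, solution: list[str]) -> list[str]:
    """Step 4: splice a player's solution back in if it is valid and scores higher."""
    original = extract_puzzle(rows, start, width)
    if len(solution) != len(original) or len({len(r) for r in solution}) != 1:
        raise ValueError("solution must have one equal-length row per sequence")
    for old, new in zip(original, solution):
        # Players may only move gaps: the underlying bases must not change.
        if old.replace("-", "") != new.replace("-", ""):
            raise ValueError("solution changes the underlying sequence")
    if segment_score(solution) <= segment_score(original):
        return rows  # not an improvement: keep the computer's alignment
    return [row[:start] + new + row[start + width:] for row, new in zip(rows, solution)]


alignment = ["AAAACGTT", "AAAA-CGT"]            # step 1: a toy computer-made alignment
start = worst_windows(alignment, width=4, count=1)[0]
puzzle = extract_puzzle(alignment, start, 4)    # the low-quality region becomes a puzzle
player_solution = ["CGTT", "CGT-"]              # step 3: a player moves the gap
print(reinsert(alignment, start, 4, player_solution))  # ['AAAACGTT', 'AAAACGT-']
```

Checking that a solution only moves gaps is what makes it safe to splice back into the larger alignment: the surrounding columns are untouched, so only the puzzle's window changes.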

In theory, we could use computers to find the best alignment. Unfortunately, due to the structure of the problem, computing the best alignment can take too much time because it would require checking all possible alignments. Humans, by contrast, can have a better intuition of what the best solution is. Instead of trying all the solutions, they can "guess" what the best solution looks like and go straight to it without looking at all possible solutions. That's the short version of the story; I could develop it further.

However, computers are still required to pre-align the genomes and to identify the regions that need a closer look (i.e. our puzzles!). Algorithms are also needed to re-insert the puzzles into the complete genome alignment. So in the end, Phylo is a combination of the best of humans and computers. Computers do the bulk of the work, processing massive amounts of data, and humans become involved for the "precision" work when the classical algorithms fail to align the sequences properly.

CB: Were there any problems in the first version (or versions) that you had to work out?

Waldispühl: The first version was implemented in Flash. Unfortunately, Flash is not suited to tablets, so we re-implemented the game in HTML5 for better portability.

Players did not much like the timer in the original version, so we decided to remove it and redesign the game and the scoring system to accommodate this change.

CB: If you want people to play a game, you have to make the game fun. What steps did you take to ensure that Phylo would be fun and engaging for players?

Waldispühl: First, we designed the game with games like Tetris in mind. Then we tested it, putting several beta versions online between August and October 2010 and asking students at McGill and friends to give us feedback.

CB: No matter how much fun a game is, no one can play it if they don’t know it exists. What did you do to make the public aware of Phylo?

Waldispühl: Interestingly, we did nothing very special. We contacted the news service at McGill and published a press release. Within an hour of the release, we were contacted by several journalists, and the announcement of our game spread on blogs and social networks. There was literally a post about Phylo on Twitter every minute, maybe every few seconds, that day. The traffic was much higher than we anticipated and the server crashed. In the emergency, our systems staff managed to fix it quite quickly and ensured the success of the release.

CB: Phylo has been covered by Wired, Fast Company and PRI’s The World. Looking at the images from some of those stories, it’s clear that Phylo has gone through some changes. How has Phylo evolved since you first launched it? Are the changes solely to the user interface, or have you also made changes to the game to make it more valuable as a scientific tool?

Waldispühl: Both. As I said earlier, we re-implemented it using a programming language that allows Phylo to run on mobile devices such as tablets. Then we added translations of the game (made by players who volunteered) in eight new languages: Chinese (simplified and traditional), German, Hebrew, Portuguese, Romanian, Russian and Spanish, in addition to the original French and English. We also enabled social login via Facebook, Twitter, Google or LinkedIn, and social sharing, so users can post their performances to their news feeds.

More importantly, we developed a new interface for expert players (those who have played more than 20 puzzles). This interface enables [users] to play much bigger alignments, and thus to solve more complex problems and make a bigger contribution.

CB: To date, how many people have played Phylo?

Waldispühl: Since its launch in November 2010, we have collected about 700,000 solutions with approximately 29,000 registered players. Please note that there’s one big difference [between Phylo and] other citizen science games: you can play Phylo anonymously. Many more people play anonymously. If you need an estimate, I would say that between 100,000 and 200,000 people have played Phylo since its initial launch. This is important because Phylo has been designed to be a casual game whose results are used to generate better alignments. Players do not have to get into the science to make a significant contribution. Of course, they can also learn about it if they want!

CB: How successful has Phylo been at helping to shed light on complex genetic questions? Are there examples of any instances where it has helped advance specific research efforts?

Waldispühl: We do not have any particular example right now. Phylo aims to provide the data needed by geneticists to study our genomes. The data [used in] Phylo are at the beginning of the pipeline.

That said, we are currently working with cancer-related genes and trying to produce examples that illustrate the usefulness of Phylo.

CB: What advice do you have for anyone who is interested in developing games to help address research questions?

Waldispühl: I don't much like giving advice, but from my experience I would say: 1) Apply these techniques if the problem you tackle is interesting to the broad public; 2) Do not neglect the game design. Before being useful, a game should be fun.

This interview was conducted by Matt Shipman and posted on Science Communication Breakdown
https://sciencecommunicationbreakdown.wordpress.com/2013/08/19/phylo/