Genetic Engineering Attribution Challenge

We solicited inventive solutions to a crucial problem in genetic engineering: where was this engineered?

BACKGROUND

Every day, genetic engineering techniques are used to solve critical challenges in agriculture, manufacturing, and medicine. However, as the power of genetic engineering increases, so too does the potential for serious negative consequences if the technology is misused. It is often difficult to trace the origins of a genetically engineered product, which makes it hard to ensure due credit for its creators – or to hold them accountable.

At altLabs, we’ve developed tools to identify the original source of engineered DNA – a process known as genetic engineering attribution. Better attribution technology would increase transparency and accountability in synthetic biology, while still promoting and rewarding innovation. To further advance the state of the art in attribution technology, we sponsored the Genetic Engineering Attribution Challenge, a data science competition on the DrivenData competition platform.

THE COMPETITION

The competition consisted of two tracks: the Prediction Track and the Innovation Track. Each track had a total prize pool of $30,000. The Prediction Track ran from August 18 to October 19, 2020, while the Innovation Track ran from October 20 to November 1, 2020.

In the Prediction Track, teams competed to identify the lab-of-origin of engineered DNA sequences as accurately as possible. Prizes were awarded to the teams with the highest accuracy scores.

In the Innovation Track, high-performing teams from the Prediction Track were invited to showcase their approaches to a multidisciplinary panel of expert judges. Prizes were awarded to teams who exhibited novel and creative approaches to the problem, or who demonstrated that their algorithms possessed useful properties other than raw accuracy.

THE RESULTS

More than 300 teams from around the world competed in the Challenge. Top accuracy scores quickly exceeded the previous state of the art: given 10 guesses for each sequence, the best teams were able to predict the source lab of an unfamiliar plasmid DNA sequence almost 95% of the time, compared to 85% for the best previously published model.
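
The "10 guesses for each sequence" measure described above is commonly called top-10 accuracy. As a rough illustration only, and not the competition's actual scoring code, the sketch below computes top-k accuracy in plain Python. The function name, the toy scores, and the four hypothetical labs are all assumptions made for this example.

```python
def top_k_accuracy(scores, true_labels, k=10):
    """Fraction of sequences whose true lab is among the k highest-scoring labs.

    scores: one list of per-lab scores per sequence (hypothetical model output).
    true_labels: the index of the true lab for each sequence.
    """
    hits = 0
    for row, truth in zip(scores, true_labels):
        # Indices of the k labs with the highest predicted scores.
        top_k = sorted(range(len(row)), key=lambda lab: row[lab], reverse=True)[:k]
        if truth in top_k:
            hits += 1
    return hits / len(true_labels)


# Toy data (made up for illustration): 3 sequences, 4 hypothetical labs.
scores = [
    [0.10, 0.60, 0.20, 0.10],  # true lab 1 is ranked first
    [0.40, 0.30, 0.20, 0.10],  # true lab 2 is ranked third
    [0.25, 0.05, 0.30, 0.40],  # true lab 3 is ranked first
]
true_labels = [1, 2, 3]

top2 = top_k_accuracy(scores, true_labels, k=2)
top3 = top_k_accuracy(scores, true_labels, k=3)
print(f"top-2 accuracy: {top2:.3f}")
print(f"top-3 accuracy: {top3:.3f}")
```

With k=2 the second sequence counts as a miss because its true lab is only the third-ranked guess; raising k to 3 turns it into a hit. This is why a top-10 score is always at least as high as a single-guess (top-1) score for the same model.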

In total, prizes were awarded to six winning teams. Two teams won prizes in both tracks, demonstrating both exceptional accuracy and a creative and compelling approach to the problem.

Winning teams adopted a variety of different technical approaches, demonstrating the diversity of methods that can be applied to genetic engineering attribution, as well as the potential for new machine learning approaches to further improve on existing tools.

TIMELINE

  • 2020-08-18: Prediction Track Opens
    Participants accessed competition data and began making submissions.
  • 2020-10-19: Prediction Track Closes
    Top participants invited to submit code for review.
    High-scoring teams invited to submit reports to Innovation Track.
  • 2020-11-01: Judging Begins
    Innovation Track submissions pre-screened for quality, then passed to judging panel.
    Judges assessed submissions for originality, technical quality, and engagement with real-world attribution questions.
  • 2020-11-25: Judging Closes
    Judges submitted assessments to altLabs.
    Innovation Track winners determined based on assessment scores.
  • 2020-12 to 2021-01: Verification
    Code from winning teams checked and prepared for release.
    Performance of winning submissions checked against an out-of-sample verification set.
    Prizes awarded.
  • 2021-01-26: Results Announced
    Competition results announced publicly by altLabs, DrivenData, and partners.

PARTICIPATING ORGANIZATIONS

Host:
DrivenData
