DNA Data Storage: Revolutionary Technology
Interview with Stéphane Lemaire and Pierre Crozet, molecular biology specialists at the Computational and Quantitative Biology Laboratory, and co-founders of the startup Biomemory.
The storage of global data has become a crucial issue in our society. Stéphane Lemaire and Pierre Crozet, molecular biology specialists at the Computational and Quantitative Biology Laboratory, are developing an innovative technology to address it: DNA storage.
At a special press conference organized by Sorbonne University on Tuesday, November 23, at the National Archives Museum, the research team presented its "DNA Revolution" project. This proof of concept made it possible to encode onto DNA two texts with strong symbolic and historical value for France: the Déclaration des droits de l’homme et du citoyen de 1789 (Declaration of the Rights of Man and of the Citizen of 1789) and Olympe de Gouges's Déclaration des droits de la femme et de la citoyenne (Declaration of the Rights of Women and of the Citizen). The National Archives has officially registered the deposit of these documents stored on DNA, a world first for a public institution.
How has data storage become a major and controversial issue of the 21st century?
Stéphane Lemaire: Today, the world's data represents 45 zettabytes¹ (ZB), or 45 billion billion kilobytes, and will increase to 175 ZB within three years. This dizzying increase is linked to the digital transformation of our society and in particular to the emergence of the Internet of Things, quantum computing, autonomous cars, industrial transformation and the development of artificial intelligence. Since 2010, we have been living on credit, with demand for storage far exceeding supply.
While the storage capacities of current media are increasing, they are not growing as fast as data production. These traditional media, such as hard disks, magnetic tapes or Blu-ray discs, also remain fragile, bulky and energy-hungry. For example, they have to be replaced every five to seven years in data centers, which alone account for two percent of global electricity consumption and whose carbon footprint exceeds that of civil aviation. Data storage is a strategic issue for the economy, sustainability and security of our societies, so it requires a technological breakthrough.
What solutions are out there?
Pierre Crozet: For four billion years, there has been a natural form of data storage: DNA. To store genetic information, all living beings use two intertwined molecules made up of four building blocks, the nucleotides, symbolized by the letters A, T, C and G (adenine, thymine, cytosine and guanine).
We believe that DNA storage is the only reasonable technology to replace current archival media. It can be stored for hundreds of thousands of years without any energy input, if it is preserved from water, air and light. It is also a million times more compact than any conventional medium. With a density of 450 million TB² per gram of DNA, all the world's data could fit into the volume of a chocolate bar.
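The density figure can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the interview (45 and 175 zettabytes, 450 million TB per gram) and assumes decimal units; the function name `dna_grams_needed` is made up for this illustration.

```python
# Figures quoted in the interview, in decimal (SI) units.
BYTES_PER_TB = 10**12
BYTES_PER_ZB = 10**21
DENSITY_TB_PER_GRAM = 450_000_000  # 450 million TB per gram of DNA


def dna_grams_needed(zettabytes: float) -> float:
    """Mass of DNA, in grams, needed to hold a data volume at the quoted density."""
    total_tb = zettabytes * BYTES_PER_ZB / BYTES_PER_TB
    return total_tb / DENSITY_TB_PER_GRAM


today = dna_grams_needed(45)       # current world data, per the interview
projected = dna_grams_needed(175)  # projected world data, per the interview

print(f"45 ZB  -> {today:.0f} g of DNA")
print(f"175 ZB -> {projected:.0f} g of DNA")
```

At the quoted density, 45 zettabytes comes out at about 100 grams of DNA, which is in line with the chocolate bar comparison. The calculation ignores any redundancy, indexing or packaging that a real archive would add.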
What is DNA storage?
DNA storage is the transformation of binary digital data (0s and 1s) into letters corresponding to the four building blocks of DNA (A, T, C, G). The sequence of nucleotides is then synthesized as DNA fragments that can be stored on paper, in a tube, in a metal capsule, etc. The stored information can then be read using DNA sequencers, similar to those used in biology and medicine for genome sequencing. Once the succession of letters has been obtained, all that remains to be done is to convert it back into binary data, using the same code that was used to write it, in order to recover the digital information.
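The round trip described here (binary data to letters, then letters back to binary data with the same code) can be sketched in a few lines. The mapping below, two bits per nucleotide, is a hypothetical choice made for illustration; the interview does not describe the actual code used by the team, and practical schemes typically add error correction and avoid sequences that are hard to synthesize or read.

```python
# Hypothetical 2-bits-per-nucleotide mapping, chosen for illustration only.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of nucleotide letters (4 letters per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(sequence: str) -> bytes:
    """Turn nucleotide letters back into bytes using the same mapping."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = "Liberté".encode("utf-8")
strand = encode(message)

# Reading the letters back with the same code recovers the original bytes.
assert decode(strand) == message
print(strand)
```

Each byte becomes four letters, so the text is first turned into bytes (here with UTF-8) and the resulting letter sequence is what would be sent for synthesis.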
The idea of using DNA as a medium for digital information was first put forward in 1959 by Richard Feynman, winner of the 1965 Nobel Prize in Physics. How do you plan to take this idea further?
S. L.: The first significant demonstration of storage on DNA was made at Harvard, in 2012, by George Church. Since then, several teams have developed it. But until now, this storage was only done on small molecules of about 200 nucleotides that make up a single strand of DNA. These small fragments cannot be manipulated in living organisms, which use large double-stranded sequences such as chromosomes.
We therefore decided to go further by using synthetic biology techniques³ that we have developed in our research, such as a system allowing us to standardize DNA bricks to combine them more easily. We chose to mimic the living world and to adapt its technologies through a bio-inspired solution. This solution, which we have called DNA Drive, consists of assembling the small DNA fragments that have been synthesized from digital information to make long double-stranded molecules that are biocompatible, i.e. that can be manipulated by living cells. The large molecules thus obtained can be integrated into a bacterium, which will naturally duplicate the DNA and the information it carries. In a very short time, it is possible to obtain 100 billion copies of the file at a very low cost.
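A rough calculation shows why biological copying reaches 100 billion copies quickly. The sketch below assumes idealized exponential growth from a single cell carrying one copy of the file, and a doubling time of 20 minutes; that doubling time is an assumption made for illustration and is not a figure from the interview.

```python
import math

TARGET_COPIES = 100e9          # 100 billion copies, figure from the interview
ASSUMED_DOUBLING_MINUTES = 20  # assumption: fast-growing bacterium in ideal conditions

# Each cell division doubles the number of copies, so n divisions give 2**n copies.
doublings = math.ceil(math.log2(TARGET_COPIES))
hours = doublings * ASSUMED_DOUBLING_MINUTES / 60

print(f"{doublings} doublings, about {hours:.1f} hours at the assumed rate")
```

Under these assumptions, fewer than 40 rounds of division are enough, which is consistent with the "very short time" mentioned above. Real cultures do not grow exponentially without limit, so this is an order-of-magnitude estimate only.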
Do these DNA molecules pose a risk to the environment?
P. C.: The DNA molecules are biosecured, i.e. made unreadable for living organisms: the DNA is encrypted so that it does not carry any genetic information that could be dangerous for humans or the environment. It is then extracted from the bacteria and stored in a stainless steel capsule. Each capsule can contain a quantity of DNA corresponding to 5,000 TB of digital data.
To recover the data, the DNA must be rehydrated and the sequence read by a sequencer. The algorithm we have developed allows us to recover the digital information, which is then decompressed to find the original files. Our technology, DNA Drive, allows us to physically organize the data like a hard disk and to encode all types of digital files such as media, folders, computer programs, and others.
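The capsule capacity quoted above can be put in perspective with the other figures from the interview. The sketch below assumes, for illustration, that a capsule is filled at the quoted density of 450 million TB per gram; the interview does not state how much DNA a capsule actually holds.

```python
CAPSULE_TB = 5_000                 # capacity of one capsule, from the interview
DENSITY_TB_PER_GRAM = 450_000_000  # quoted density of DNA storage
WORLD_DATA_ZB = 45                 # current world data, from the interview
TB_PER_ZB = 10**9                  # 10^21 bytes / 10^12 bytes

# Mass of DNA in one capsule, assuming it is written at the quoted density.
dna_micrograms_per_capsule = CAPSULE_TB / DENSITY_TB_PER_GRAM * 1e6

# Number of capsules needed to hold the world's data at 5,000 TB each.
capsules_for_world_data = WORLD_DATA_ZB * TB_PER_ZB // CAPSULE_TB

print(f"{dna_micrograms_per_capsule:.1f} micrograms of DNA per capsule")
print(f"{capsules_for_world_data:,} capsules for {WORLD_DATA_ZB} ZB")
```

Under that assumption, a capsule would hold on the order of ten micrograms of DNA, and about nine million capsules would cover the 45 zettabytes cited earlier.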
How did you come up with the idea to take on such a large project?
S. L.: In 2018, students from the Alma mater Association wrote an article in their newspaper about DNA storage. After I told them that my team could master this kind of technology, they challenged me to encode the Declaration of the Rights of Man and the Citizen. It was an excellent idea, to which I wanted to add the Declaration of the Rights of Women and the Citizen by Olympe de Gouges. With Pierre Crozet, we launched the "DNA Revolution" project to encode these two founding texts and obtain a proof of concept of the effectiveness of our technology.
Was this a multidisciplinary project? Who did you collaborate with?
P. C.: Absolutely. In addition to collaborating with bioinformatics engineers, we met with historians to find out which version of the Declaration of the Rights of Women and the Citizen we should encode. We then set up a partnership with the French National Archives to officially record these two texts encoded on DNA, a world first for a public institution. The capsules containing the two texts will be stored in the iron cabinet that contains the most precious documents of the National Archives, such as the will of Louis XIV, the diary of Louis XVI and the manuscript of the Declaration of the Rights of Man and the Citizen.
In parallel, we also worked with the American company Twist Bioscience, which synthesized the DNA fragments. We then assembled and organized them on large DNA molecules. We biologically amplified these molecules before extracting and purifying them. The French company Imagene took care of encapsulating the DNA molecules.
This project has become a real entrepreneurial adventure, hasn't it?
S. L.: In 2019, we designed and patented the DNA Drive technology in partnership with Sorbonne University, the CNRS and Satt Lutech. Then, in July 2021, we founded our start-up Biomemory with entrepreneur Erfane Arwani. Since then, we've been dividing our time between the development of the company and the continuation of our academic activities. We remain first and foremost biologists.
Today, we continue to be supported and accompanied by the University, the CNRS and Satt Lutech, but also by external investors, including data storage entities.
In 2021, we won the I-lab innovation competition. This recognition attests to the viability of our project, both in economic and scientific terms. In particular, we were evaluated by one of the world's leading experts in DNA data storage.
We'll continue to improve DNA Drive technology through synthetic biology. We believe that this discipline will change the world, just as synthetic chemistry did two centuries ago.
1. One zettabyte corresponds to 10^21 bytes. It would take 2.5 million years to download one zettabyte with a fiber optic internet connection.
2. One terabyte corresponds to 10^12 bytes.
3. Since the start of the 21st century, synthetic biology has made it possible to design innovative biological systems to answer fundamental questions or create new applications, as chemistry did in the 20th century: biofuels, biotextiles, bioplastics, biomaterials, new therapeutic solutions, digital information storage on DNA, etc.
Photo credit: Capsules containing the two texts encoded on DNA. Stéphane Lemaire / CNRS - Sorbonne University