How Open Data for Science Will Change How Businesses Compete

Today, even an ordinary teenager with a smartphone has almost godlike power over information. With a few swipes and clicks, anybody can access the world’s information, use advanced tools to analyze its meaning and share it with anyone else. That’s really changed how we innovate.

So it’s strange that the practice of science has, for the most part, been stuck in the dark ages. The process of research, peer review and publication remains almost as slow and cumbersome as it was decades ago, which hinders our ability to turn new discoveries into useful applications.

That may be changing, though. Taking a page from the open source movement, a number of efforts are underway to aggregate the latest knowledge and make it available to anyone who wants to use it. From cancer research and materials science to psychological profiles, these new data sets will enable and empower innovation like never before.

A Periodic Table For Cancer

“We said, ‘Let’s gather data along with some basic analysis, publish it and allow the scientific community to study it,’” Jean Claude Zenklusen, a biologist at the National Cancer Institute (NCI), told me. “We did this because we believed that by releasing the data in this way, we could tap into the collective expertise of thousands of researchers across a number of fields and accelerate innovation.”

This approach formed the basis for The Cancer Genome Atlas (TCGA), a joint project between NCI and the National Human Genome Research Institute, which began in 2006 and has since sequenced the tumors of over 10,000 patients encompassing 33 types of cancer. “Cancer data has now become open data,” Zenklusen told me proudly.

Today, a decade later, its effect on cancer science has been profound. “It essentially gives us a periodic table,” Ron DePinho, President of MD Anderson Cancer Center, says, “which has provided us with both diagnostic and therapeutic value as well as helped us design clinical trials to accelerate the development of new cancer drugs.”

Yet the impact of the program goes far beyond major institutions like MD Anderson. Much like the original periodic table, it has greatly democratized scientific knowledge. Many of the researchers who use the data are first-time grantees from small institutions who likely wouldn’t have gotten their studies off the ground without TCGA as a resource.

A Genome For 21st Century Manufacturing

Traditionally, improving a product has been a process of trial and error. You changed the ingredients or the process by which you made it and saw what happened. For example, at some point a medieval blacksmith figured out that annealing iron would make better swords.

Yet today, coming up with better materials is a multibillion-dollar business. Consider a car maker that wants to improve fuel economy. It could use a smaller, less powerful engine, but that would sacrifice performance. A much better solution would be to figure out how to make a lighter material that is still strong enough not to compromise safety.

With this in mind, the Materials Genome Initiative is building databases of material properties such as strength and density, along with computer models that predict which processes will achieve the qualities a manufacturer is looking for. Like The Cancer Genome Atlas, it is making the data available to anyone who can find a use for it.

“Our goal is to speed up the development of new materials by making clear the relationship between materials, how they are processed and what properties are likely to result,” Jim Warren, Director of the Materials Genome program, told me. “My hope is that the Materials Genome will accelerate innovation in just about every industry America competes in.”

Genomes Of The Mind

IBM Fellow and Vice President of Healthcare and Life Sciences Research Ajay Royyuru, however, thinks artificial intelligence can go even further and help us understand the most complex entity on the planet — ourselves. “Language is a means to transfer cognitive state,” he told me. “While I’m talking to you, I’m effectively trying to make a link between what’s going on in my mind with what’s going on in yours.”

So his team at IBM’s Healthcare and Life Sciences division began studying chess players to see if they could find a correlation between their brain activity and their proficiency. They found that they could, and later had similar success evaluating musicians. Now IBM is working on a system that could evaluate mental health through language processing.

“Our hope is that this technology, when combined with the expertise of a trained therapist, can help recognize early indications of mental illnesses and enable the opportunity for more effective treatment before more acute symptoms present themselves,” Royyuru says.

A New Era Of Mass Collaboration

The traditional practice of science reflected these realities. To do any significant research, you had to secure a budget or a government grant, for which you had to make your purpose clear. Anything you found that aligned with that stated purpose got published, but most of what didn’t tended to be discarded or lost in a notebook or on a hard drive somewhere.

Yet with the cost of storage and search now negligible, the economics of science are changing. “Having huge amounts of data becomes much more interesting when we can classify it in some way and can even be the first step towards creating a generalized model, which drives further innovation,” the complexity theorist Samuel Arbesman told me.

Essentially, when scientific data becomes open data, the power of fundamental research becomes available to just about anyone with an idea. You no longer need a billion-dollar budget to make a breakthrough. Instead, you can use the collective knowledge of the world’s scientists to imagine a new future.

An earlier version of this article first appeared in



Greg Satell

Bestselling Author of Cascades and Mapping Innovation, @HBR Contributor, - Learn more at — note: I use Amazon Affiliate links for books.