Finding satisfaction in stratification

CAMBRIDGE, Mass.—Selventa, a personalized healthcare company focused on stratification of patients and development of predictive biomarker panels based on disease-driving mechanisms, recently formed a strategic scientific alliance with Linguamatics, a software solutions company that provides knowledge extraction through its I2E natural language processing (NLP) text-mining platform.

The idea is to combine the analytical capabilities of both companies to efficiently extract complex life science knowledge in a computable, structured Biological Expression Language format that can be used to interpret large-scale experimental data in the context of published literature.

That format is BEL, a structured language designed to represent scientific findings in a computable form with supporting contextual information, such as tissue, disease, species and publication. BEL is use-neutral, notes Dr. David Milward, chief technology officer at Linguamatics, and it articulates an idea in a manner that is “unambiguous, terse and conveys the facts and associated contexts without loss or ambiguity.” BEL, along with the BEL Framework, is available through a portal to the scientific community to promote the collection, sharing and interchange of structured scientific knowledge. Selventa's discovery platform operates on top of a scientific knowledge base made up of a set of BEL statements.

“The Selventa and Linguamatics collaboration shows how precise, detailed information can be automatically extracted from the literature and provided in a format suitable for further analysis and reasoning,” says Milward. “This will allow reuse of knowledge from the literature, at greater scale and speed.”

Much of the business and the future of Selventa are tied to biomarker discovery and personalized healthcare, David de Graaf, CEO of Selventa, tells ddn.

“The way we get there is having qualified knowledge available to us and to other users and comparing it to patient data sets. It’s a matter of pulling together prior knowledge in a usable manner with the analytics on top,” he says.

He notes that well-structured knowledge is already being customized within the scientific realm by organizations that are able to generate knowledge bases in specific areas, but in many cases they are using resources in China and India, “and we can’t directly compete with that,” he adds. “But well-quantified and organized knowledge is something our clients need and that drew us to Linguamatics so that this general knowledge out there could be put in a more terse and usable form.”

By using NLP-based capabilities to identify and extract relationships hidden in unstructured text and to generate structured data for comprehensive biological investigation and analysis, the I2E platform is said to offer dramatically increased speed, scale and reproducibility. It is also said to make it possible to go back efficiently into a textual data source and pull out additional information that has since become relevant.

“This partnership is a great strategic fit to facilitate the representation of complex biological knowledge that can be recycled and maximized through our analytical platform,” said de Graaf in the news release about the deal. “Collaborating with Linguamatics will enable rapid yet comprehensive investigation of new areas of biology by extracting computable knowledge from unstructured text. This will lead to innovation on many fronts, such as next-generation sequencing, where well-structured information for reasoning has been limited.”

One of the primary goals for Selventa in this partnership is to be able to stratify patients using biomarkers. “We’re doing a lot of this kind of work through the BEL initiative with Pfizer and along with Linguamatics as well,” de Graaf says. He tells ddn that his company has also talked to other people in the knowledge space, whether publishers or other makers of knowledge bases, to get the best data possible.

“When customers acquire their data sets they often acquire broad assets that are shallow. They cover a lot of territory, like everything about clinical trials or a particular kind of chemistry, but what they don’t do is get what they often really need, which might be everything relevant in a particular area, like multiple sclerosis or breast cancer,” de Graaf says. “So you want to go relatively narrow but much deeper by integrating resources, and this is where Linguamatics helps us meet customers’ needs. We have a set of analytic tools that analyze prior experimental data and compare to your current set of experiments, and Linguamatics provides us with a platform for generating well-quantified knowledge from that.”

Having worked at companies like AstraZeneca and Boehringer Ingelheim, de Graaf had previously been involved in the evaluation and acquisition of NLP tools, and he came into Selventa already knowing folks at Linguamatics. “As far as I’m concerned, they are a premier provider of NLP solutions,” he says. “Unlike with other NLP platforms, where they are not expandable, Linguamatics meets our needs because it’s flexible. We knew our platform would require lots of tweaking, and knocking on their door to get technology that could handle that was just logical.”

Looking toward the future, de Graaf says he plans to work with the “best of the best” not only to implement a set of tools to feed into BEL but also to discover and analyze biomarkers, better stratify patients and more.