Author ORCID Identifier

Dhundy Bastola

Dario Ghersi

Document Type


Publication Date


Publication Title

BMC Medical Genomics


9 (Suppl 2)




Background: Fragment-based approaches have become an important component of the drug discovery process. At the same time, pharmaceutical chemists are increasingly turning to the natural world and its extremely large and diverse collection of natural compounds to discover new leads that can potentially be turned into drugs. In this study we introduce and discuss a computational pipeline to automatically extract statistically overrepresented chemical fragments in therapeutic classes, and search for similar fragments in a large database of natural products. By systematically identifying enriched fragments in therapeutic groups, we are able to extract and focus on a few fragments that are likely to be active or structurally important.

Results: We show that several therapeutic classes (including antibacterial, antineoplastic, and drugs active on the cardiovascular system, among others) have enriched fragments that are also found in many natural compounds. Further, our method is able to detect fragments shared by a drug and a natural product even when the global similarity between the two molecules is generally low.

Conclusions: Further development of this computational pipeline could help predict putative therapeutic activities of natural compounds and identify novel leads for drug discovery.
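The enrichment step described in the Background can be illustrated with a minimal sketch. The abstract does not state which statistical test the authors use, so the sketch assumes a one-sided hypergeometric test; the function names, the representation of each drug as a set of fragment strings, and the `alpha` threshold are all hypothetical and chosen only for illustration. No multiple-testing correction is applied here, although a real analysis would likely need one.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """Return P(X >= k) for a hypergeometric variable X.

    M: total number of drugs, n: drugs containing the fragment,
    N: drugs in the therapeutic class, k: class drugs containing the fragment.
    """
    upper = min(n, N)
    total = comb(M, N)
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return tail / total


def enriched_fragments(drug_fragments, drug_class, target_class, alpha=0.05):
    """Return {fragment: p-value} for fragments overrepresented in target_class.

    drug_fragments: dict mapping drug name -> set of fragment identifiers
    drug_class:     dict mapping drug name -> therapeutic class label
    (Both structures are assumptions made for this sketch.)
    """
    M = len(drug_fragments)
    in_class = [d for d in drug_fragments if drug_class[d] == target_class]
    N = len(in_class)
    all_fragments = set().union(*drug_fragments.values())

    results = {}
    for frag in all_fragments:
        # Number of drugs overall, and within the class, that contain the fragment.
        n = sum(frag in frags for frags in drug_fragments.values())
        k = sum(frag in drug_fragments[d] for d in in_class)
        p = hypergeom_sf(k, M, n, N)
        if p <= alpha:
            results[frag] = p
    return results
```

Called with a dictionary of drugs and their fragments plus a class label per drug, `enriched_fragments` returns only the fragments whose occurrence in the chosen class is unlikely under random sampling; those fragments could then be searched for in a natural product collection.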


© 2016 The Author(s). Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

DOI: 10.1186/s12920-016-0205-6

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.