Pankaj Agarwal

Director, Systematic Drug Repositioning, Computational Biology at GlaxoSmithKline

Greater Philadelphia Area
Previous: SmithKline Beecham, Washington University, Washington University School of Medicine
Education: New York University

New York University

Computer Science



Director, Systematic Drug Repositioning, Computational Biology

GlaxoSmithKline
– Present (18 years)

Principal Investigator

SmithKline Beecham
(4 years)

Postdoctoral Research Associate

Washington University
(3 years)

Postdoctoral Research Associate

Washington University School of Medicine
(3 years)


  • Bioinformatics
  • Computational Biology
  • Drug Discovery
  • Systems Biology
  • Genomics
  • Biomarker Discovery
  • Drug Repositioning
  • Translational Medicine
  • Data Mining
  • Genetics
  • Biotechnology
  • Informatics
  • Machine Learning
  • Pharmaceutical Industry
  • Science
  • R&D
  • Pharmacology
  • Sequence Analysis


Reply to Rational drug repositioning by medical genetics

Nature Biotechnology
December 2013

Novelty in the target landscape of the pharmaceutical industry

Nature Reviews Drug Discovery
August 2013

Agarwal and colleagues present an analysis of the overlap in the drug targets that are being pursued across the industry, which indicates that more than half of the novel drug targets in current pipelines are being pursued by only one company.


Computational Drug Repositioning: From Data to Therapeutics

Clinical Pharmacology & Therapeutics
February 2013

Traditionally, most drugs have been discovered using phenotypic or target-based screens. Subsequently, their indications are often expanded on the basis of clinical observations, providing additional benefit to patients. This review highlights computational techniques for systematic analysis of transcriptomics (Connectivity Map, CMap), side effects, and genetics (genome-wide association study, GWAS) data to generate new hypotheses for additional indications. We also discuss data domains such as electronic health records (EHRs) and phenotypic screening that we consider promising for novel computational repositioning methods.


Use of genome-wide association studies for drug repositioning

Nat Biotechnol. 2012 Apr 10;30(4):317-20
April 2012

Can literature analysis identify innovation drivers in drug discovery?

Nat Rev Drug Discov.
November 2009

Drug discovery must be guided not only by medical need and commercial potential, but also by the areas in which new science is creating therapeutic opportunities, such as target identification and the understanding of disease mechanisms. To systematically identify such areas of high scientific activity, we use bibliometrics and related data-mining methods to analyse over half a terabyte of data, including PubMed abstracts, literature citation data and patent filings. These analyses reveal trends in scientific activity related to disease studied at varying levels, down to individual genes and pathways, and provide methods to monitor areas in which scientific advances are likely to create new therapeutic opportunities.


Literature mining in support of drug discovery

Brief Bioinform
November 2008

The drug discovery enterprise provides strong drivers for data integration. While attention in this arena has tended to focus on integration of primary data from omics and other large platform technologies contributing to drug discovery and development, the scientific literature remains a major source of information valuable to pharmaceutical enterprises, and therefore tools for mining such data and integrating it with other sources are of vital interest and economic impact. This review provides a brief overview of approaches to literature mining as they relate to drug discovery, and offers an illustrative case study of a 'lightweight' approach we have implemented within an industrial context.


Systematic drug repositioning based on clinical side-effects

PLoS One. 2011;6(12):e28025.
December 2011

Drug repositioning helps fully explore indications for marketed drugs and clinical candidates. Here we show that the clinical side-effects (SEs) provide a human phenotypic profile for the drug, and this profile can suggest additional disease indications. We extracted 3,175 SE-disease relationships by combining the SE-drug relationships from drug labels and the drug-disease relationships from PharmGKB. Many relationships provide explicit repositioning hypotheses; for example, drugs causing hypoglycemia are potential candidates for diabetes. We built Naïve Bayes models to predict indications for 145 diseases using the SEs as features. The AUC was above 0.8 in 92% of these models. The method was extended to predict indications for clinical compounds; 36% of these models achieved AUC above 0.7. This suggests that closer attention should be paid to the SEs observed in trials, not just to evaluate the harmful effects but also to rationally explore the repositioning potential based on this "clinical phenotypic assay".
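
As a rough illustration of the modelling step described in this abstract, the sketch below trains a Bernoulli Naive Bayes classifier on binary side-effect features and computes an AUC. It is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not say which Naive Bayes variant or smoothing was used, and the side-effect columns, drugs, and labels here are invented toy values.

```python
import math

def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model on binary rows X and 0/1 labels y."""
    n_features = len(X[0])
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = (len(rows) + alpha) / (len(X) + 2 * alpha)
        # Laplace-smoothed probability that each side effect is present in class c
        probs = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(n_features)]
        model[c] = (prior, probs)
    return model

def log_odds(model, x):
    """Log-odds that a drug with side-effect vector x belongs to class 1."""
    def loglik(c):
        prior, probs = model[c]
        return math.log(prior) + sum(
            math.log(p if xj else 1.0 - p) for xj, p in zip(x, probs))
    return loglik(1) - loglik(0)

def auc(scores, labels):
    """AUC as the chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented toy data. Columns are hypothetical side effects
# (hypoglycemia, weight_gain, nausea); label 1 = drug is indicated for the disease.
X = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
y = [1, 1, 1, 0, 0, 0]
model = train_bernoulli_nb(X, y)
scores = [log_odds(model, x) for x in X]
print(auc(scores, y))
```

The AUC printed here is computed on the training rows, so it is optimistic; a realistic evaluation would score held-out drugs, for example by cross-validation.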


Human disease-drug network based on genomic expression profiles

PLoS One
August 2009

We performed a systematic, large-scale analysis of genomic expression profiles of human diseases and drugs to create a disease-drug network. A network of 170,027 significant interactions was extracted from the approximately 24.5 million comparisons between approximately 7,000 publicly available transcriptomic profiles. The network includes 645 disease-disease, 5,008 disease-drug, and 164,374 drug-drug relationships. At least 60% of the disease-disease pairs were in the same disease area as determined by the Medical Subject Headings (MeSH) disease classification tree. The remaining can drive a molecular level nosology by discovering relationships between seemingly unrelated diseases, such as a connection between bipolar disorder and hereditary spastic paraplegia, and a connection between actinic keratosis and cancer. Among the 5,008 disease-drug links, connections with negative scores suggest new indications for existing drugs, such as the use of some antimalaria drugs for Crohn's disease, and a variety of existing drugs for Huntington's disease; while the positive scoring connections can aid in drug side effect identification, such as tamoxifen's undesired carcinogenic property. From the approximately 37K drug-drug relationships, we discover relationships that aid in target and pathway deconvolution, such as 1) KCNMA1 as a potential molecular target of lobeline, and 2) both apoptotic DNA fragmentation and G2/M DNA damage checkpoint regulation as potential pathway targets of daunorubicin.
We have automatically generated thousands of disease and drug expression profiles using GEO datasets, and constructed a large scale disease-drug network for effective and efficient drug repositioning as well as drug target/pathway identification.
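
The sign convention in this abstract (negative disease-drug scores suggest indications, positive scores flag possible side effects) can be sketched as follows. This is a minimal sketch with assumptions: the abstract does not name the scoring function, so Pearson correlation is used as a stand-in, and the threshold, profile names, and expression values are invented.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def classify_links(disease_profiles, drug_profiles, threshold=0.8):
    """Label strongly anti-correlated and strongly correlated disease-drug pairs."""
    links = []
    for disease, d_profile in disease_profiles.items():
        for drug, g_profile in drug_profiles.items():
            r = pearson(d_profile, g_profile)
            if r <= -threshold:
                # The drug signature opposes the disease signature
                links.append((disease, drug, "candidate indication"))
            elif r >= threshold:
                # The drug signature mimics the disease signature
                links.append((disease, drug, "possible side effect"))
    return links

# Invented log fold-changes over the same four hypothetical genes.
diseases = {"disease_x": [2.0, 1.5, -1.0, -2.0]}
drugs = {
    "drug_a": [-1.8, -1.2, 0.9, 2.1],   # roughly reverses disease_x
    "drug_b": [1.9, 1.4, -0.8, -2.2],   # roughly mimics disease_x
    "drug_c": [0.5, -0.5, 0.5, -0.5],   # weak relationship, below threshold
}
print(classify_links(diseases, drugs))
```

A fixed correlation cutoff is only a placeholder for the significance filtering the abstract mentions; the paper reports significant interactions out of millions of comparisons, which would need a proper multiple-testing treatment.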


Electronic health records: Implications for drug discovery

Drug Discov Today. 2011 Jul;16(13-14):594-9
July 2011

Electronic health records (EHRs) have increased in popularity in many countries. Pushed by legal mandates, EHR systems have seen substantial progress recently, including increasing adoption of standards, improved medical vocabularies and enhancements in technical infrastructure for data sharing across healthcare providers. Although the progress is directly beneficial to patient care in a hospital or clinical setting, it can also aid drug discovery. In this article, we review three specific applications of EHRs in a drug discovery context: finding novel relationships between diseases, re-evaluating drug usage and discovering phenotype-genotype associations. We believe that in the near future EHR systems and related databases will impact significantly how we discover and develop safe and efficacious medicines.


A pathway-based view of human diseases and disease relationships

PLoS One
February 2009

It is increasingly evident that human diseases are not isolated from each other. Understanding how different diseases are related to each other based on the underlying biology could provide new insights into disease etiology, classification, and shared biological mechanisms. We have taken a computational approach to studying disease relationships through 1) systematic identification of disease associated genes by literature mining, 2) associating diseases to biological pathways where disease genes are enriched, and 3) linking diseases together based on shared pathways. We identified 4,195 candidate disease associated genes for 1028 diseases. On average, about 50% of disease associated genes of a disease are statistically mapped to pathways. We generated a disease network which consists of 591 diseases and 6,931 disease relationships. We examined properties of this network and provided examples of novel disease relationships which cannot be readily captured through simple literature search or gene overlap analysis. Our results could potentially provide insights into the design of novel, pathway-guided therapeutic interventions for diseases.
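
Steps 2 and 3 of this abstract (mapping disease genes to enriched pathways, then linking diseases through shared pathways) might look like the sketch below. It assumes a one-sided hypergeometric test and a 0.05 cutoff, neither of which is stated in the abstract; the gene, pathway, and disease names are invented.

```python
from itertools import combinations
from math import comb

def hypergeom_pvalue(overlap, pathway_size, disease_size, universe):
    """P(X >= overlap) when disease_size genes are drawn from the universe."""
    upper = min(pathway_size, disease_size)
    total = comb(universe, disease_size)
    tail = sum(comb(pathway_size, k) * comb(universe - pathway_size, disease_size - k)
               for k in range(overlap, upper + 1))
    return tail / total

def enriched_pathways(disease_genes, pathways, universe, alpha=0.05):
    """Names of pathways in which the disease genes are over-represented."""
    hits = set()
    for name, members in pathways.items():
        overlap = len(disease_genes & members)
        if overlap and hypergeom_pvalue(
                overlap, len(members), len(disease_genes), universe) < alpha:
            hits.add(name)
    return hits

def disease_network(diseases, pathways, universe, alpha=0.05):
    """Map each disease pair to the enriched pathways the two diseases share."""
    enriched = {d: enriched_pathways(genes, pathways, universe, alpha)
                for d, genes in diseases.items()}
    return {(a, b): enriched[a] & enriched[b]
            for a, b in combinations(sorted(enriched), 2)
            if enriched[a] & enriched[b]}

# Invented example: a universe of 100 genes and two five-gene pathways.
pathways = {
    "pathway_1": {"g1", "g2", "g3", "g4", "g5"},
    "pathway_2": {"g6", "g7", "g8", "g9", "g10"},
}
diseases = {
    "disease_a": {"g1", "g2", "g20"},
    "disease_b": {"g4", "g5", "g30"},
    "disease_c": {"g6", "g40", "g41"},
}
print(disease_network(diseases, pathways, universe=100))
```

In this toy input, disease_a and disease_b share no genes yet are linked through pathway_1, which is the kind of relationship the abstract says simple gene-overlap analysis cannot capture.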


A global pathway crosstalk network

June 2008

We have developed a computational approach to detect crosstalk among pathways based on protein interactions between the pathway components. We built a global mammalian pathway crosstalk network that includes 580 pathways (covering 4753 genes) with 1815 edges between pathways. This crosstalk network follows a power-law distribution, $P(k) \sim k^{-\gamma}$ with $\gamma = 1.45$, where $P(k)$ is the number of pathways with $k$ neighbors; thus pathway interactions may exhibit the same scale-free phenomenon that has been documented for protein interaction networks. We further used this network to understand colorectal cancer progression to metastasis based on transcriptomic data.
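
To make the power-law statement concrete, the sketch below counts how many nodes have k neighbors and estimates the exponent by a least-squares line on the log-log plot. The fitting procedure is an assumption (the abstract does not say how the exponent was estimated), and the degree counts are an invented example chosen to follow an exact power law.

```python
import math
from collections import Counter

def degree_counts(edges):
    """Return a mapping k -> number of nodes with exactly k neighbors."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())

def fit_power_law_exponent(counts):
    """Estimate gamma in P(k) ~ k^(-gamma) from a log-log least-squares fit."""
    xs = [math.log(k) for k in counts]
    ys = [math.log(n) for n in counts.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope  # the fitted slope is -gamma

# Invented counts following P(k) = 16 * k^(-2) exactly.
toy_counts = {1: 16, 2: 4, 4: 1}
print(fit_power_law_exponent(toy_counts))
```

Log-log regression is a simple but crude estimator; on real, noisy degree distributions a maximum-likelihood fit is generally preferred.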


Inferring pathways from gene lists using a literature-derived network of biological relationships

March 2005

MOTIVATION: A number of omic technologies such as transcriptional profiling, proteomics, literature searches, genetic association, etc. help in the identification of sets of important genes. A subset of these genes may act in a coordinated manner, possibly because they are part of the same biological pathway. Interpreting such gene lists and relating them to pathways is a challenging task. Databases of biological relationships between thousands of mammalian genes can help in deciphering omics data. The relationships between genes can be assembled into a biological network with each protein as a node and each relationship as an edge between two proteins (or nodes). This network may then be searched for subnetworks consisting largely of interesting genes from the omics experiment. The subset of genes in the subnetwork along with the web of relationships between them helps to decipher the underlying pathways. Finding such subnetworks that maximally include all proteins from the query set but few others is the focus for this paper.
RESULTS: We present a heuristic algorithm and a scoring function that work well both on simulated data and on data from known pathways. The scoring function is an extension of a previous study for a single biological experiment. We use a simple set of heuristics that provide a more efficient solution than the simulated annealing method. We find that our method works on reasonably complex curated networks containing approximately 9000 biological entities (genes and metabolites), and approximately 30,000 biological relationships. We also show that our method can pick up a pathway signal from a query list including a moderate number of genes unrelated to the pathway. In addition, we quantify the sensitivity and specificity of the technique.
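
The goal stated in this abstract, a subnetwork that includes most query genes but few others, can be illustrated with a simple greedy stand-in. This is not the paper's heuristic or scoring function, which the abstract does not detail: the sketch attaches each query gene to the growing subnetwork by its shortest path, and skips genes that would need more than a chosen number of non-query linker nodes. The `max_linkers` parameter, the node names, and the edges are all hypothetical.

```python
from collections import deque

def build_graph(edges):
    """Undirected adjacency map built from a list of (a, b) relationships."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def shortest_path(graph, src, dst):
    """Breadth-first search; returns the node list from src to dst, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for neighbor in graph.get(node, ()):
            if neighbor not in prev:
                prev[neighbor] = node
                queue.append(neighbor)
    return None

def connect_query(graph, query, max_linkers=1):
    """Greedily grow a subnetwork from query[0], adding nearby query genes."""
    query = list(query)
    sub = {query[0]}
    for gene in query[1:]:
        best = None
        for target in sorted(sub):
            path = shortest_path(graph, gene, target)
            if path and (best is None or len(path) < len(best)):
                best = path
        # Interior nodes of the path are linkers between gene and subnetwork
        if best is not None and len(best) - 2 <= max_linkers:
            sub.update(best)
    return sub

# Invented network: a chain A-B-C-D-E plus a distant gene Q.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"),
         ("Q", "W1"), ("W1", "W2"), ("W2", "A")]
graph = build_graph(edges)
print(sorted(connect_query(graph, ["A", "C", "E", "Q"])))
```

Here Q plays the role of a query gene unrelated to the pathway: it would need two linkers, so it is left out. The result depends on the order of the query list, which is one reason a real method needs a scoring function rather than a single greedy pass.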


Gene Vector Analysis (Geneva): a unified method to detect differentially-regulated gene sets and similar microarray experiments

BMC Bioinformatics
August 2008

Microarray experiments measure changes in the expression of thousands of genes. The resulting lists of genes with changes in expression are then searched for biologically related sets using several divergent methods such as the Fisher Exact Test (as used in multiple GO enrichment tools), Parametric Analysis of Gene Expression (PAGE), Gene Set Enrichment Analysis (GSEA), and the connectivity map.
We describe an analytical method (Geneva: Gene Vector Analysis) to relate genes to biological properties and to other similar experiments in a uniform way. This new method works on both gene sets and on gene lists/vectors as input queries, and can effectively query databases consisting of sets of biologically related sets, or of results from other microarray experiments. We also present an improvement to the null model estimate by using the empirical background distribution drawn from previous experiments. We validated Geneva by rediscovering a number of previous findings, and by finding significant relationships within microarrays in the GEO repository.
Provided a reasonable corpus of previous experiments is available, this method is more accurate than the class label permutation model, especially for data sets with limited number of replicates. Geneva is, moreover, computationally faster because the background distributions can be precomputed. We also provide a standard evaluation data set based on 5 pairs of related experiments that should share similar functional relationships and 28 pairs of unrelated experiments from GEO. Discovering relationships amongst GEO data sets has implications for drug repositioning, and understanding relationships between diseases and drugs.
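
The idea of a precomputed empirical background can be sketched as below: scores from previous experiments are sorted once, and each new score is then looked up against them to get an empirical p-value. This is a minimal sketch, not Geneva itself; the class name, the add-one correction, and the background scores are assumptions made for illustration.

```python
from bisect import bisect_left

class EmpiricalBackground:
    """Null distribution built once from scores of previous experiments."""

    def __init__(self, background_scores):
        # Sorting up front makes every later lookup a binary search
        self.scores = sorted(background_scores)

    def pvalue(self, score):
        """Fraction of background scores >= score, with an add-one correction."""
        n_greater_equal = len(self.scores) - bisect_left(self.scores, score)
        return (n_greater_equal + 1) / (len(self.scores) + 1)

# Invented background scores standing in for a corpus of earlier experiments.
background = EmpiricalBackground([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
print(background.pvalue(0.85))
```

Because the background is shared across queries, no per-query label permutation is needed, which is consistent with the speed advantage the abstract describes.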


