ChemMapper Server is an open web resource for searching chemical databases by molecular 3D similarity. It relies on SHAFTS, an in-house method that combines the strengths of molecular shape superposition and chemical feature matching. ChemMapper provides suggestions and hints for drug discovery studies such as:
Identifying potential drug targets or exploring polypharmacology effects for a given bioactive compound.
Exploring potential mechanisms behind a drug's side effects.
Discovering compounds similar to a given compound in a bioactive or screening database.
Performing scaffold hopping for a given bioactive compound.
Superimposing your own compounds of interest onto a given query.
ChemMapper technically provides two types of chemical database searching services.
Target Navigator provides annotated active compound profiling and prediction of the corresponding binding proteins for a given small chemical probe (drugs, natural products, and other bioactive or toxic compounds). It is expected to deliver helpful information and visualized results for potential drug target identification, for polypharmacology analysis such as profiling drug-target networks and predicting adverse effects, and for repurposing old drugs. Target Navigator curates nearly 300 000 drug-like molecules from ChEMBL, DrugBank, and BindingDB with pharmacology annotations of their protein targets, including protein names, species, UniProt links for detailed information, biological functions, reactions, pharmacological actions, and bioactivities collected from HTS and published journal articles. Target Navigator also collects single bioactive conformations from the PDB database and structural information for enzyme substrates from the KEGG database.
Hit Explorer provides virtual screening services for similar compound searching, scaffold hopping from active compounds, and 3D structure superposition against several commercial, open-access, or user-uploaded databases, such as Specs, MayBridge, the ZINC Leadlike Set, and the NCI Open Database Compounds.
To start with ChemMapper Server, you must provide a single query molecule structure (by sketching it with the JME applet, pasting a SMILES string, or uploading a file in SMILES, SDF, or MOL2 format). Depending on the search service selected (Target Navigator or Hit Explorer), ChemMapper automatically generates a valid 3D conformation for the query if 3D coordinates are missing, calculates the 3D similarity between the query and each molecule in the target database, and then outputs the structures of the top-ranked hit molecules, their predicted 3D conformers, and the corresponding protein annotations and bioactivity information (if any) in the result pages.
ChemMapper consists of five components: (a) a chemical database containing about 300 000 000 drug-like molecular structures; (b) the in-house 3D similarity calculation method SHAFTS; (c) a compound-target annotation database; (d) a CPI network inference method for target recommendation; and (e) a tool for displaying results.
An outline of ChemMapper's general design is shown in the figure below.
A random walk on a graph is an iterative process in which a walker moves from its current node to a randomly selected neighbor. Formally, the random walk algorithm is defined as:
$$R_{s+1} = \alpha P^{T} R_s + (1-\alpha) R_0 \tag{1}$$
where $P$ is the probability transition matrix of the bipartite network and $R_s$ is a vector whose $i$-th entry holds the resource at node $i$ at time step $s$. The coefficient $\alpha$ controls restarting: at each step the walker follows the network with probability $\alpha$ and returns to its initial distribution $R_0$ with probability $1-\alpha$.
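As an illustration, a single update of Eq. (1) can be sketched in plain Python. The function name `rwr_step` and the three-node toy graph are hypothetical and not part of ChemMapper; the sketch assumes that P is stored row-wise and that each row sums to 1.

```python
def rwr_step(P, R, R0, alpha):
    """One update of Eq. (1): R_next = alpha * P^T R + (1 - alpha) * R0."""
    n = len(R)
    R_next = []
    for j in range(n):
        # (P^T R)[j] is the resource flowing into node j from all nodes i
        flow_in = sum(P[i][j] * R[i] for i in range(n))
        R_next.append(alpha * flow_in + (1 - alpha) * R0[j])
    return R_next


# Hypothetical toy graph: a path 0 - 1 - 2 (each row of P sums to 1)
P = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
R0 = [1.0, 0.0, 0.0]  # all resource starts on node 0
R1 = rwr_step(P, R0, R0, alpha=0.8)
print(R1)
```

Because the rows of P sum to 1, the total amount of resource is unchanged by the update.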
In Target Navigator mode, ChemMapper uses the random walk algorithm to calculate the probability of the paths from the query molecule to potential targets.
When SHAFTS finishes and has generated the result hits, a CPI bipartite network between the hit compounds and their targets is constructed. The initial resource vector is $R_0 = [R^{Compound}_0 \; R^{Protein}_0]^{T}$, where $R^{Compound}_0$ is built from probabilities derived from the similarity scores calculated by SHAFTS, while $R^{Protein}_0$ is assigned equal probabilities.
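One possible way to build such an initial vector is sketched below. The text does not say how similarity scores are turned into probabilities, so dividing each score by the sum of all scores is an assumption, as are the function name `initial_resource_vector` and the example scores.

```python
def initial_resource_vector(similarity_scores, n_proteins):
    """Return (R_compound_0, R_protein_0) as two lists of probabilities.

    Assumption: compound probabilities are the similarity scores divided by
    their sum; protein probabilities are uniform, as stated in the text.
    """
    total = sum(similarity_scores)
    if total <= 0 or n_proteins <= 0:
        raise ValueError("need positive scores and at least one protein")
    r_compound = [s / total for s in similarity_scores]
    r_protein = [1.0 / n_proteins] * n_proteins
    return r_compound, r_protein


# Hypothetical similarity scores for three hit compounds, two target proteins
r_compound0, r_protein0 = initial_resource_vector([1.6, 1.2, 1.2], n_proteins=2)
print(r_compound0, r_protein0)
```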
Next, we define an adjacency matrix $A$ that represents the CPI bipartite network, with entries that depend on whether activity data are available. When users screen against DrugBank, KEGG, or PDB, $A_{ij}=1$ if compound $C_i$ and protein $P_j$ are linked; otherwise $A_{ij}=0$. When users screen against ChEMBL or BindingDB, the measured activity between compounds and proteins is also taken into account. In this case, $A_{ij} = -\log_{10}\left(\frac{K_i \text{ (or } IC_{50})}{100\,\mu\text{M}}\right)$ if the activity between $C_i$ and $P_j$ is below 100 μM; otherwise $A_{ij}=0$.
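The two weighting rules can be sketched as small Python helpers. The function names are hypothetical, activities are assumed to be given in μM, and treating missing or non-positive activity values as "no link" is an assumption that the text does not address.

```python
import math

THRESHOLD_UM = 100.0  # activity cutoff from the text, in micromolar


def binary_weight(linked):
    """Weight for databases without activity data (DrugBank, KEGG, PDB)."""
    return 1.0 if linked else 0.0


def activity_weight(activity_um):
    """Weight for databases with activity data (ChEMBL, BindingDB).

    activity_um is a Ki or IC50 value in micromolar. Values at or above the
    cutoff give 0. Assumption: missing or non-positive values also give 0.
    """
    if activity_um is None or activity_um <= 0 or activity_um >= THRESHOLD_UM:
        return 0.0
    return -math.log10(activity_um / THRESHOLD_UM)


# A 1 uM binder is two orders of magnitude below the cutoff, so its weight is about 2
print(activity_weight(1.0), activity_weight(250.0))
```

Under this rule, stronger binders (smaller Ki or IC50) receive larger edge weights.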
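Given the adjacency matrix $A$, the probability transition matrix $P$ is defined as:
$$P = \begin{pmatrix} 0 & A_p \\ B_p & 0 \end{pmatrix}$$
where $A_p(i,j) = A(i,j)/\mathrm{sum}(A(i,:))$ and $B_p(i,j) = A(j,i)/\mathrm{sum}(A(:,i))$; that is, $A$ is normalized by row for compound-to-protein transitions and by column for protein-to-compound transitions.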
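A minimal sketch of this normalization in plain Python follows. It assumes that the denominator for a compound is the sum of its row of A and that the denominator for a protein is the sum of its column of A; rows or columns with no links are left as zeros. The function name `transition_blocks` and the example matrix are hypothetical.

```python
def transition_blocks(A):
    """Split-normalize A (compounds x proteins) into Ap and Bp.

    Ap[i][j]: probability of moving from compound i to protein j.
    Bp[i][j]: probability of moving from protein i to compound j.
    """
    n_c, n_p = len(A), len(A[0])
    row_sums = [sum(row) for row in A]
    col_sums = [sum(A[i][j] for i in range(n_c)) for j in range(n_p)]
    Ap = [[A[i][j] / row_sums[i] if row_sums[i] else 0.0 for j in range(n_p)]
          for i in range(n_c)]
    Bp = [[A[j][i] / col_sums[i] if col_sums[i] else 0.0 for j in range(n_c)]
          for i in range(n_p)]
    return Ap, Bp


# Hypothetical network: 2 compounds (rows) x 2 proteins (columns)
A = [[2.0, 0.0],
     [1.0, 3.0]]
Ap, Bp = transition_blocks(A)
print(Ap)
print(Bp)
```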
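Finally, combining $P$ with Eq. (1) yields separate updates for the compound and protein parts of $R$:
$$R^{Compound}_{s+1} = \alpha B_p^{T} R^{Protein}_{s} + (1-\alpha) R^{Compound}_{0}$$
$$R^{Protein}_{s+1} = \alpha A_p^{T} R^{Compound}_{s} + (1-\alpha) R^{Protein}_{0}$$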
Here we use $\alpha = 0.8$. The algorithm iterates until the change between $R_{s+1}$ and $R_s$ falls below $10^{-6}$. The predicted targets are ranked according to their values in the steady-state probability vector $R^{Protein}_{+\infty}$.
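The iteration can be sketched in plain Python as below. The text does not say how the change between successive vectors is measured, so the largest absolute difference between entries is used here as an assumption; the `max_iter` guard, the function name, and the toy inputs are likewise hypothetical. `Ap` and `Bp` denote the row-normalized compound-to-protein and protein-to-compound transition blocks.

```python
def random_walk_with_restart(Ap, Bp, r_c0, r_p0, alpha=0.8, tol=1e-6,
                             max_iter=10000):
    """Iterate Eq. (1) on a bipartite network until the update is below tol."""
    n_c, n_p = len(r_c0), len(r_p0)
    r_c, r_p = list(r_c0), list(r_p0)
    for _ in range(max_iter):
        # Compounds receive resource from proteins, and proteins from compounds
        new_c = [alpha * sum(Bp[j][i] * r_p[j] for j in range(n_p))
                 + (1 - alpha) * r_c0[i] for i in range(n_c)]
        new_p = [alpha * sum(Ap[i][j] * r_c[i] for i in range(n_c))
                 + (1 - alpha) * r_p0[j] for j in range(n_p)]
        # Assumed convergence measure: largest absolute change of any entry
        change = max(abs(a - b) for a, b in zip(new_c + new_p, r_c + r_p))
        r_c, r_p = new_c, new_p
        if change < tol:
            break
    return r_c, r_p


# Hypothetical toy network: 2 compounds, 2 proteins
Ap = [[1.0, 0.0], [0.25, 0.75]]
Bp = [[2.0 / 3.0, 1.0 / 3.0], [0.0, 1.0]]
r_c0, r_p0 = [0.7, 0.3], [0.5, 0.5]
r_c, r_p = random_walk_with_restart(Ap, Bp, r_c0, r_p0)
ranking = sorted(range(len(r_p)), key=lambda j: r_p[j], reverse=True)
print(r_p, ranking)
```

With $\alpha < 1$ the update is a contraction, so the loop reaches the tolerance after a modest number of steps; `ranking` lists protein indices from the highest to the lowest steady-state value.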
The final result is normalized as a standard score.
The standard score (z-score) is defined as:
$$z = \frac{x - \mu}{\sigma}$$
where $x$ is the relevancy score of a protein in $R^{Protein}_{+\infty}$, $\mu$ is the mean of the relevancy scores of all proteins, and $\sigma$ is the standard deviation of the relevancy scores in $R^{Protein}_{+\infty}$.
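A minimal Python sketch of this normalization is given below. The text does not state whether the population or the sample standard deviation is used; the population form (`statistics.pstdev`) is assumed here, and the function name and example scores are hypothetical.

```python
import statistics


def z_scores(scores):
    """Convert relevancy scores to standard scores (x - mean) / std."""
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores)  # assumption: population standard deviation
    if sigma == 0:
        # All scores identical: no protein stands out, so return zeros
        return [0.0] * len(scores)
    return [(x - mu) / sigma for x in scores]


# Hypothetical relevancy scores with mean 5 and standard deviation 2
z = z_scores([2, 4, 4, 4, 5, 5, 7, 9])
print(z)
```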