Introduction to classyfireR
ClassyFire is a web-based application for automated structural classification of chemical compounds.
The classyfireR R package provides access to the ClassyFire RESTful API for retrieving existing compound classifications and submitting structures to the web server for classification.
classyfireR can be installed from CRAN or, for the latest development version, directly from GitHub using the remotes package.
install.packages('classyfireR')
remotes::install_github('aberHRML/classyfireR')
To retrieve a classification that is already available, provide an InChIKey to the get_classification function.
library(classyfireR)
#> Loading required package: magrittr
Classification <- get_classification('BRMWTNUJHUMWMS-LURJTMIESA-N')
#> ✔ BRMWTNUJHUMWMS-LURJTMIESA-N
Classification
#> ── ClassyFire Object ───────────────────────────────────── classyfireR v0.3.7 ──
#> Object Size: 18.2 Kb
#>
#> Info:
#> • InChIKey=BRMWTNUJHUMWMS-LURJTMIESA-N
#>
#> • [H][C@](N)(CC1=CN(C)C=N1)C(O)=O
#>
#> • Classification Version: 2.1
#>
#> kingdom : Organic compounds
#> └─superclass : Organic acids and derivatives
#>   └─class : Carboxylic acids and derivatives
#>     └─subclass : Amino acids, peptides, and analogues
#>       └─level 5 : Amino acids and derivatives
#>         └─level 6 : Alpha amino acids and derivatives
#>           └─level 7 : Histidine and derivatives
The result of each classification is stored in a single S4 (ClassyFire) object. To retrieve multiple classifications, iterate over a vector of InChIKeys.
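Iterating over several InChIKeys could look something like the sketch below. The helper `classify_keys` and its `classifier` argument are hypothetical names introduced for illustration and are not part of classyfireR. The two keys are the ones that appear in this vignette. The real call needs the package installed and network access, so it is left commented out.

```r
# InChIKeys taken from the examples in this vignette
inchi_keys <- c(
  'BRMWTNUJHUMWMS-LURJTMIESA-N',
  'NVJUHMXYKCUMQA-UHFFFAOYSA-N'
)

# Hypothetical helper (not part of classyfireR): apply a classifier
# function to each key and return a list named by key.
# The default is only evaluated if no classifier is supplied.
classify_keys <- function(keys,
                          classifier = classyfireR::get_classification) {
  results <- lapply(keys, classifier)
  names(results) <- keys
  results
}

# Requires classyfireR and a connection to the ClassyFire server:
# classification_list <- classify_keys(inchi_keys)
```

Naming the list by key is a design choice that makes it easy to look up one result later, for example `classification_list[['BRMWTNUJHUMWMS-LURJTMIESA-N']]`.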
To submit structures for classification, supply one or more SMILES strings to the submit_query function. The results from all of the inputs are returned in a single S4 Query object.
Any inputs that are not successfully classified are stored in the unclassified slot and can be accessed using the unclassified accessor method.
Input <- c(MOL1 = 'CCCOCC', MOL2 = 'CNC(CC1=CN=CN1)C(=O)O', MOL3 = 'CXN')
Query <-
  submit_query(label = 'query_test',
               input = Input,
               type = 'STRUCTURE')
Query
#> ── ClassyFire Query Object ─────────────────────────────── classyfireR v0.3.7 ──
#> Object Size: 17.1 Kb
#>
#> 1 structures classified
#> • MOL1 : InChIKey=NVJUHMXYKCUMQA-UHFFFAOYSA-N
#>
#> 2 structures not classified
#> • MOL2 : CNC(CC1=CN=CN1)C(=O)O
#> • MOL3 : CXN
unclassified(Query)
#>                    MOL2                    MOL3 
#> "CNC(CC1=CN=CN1)C(=O)O"                   "CXN" 
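Because the unclassified inputs come back as a named character vector, the labels of the inputs that did succeed can be recovered by comparing names. The following is a minimal sketch that uses a hand-built stand-in for `unclassified(Query)`, copied from the output above, so that it runs without contacting the server. The variable names `failed` and `classified_labels` are illustrative only.

```r
# Inputs as submitted in the example above
Input <- c(MOL1 = 'CCCOCC', MOL2 = 'CNC(CC1=CN=CN1)C(=O)O', MOL3 = 'CXN')

# Stand-in for unclassified(Query), copied from the printed output
failed <- c(MOL2 = 'CNC(CC1=CN=CN1)C(=O)O', MOL3 = 'CXN')

# Labels that were submitted but do not appear among the failures
classified_labels <- setdiff(names(Input), names(failed))
classified_labels
#> [1] "MOL1"
```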
A series of accessor methods work with either object type, each returning the results held in a specific slot of the object.
classification(Classification)
#> # A tibble: 7 x 3
#>   Level      Classification                       CHEMONT
#>   <chr>      <chr>                                <chr>
#> 1 kingdom    Organic compounds                    CHEMONTID:0000000
#> 2 superclass Organic acids and derivatives        CHEMONTID:0000264
#> 3 class      Carboxylic acids and derivatives     CHEMONTID:0000265
#> 4 subclass   Amino acids, peptides, and analogues CHEMONTID:0000013
#> 5 level 5    Amino acids and derivatives          CHEMONTID:0000347
#> 6 level 6    Alpha amino acids and derivatives    CHEMONTID:0000060
#> 7 level 7    Histidine and derivatives            CHEMONTID:0004311
classification(Query)
#> # A tibble: 5 x 4
#> # Groups:   inchikey [1]
#>   identifier inchikey                         Level       Classification
#>   <chr>      <chr>                            <chr>       <chr>
#> 1 MOL1       InChIKey=NVJUHMXYKCUMQA-UHFFFAO… kingdom     Organic compounds
#> 2 MOL1       InChIKey=NVJUHMXYKCUMQA-UHFFFAO… superclass  Organic oxygen compou…
#> 3 MOL1       InChIKey=NVJUHMXYKCUMQA-UHFFFAO… class       Organooxygen compounds
#> 4 MOL1       InChIKey=NVJUHMXYKCUMQA-UHFFFAO… subclass    Ethers
#> 5 MOL1       InChIKey=NVJUHMXYKCUMQA-UHFFFAO… direct_par… Dialkyl ethers
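The classification tables can be subset like any other data frame. As a rough sketch, the code below pulls out the name assigned at a single taxonomy level. It uses a plain data.frame stand-in holding the first four rows of the `classification(Classification)` output shown above, so it runs offline. `class_at_level` is a hypothetical helper, not a classyfireR function.

```r
# Stand-in for classification(Classification); values copied from
# the first four rows of the output shown above
classes <- data.frame(
  Level = c('kingdom', 'superclass', 'class', 'subclass'),
  Classification = c(
    'Organic compounds',
    'Organic acids and derivatives',
    'Carboxylic acids and derivatives',
    'Amino acids, peptides, and analogues'
  ),
  CHEMONT = c(
    'CHEMONTID:0000000',
    'CHEMONTID:0000264',
    'CHEMONTID:0000265',
    'CHEMONTID:0000013'
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the classification name at one level,
# or a zero-length character vector if the level is absent
class_at_level <- function(x, level) {
  x$Classification[x$Level == level]
}

class_at_level(classes, 'superclass')
#> [1] "Organic acids and derivatives"
```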
meta(Classification)
#> $inchikey
#> [1] "InChIKey=BRMWTNUJHUMWMS-LURJTMIESA-N"
#>
#> $smiles
#> [1] "[H][C@](N)(CC1=CN(C)C=N1)C(O)=O"
#>
#> $version
#> [1] "2.1"
meta(Query)
#> # A tibble: 1 x 4
#> # Groups:   inchikey [1]
#>   identifier inchikey                             smiles classification_version
#>   <chr>      <chr>                                <chr>  <chr>
#> 1 MOL1       InChIKey=NVJUHMXYKCUMQA-UHFFFAOYSA-N CCCOCC 2.1
chebi(Classification)
#> [1] "L-alpha-amino acid (CHEBI:15705)"
#> [2] "imidazolyl carboxylic acid (CHEBI:38307)"
#> [3] "aralkylamine (CHEBI:18000)"
#> [4] "imidazoles (CHEBI:24780)"
#> [5] "organic aromatic compound (CHEBI:33659)"
#> [6] "amino acid (CHEBI:33709)"
#> [7] "carbonyl compound (CHEBI:36586)"
#> [8] "carboxylic acid (CHEBI:33575)"
#> [9] "carboxylic acid anion (CHEBI:29067)"
#> [10] "organonitrogen heterocyclic compound (CHEBI:38101)"
#> [11] "pnictogen molecular entity (CHEBI:33302)"
#> [12] "organic molecular entity (CHEBI:50860)"
#> [13] "organic oxide (CHEBI:25701)"
#> [14] "alkylamine (CHEBI:13759)"
#> [15] "organic molecule (CHEBI:72695)"
#> [16] "histidine derivative (CHEBI:24599)"
#> [17] "chemical entity (CHEBI:24431)"
#> [18] "organooxygen compound (CHEBI:36963)"
#> [19] "peptide (CHEBI:16670)"
#> [20] "organonitrogen compound (CHEBI:35352)"
#> [21] "alpha-amino acid (CHEBI:33704)"
#> [22] "organic heterocyclic compound (CHEBI:24532)"
#> [23] "azole (CHEBI:68452)"
#> [24] "nitrogen molecular entity (CHEBI:51143)"
#> [25] "amine (CHEBI:32952)"
#> [26] "oxygen molecular entity (CHEBI:25806)"
#> [27] "primary amine (CHEBI:32877)"
chebi(Query)
#> $MOL1
#> [1] "organic molecule (CHEBI:72695)"
#> [2] "ether (CHEBI:25698)"
#> [3] "chemical entity (CHEBI:24431)"
#> [4] "oxygen molecular entity (CHEBI:25806)"
#> [5] "organic molecular entity (CHEBI:50860)"
#> [6] "organooxygen compound (CHEBI:36963)"
If you use classyfireR, you should cite the ClassyFire publication:
Djoumbou Feunang Y, Eisner R, Knox C, Chepelev L, Hastings J, Owen G, Fahy E, Steinbeck C, Subramanian S, Bolton E, Greiner R, and Wishart DS. ClassyFire: Automated Chemical Classification With A Comprehensive, Computable Taxonomy. Journal of Cheminformatics, 2016, 8:61.