ICST: Immune-based Classification of Solid Tumours


ICST is open-access, open source software that classifies solid tumour samples into one of the six immune subgroups.
You must register before using ICST. Register here to receive a free token: Register
Download test data to try the classifier: Test CSV file

Unclassifiable samples are those for which a confident subgroup call could not be made
Download table as .csv



C7 based samples are those for which two subgroup calls could be made (i.e., a tumour sample is placed between two immune subgroups)
Download table as .csv



Unclassifiable and Gene QC failed samples are not shown in this plot. Boxes show the confidence interval for subgroup assignment generated by bootstrapping, and the individual data points represent the final probability associated with each subgroup call.
Download plot as high-res .png




ICST version 2.2.1-3


ICST was developed at the Stratified Medicine Group (SMG) at Queen’s University Belfast by Reza Rafiee. The software, implemented in R using machine learning techniques and the Shiny web package, is now available under an open source licence via GitHub. It is currently developed and maintained by Reza Rafiee and a bioinformatics team at Almac Diagnostics.



Overview

ICST classifies tumour gene expression data into one of the six immune subgroups (C1, C2, C3, C4, C5 and C6), which span multiple solid tumour types, with potential therapeutic and prognostic implications for cancer management.

In summary, the classifier works as follows:

  1. We use 440 immune genes taken from The Immune Landscape of Cancer paper. Expression can be measured on any gene expression platform, including microarray, RNA-seq and qPCR.
  2. FPKM-normalised gene expression values for the 440 genes in each sample are submitted to ICST as a comma-separated .csv file.
  3. For each sample, the number of the 440 genes that report an expression value is assessed. Missing values are filled in by multiple imputation (MI) using a Bootstrap Expectation Maximisation (BEM) algorithm implemented in the Amelia package. We can efficiently impute missing values of up to ? missing genes; a sample with more missing values is said to have failed Gene QC and is not classified.
  4. A multi-class optimised Support Vector Machine (SVM), trained and validated on the established gene expression cohort, assigns a subgroup to each sample from its 440 gene expression values.
    • The SVM is validated by bootstrapping: 1,000 random iterations, each using 80% of the training set. The confidence interval derived from this is plotted on the Classification Graph as a box plot.
    • The final probability for a subgroup call comes from an SVM model built on the whole training set (n=2009 tumour samples); these probabilities are given in the Classification Table in the initial tab.
    • Calls made with a probability below a random guess threshold (0.50) are considered unreliable. These samples are labelled Unclassifiable in the Classification Table and are not plotted in the Classification Graph.
    • Calls that fall between two immune subgroups, where the sum of the two probability assignments is greater than our predefined threshold, are considered Predominant Subgroups, and the samples are labelled C7 in the Classification Table.
  5. Various post-processing and formatting operations are then applied to the data. The interactive website is implemented in the R Shiny reactive web application framework.
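
The labelling rules in step 4 can be illustrated with a minimal Python sketch. It is not the ICST implementation (which is written in R). The function name, the value of PAIR_SUM_THRESHOLD and the order in which the rules are checked are all assumptions made for illustration; the text above only states that the two-subgroup threshold is "predefined".

```python
from typing import Dict

UNCLASSIFIABLE_THRESHOLD = 0.50  # threshold stated in the overview
PAIR_SUM_THRESHOLD = 0.80        # hypothetical value; the real threshold is not given


def label_call(probs: Dict[str, float],
               min_prob: float = UNCLASSIFIABLE_THRESHOLD,
               pair_sum: float = PAIR_SUM_THRESHOLD) -> str:
    """Return a subgroup name, 'C7' or 'Unclassifiable' for one sample.

    `probs` maps each subgroup (C1..C6) to its SVM probability.
    Assumed rule order: confident single call, then two-subgroup call,
    then unclassifiable.
    """
    # Rank subgroups from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    (top_name, top_p), (_, second_p) = ranked[0], ranked[1]
    if top_p >= min_prob:
        return top_name           # single confident call
    if top_p + second_p > pair_sum:
        return "C7"               # sample sits between two subgroups
    return "Unclassifiable"       # no reliable call


if __name__ == "__main__":
    sample = {"C1": 0.45, "C2": 0.42, "C3": 0.05,
              "C4": 0.04, "C5": 0.02, "C6": 0.02}
    print(label_call(sample))
```

With the hypothetical threshold of 0.80, the example sample is labelled C7 because its top call is below 0.50 but its two highest probabilities sum to 0.87.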

For a typical dataset with 40 samples, the whole computational procedure takes around 55 seconds; the total classification time is given below the Classification Graph.

For a more detailed explanation of our classifier, including the various optimisation and validation exercises, see our manuscript and corresponding supplementary information (manuscript in preparation).


Reference

A manuscript is in preparation.


Download

The R code for this Shiny-based website, including the training and validation cohorts, can be downloaded from GitHub. The website can also be run locally using RStudio; instructions and dependencies are outlined on GitHub.


Funding

ICST development was funded by an Invest Northern Ireland (INI) programme grant.

How to use our Classifier

ICST will classify solid tumour data into one of the six immune subgroups. To use the classifier, follow the steps outlined below:

  1. A comma-separated value (.csv) file produced by any gene expression platform (RNA-seq, microarray, qPCR, etc.) is needed as input. If you would like to test drive the classifier, or to see how the file should be formatted, a test file can be downloaded using the link in the grey box on the left.
  2. A gene expression .csv file (currently FPKM normalised and Log2, up to 600 samples) can then be uploaded by clicking the 'Choose File' or 'Browse...' button on the left (the label is browser dependent). Once the file is uploaded, classification happens automatically.
  3. By default, the Classification Table output is preselected and shows the immune subgroup call for each of your samples. Other tabs can be opened by clicking their names at the top of the main panel.
  4. The contents of tables can be downloaded by clicking the grey download button. These .csv files can then be loaded into Excel or other spreadsheet software if required.
  5. The Classification Plot can also be downloaded as a .png by clicking the grey Download button at the bottom of the Classification Plot tab.
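
As an illustration of preparing a file for step 2, here is a minimal Python sketch that writes FPKM values as a log2 .csv with genes as rows and samples as columns. The function name, the "gene" header label, the log2(FPKM + 1) pseudocount and the use of empty cells for missing values are all assumptions; check the downloadable test file for the exact layout ICST expects.

```python
import csv
import io
import math
from typing import Dict, List

MAX_SAMPLES = 600  # upload limit stated in step 2


def write_log2_csv(fpkm: Dict[str, Dict[str, float]], samples: List[str]) -> str:
    """Return CSV text with one row per gene and one column per sample.

    `fpkm` maps gene name -> {sample name -> FPKM value}.
    Values are written as log2(FPKM + 1); the +1 pseudocount is an
    assumption, not something the ICST page specifies.
    """
    if len(samples) > MAX_SAMPLES:
        raise ValueError(f"{len(samples)} samples exceeds the limit of {MAX_SAMPLES}")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["gene"] + samples)  # "gene" header label is hypothetical
    for gene in sorted(fpkm):
        row = [gene]
        for sample in samples:
            value = fpkm[gene].get(sample)
            # A missing measurement is left as an empty cell (assumption).
            row.append("" if value is None else f"{math.log2(value + 1):.4f}")
        writer.writerow(row)
    return buf.getvalue()


if __name__ == "__main__":
    demo = {"APOE": {"S1": 3.0, "S2": 0.0}, "ADAM9": {"S1": 1.0}}
    print(write_log2_csv(demo, ["S1", "S2"]))
```

The returned text can be saved to a .csv file and uploaded through the file chooser described in step 2.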


Input file format

The input file for this version of ICST can be exported from any gene expression platform. The rows and columns of this file must correspond to gene names and sample names, respectively, as in the example file test_samples.csv.

Genes corresponding to the five immune signatures we use were selected from The Immune Landscape of Cancer:
ACTL6A ADAM9 ADAMTS1 ADCY7 AIMP2 ALKBH7 ALOX5AP AMPD3 APITD1 APOC1 APOE APOO ARHGAP1 ARHGAP15 ...
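
Before uploading, it can be useful to check how many of the expected genes each sample actually reports, since samples with too many missing genes fail Gene QC. The sketch below is a hypothetical pre-check in Python, not part of ICST. It assumes the first column holds gene names and the first row holds sample names, and it treats empty, non-numeric or NaN cells as missing.

```python
import csv
import io
import math
from typing import Dict, Iterable


def count_reported_genes(csv_text: str, expected_genes: Iterable[str]) -> Dict[str, int]:
    """Count, per sample, how many expected genes have a numeric value."""
    expected = set(expected_genes)
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    samples = header[1:]  # first column is assumed to hold gene names
    counts = {sample: 0 for sample in samples}
    for row in reader:
        if not row or row[0] not in expected:
            continue  # skip blank lines and genes outside the expected list
        for sample, cell in zip(samples, row[1:]):
            try:
                value = float(cell)
            except ValueError:
                continue  # empty or non-numeric cell counts as missing
            if not math.isnan(value):
                counts[sample] += 1
    return counts


if __name__ == "__main__":
    text = "gene,S1,S2\nAPOE,1.5,\nADAM9,2.0,NA\nXYZ,1,1\n"
    print(count_reported_genes(text, ["APOE", "ADAM9", "APOC1"]))
```

In the example, S1 reports two of the three expected genes and S2 reports none; the row for XYZ is ignored because it is not in the expected list.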


Support

If you have any issues using ICST, please contact Reza Rafiee.


SMG's ICST software is a research tool under development. Classification is not verified and has not been clinically validated. It has the potential to define which patients are more likely to respond to immune checkpoint therapies.