Predicting Transcriptome of Escherichia coli from “Marker” Genes


Maurice Ling, Chueh Loo Poh

Nanyang Technological University, Singapore

One of the fundamental hurdles in genetic engineering is gauging the effects of transgenes on the native system. High-throughput transcriptome profiling methods, such as microarrays and RNA-seq, can only measure these effects after cloning is complete. It would therefore be useful to estimate the effects of a transgene before cloning. Many studies have demonstrated that co-expressed genes are biologically significant, and many tools have been developed to deduce pathways from co-expression data. As a first step, we present a correlation-based model that predicts the native transcriptome from the expression values of 59 marker genes. This model can serve as the basis for estimating the effects of transgenes on the native system. The model was developed using 605 microarrays from 40 different experiments. A linear regression model was fitted for each pair of genes to predict the expression of adjacent genes. Our analysis of probe pairs that detect the same transcript suggests an intra-array variation of 19.2%. We developed a single-pass transcriptome predictor that uses these linear regression models to predict the expression values of the entire transcriptome. Predictability was evaluated using microarrays that were not used for network development. Our results show that gene pairs separated by 3 intervening co-expressed genes (4 jumps in total) can be reliably predicted within 3 standard deviations when at least 30 different paths connect the marker genes to the target genes. On a random set of 30 microarrays, the average error of the predicted expression values was within 3 standard deviations for 29 of 30 transcriptomes (96.7%) and within 40% for 23 of 30 transcriptomes (76.7%), inclusive of the intra-array variation. These results are promising and represent a first draft of transcriptome prediction from a relatively small set of marker genes.
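The pairwise-regression and path-propagation idea described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the data layout (`expr` mapping each gene to its expression values across training arrays), the edge list, and all helper names (`fit_pair`, `build_models`, `predict_along_path`, `predict_target`, `within_sd`, `within_fraction`) are hypothetical. Each regression is an ordinary least-squares fit, and combining the per-path predictions with a simple mean is an assumed choice; the abstract states only that at least 30 paths were used.

```python
from statistics import mean


def fit_pair(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("source gene has constant expression; cannot regress")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def build_models(expr, edges):
    """Fit one regression per directed edge (source -> target).

    Hypothetical layout: `expr` maps gene -> list of expression values,
    one per training microarray; `edges` lists co-expressed gene pairs.
    """
    return {(a, b): fit_pair(expr[a], expr[b]) for a, b in edges}


def predict_along_path(models, path, start_value):
    """Propagate a marker gene's value hop by hop ("jumps") along a path."""
    value = start_value
    for a, b in zip(path, path[1:]):
        slope, intercept = models[(a, b)]
        value = slope * value + intercept
    return value


def predict_target(models, paths, marker_values, min_paths=30):
    """Average the per-path predictions for one target gene (assumed rule).

    Each path starts at a marker gene and ends at the target gene.
    Returns None when fewer than `min_paths` paths are available.
    """
    if len(paths) < min_paths:
        return None
    preds = [predict_along_path(models, p, marker_values[p[0]]) for p in paths]
    return mean(preds)


def within_sd(predicted, observed, sd, k=3.0):
    """True if the prediction error is within k standard deviations."""
    return abs(predicted - observed) <= k * sd


def within_fraction(predicted, observed, fraction=0.40):
    """True if the relative error is within `fraction` (e.g. 40%)."""
    return abs(predicted - observed) <= fraction * abs(observed)


if __name__ == "__main__":
    # Toy data: B = 2A + 1 and C = 0.5B exactly, so A -> B -> C is 2 jumps.
    expr = {"A": [1, 2, 3, 4], "B": [3, 5, 7, 9], "C": [1.5, 2.5, 3.5, 4.5]}
    models = build_models(expr, [("A", "B"), ("B", "C")])
    # min_paths=1 only because the toy network has a single path.
    print(predict_target(models, [["A", "B", "C"]], {"A": 5}, min_paths=1))  # 5.5
```

In the toy run, a marker value of 5 for gene A propagates to 11 for B and then 5.5 for C. With the default `min_paths=30`, the same call returns `None`, mirroring the abstract's observation that reliable predictions needed at least 30 paths between marker and target genes.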