Otwinowski and Nemenman, 2013


J Otwinowski and I Nemenman. Genotype to phenotype mapping and the fitness landscape of the E. coli lac promoter. PLOS ONE 8, e61570, 2013. PDF, arXiv.

Abstract
Fitness landscapes and epistatic interactions are difficult to measure because of their high combinatorial complexity. Here we infer a large fitness landscape from high-throughput sequence data from the E. coli lac promoter region with ~200,000 mutagenized sequences of 75 nucleotides. The sequences are associated with measurements of transcriptional activity, which we take as a proxy for fitness. Utilizing regression and L1 regularization, we infer the best non-epistatic and epistatic approximations of the genotype-phenotype map. Only non-averaged epistasis is considered. We find that the additive (non-epistatic) components account for about 2/3 of the explainable variance in the data, while the epistatic components explain on the order of 10%. We find the fitness landscape to be essentially single-peaked, with a small amount of antagonistic epistasis. By comparison to neutrally evolved randomly generated sequences, we deduce a significant amount of selective pressure on the wild type. Our method also reveals the binding sites and their interactions, without any difficult optimization steps. We also infer the landscapes for two environments corresponding to pure lactose metabolism, and to reduced lactose metabolism in the presence of glucose. Sequences close to the wild type, and the wild type itself, were found to be nearly optimal in the multi-objective sense. We conclude with a cautionary note that inferred properties of fitness landscapes may be severely influenced by biases in the training data.
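
Illustrative sketch (not the authors' code). The general idea of fitting additive and pairwise epistatic terms by L1-regularized regression can be sketched on toy data as below. Everything specific here is an assumption made for illustration: the binary "differs from wild type" encoding (the real sequences have four bases at each of 75 positions), the six-site synthetic landscape, the noise level, the regularization strength `lam`, and all function and variable names. The solver is a plain coordinate-descent lasso, which may differ from the procedure used in the paper.

```python
import random
from itertools import combinations

random.seed(0)

N_SITES = 6  # toy size; an assumption, far smaller than the 75-nt promoter
PAIRS = list(combinations(range(N_SITES), 2))


def featurize(genotype):
    """Additive indicators followed by all pairwise (epistatic) products."""
    return list(genotype) + [genotype[i] * genotype[j] for i, j in PAIRS]


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; the proximal step for the L1 penalty."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso(X, y, lam, n_sweeps=150):
    """Minimize (1/2n)*||y - b - Xw||^2 + lam*||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = sum(y) / n
    resid = [yi - b for yi in y]
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        # The intercept is not penalized: absorb the mean residual into it.
        shift = sum(resid) / n
        b += shift
        resid = [r - shift for r in resid]
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_w
    return b, w


# Synthetic ground truth, invented for this example: a single-peaked landscape
# with one antagonistic pair (the double mutant loses less than the sum).
TRUE_BASE = 2.0
TRUE_ADDITIVE = [-1.0, -0.8, -0.5, -0.3, 0.0, 0.0]
TRUE_PAIR = {(0, 1): 0.6}


def true_activity(g):
    value = TRUE_BASE + sum(a * x for a, x in zip(TRUE_ADDITIVE, g))
    value += sum(e * g[i] * g[j] for (i, j), e in TRUE_PAIR.items())
    return value


genotypes = [
    tuple(int(random.random() < 0.3) for _ in range(N_SITES)) for _ in range(500)
]
X = [featurize(g) for g in genotypes]
y = [true_activity(g) + random.gauss(0.0, 0.05) for g in genotypes]

intercept, weights = lasso(X, y, lam=0.003)
additive = weights[:N_SITES]
epistatic = dict(zip(PAIRS, weights[N_SITES:]))

print("additive:", [round(w, 2) for w in additive])
print("nonzero epistatic:", {p: round(w, 2) for p, w in epistatic.items() if w != 0.0})
```

Setting the pairwise features aside (fitting only the first `N_SITES` columns) gives the non-epistatic approximation, and comparing the variance explained by the two fits is one way to apportion it between additive and epistatic components. Because the L1 penalty pushes small coefficients to exactly zero, it helps keep the number of inferred interactions manageable when the set of candidate pairs is large.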