Classifications of protein roles in the functional organization of the cell

Abstract

The availability of functional genomics data sets for numerous organisms provides an opportunity to comprehensively analyze the roles proteins play in the functional organization of the cell.

In the first part of this thesis, we study how simple network features of hub proteins (i.e., those with many physical interactions) are predictive of their roles in the functional organization of the cell. We begin by examining an influential but controversial characterization of the dynamic modularity of the S. cerevisiae interactome that incorporated gene expression data into network analysis. We analyze the protein-protein interaction networks of five organisms (S. cerevisiae, H. sapiens, D. melanogaster, A. thaliana, and E. coli) and confirm significant and consistent functional and structural differences between hub proteins that are co-expressed with their interacting partners and those that are not. These results support the view that the former tend to be intramodular within networks whereas the latter tend to be intermodular. However, we also demonstrate that in each of these organisms, simple topological measures are significantly correlated with the average co-expression of a hub with its partners and therefore also reflect whether a protein is intramodular or intermodular. Further, cross-interactomic analysis demonstrates that these simple topological characteristics of hub proteins tend to be conserved across organisms. Overall, we give evidence that purely topological features of static interaction networks reflect aspects of the dynamics and modularity of interactomes as effectively as previous measures that incorporate expression data, and are a powerful means for understanding the dynamic roles of hubs in interactomes.
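To make the quantity "average co-expression of a hub with its partners" concrete, the following is a minimal sketch rather than the thesis's actual pipeline. It assumes that co-expression is measured by the Pearson correlation of expression profiles and that a hub is any protein whose degree meets a chosen cutoff. The function names, the `min_degree` parameter, and the toy data are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def hubs(network, min_degree):
    """Proteins with at least `min_degree` partners (assumed hub definition)."""
    return sorted(p for p, partners in network.items() if len(partners) >= min_degree)


def avg_coexpression(hub, partners, expression):
    """Mean Pearson correlation between a hub and each interaction partner."""
    values = [pearson(expression[hub], expression[p]) for p in partners]
    return sum(values) / len(values)


# Toy interaction network (protein -> set of partners) and expression profiles.
network = {"A": {"B", "C", "D"}, "B": {"A"}, "C": {"A"}, "D": {"A"}}
expression = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],  # perfectly correlated with A
    "C": [4, 3, 2, 1],  # perfectly anti-correlated with A
    "D": [1, 2, 3, 4],  # identical to A
}

for hub in hubs(network, min_degree=3):
    print(hub, avg_coexpression(hub, network[hub], expression))
```

Under these assumptions, a hub with a high average would be a candidate intramodular hub and one with a low average a candidate intermodular hub. The same per-hub value is what a topological measure would be correlated against.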

In the second part of this thesis, we study the role of multifunctional genes (and the proteins they encode) in the functional organization of the cell. Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at a genome-wide level and studying their properties can shed light on the complexity of the molecular events that underpin cellular function, leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes, and compare them with the remaining annotated genes. We leverage functional genomics data sets for three organisms—H. sapiens, D. melanogaster, and S. cerevisiae—and show that, as compared to other genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be intermodular in protein interaction networks, tend to be more evolutionarily conserved and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold for genes with multiple molecular functions. Our analysis is a step towards a better genome-wide understanding of gene multifunctionality.
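The extraction step described above can be sketched as follows. This is an illustration under stated assumptions, not the method used in the thesis. It assumes each gene has a set of biological process annotations, and it treats two processes as distinct when neither is an ancestor of the other in the annotation hierarchy. The `ancestors` mapping, the function names, and the toy annotations are hypothetical, and the real criterion for distinctness may be stricter.

```python
def are_distinct(term_a, term_b, ancestors):
    """Assumed criterion: two terms are distinct if neither subsumes the other."""
    if term_a == term_b:
        return False
    return (term_a not in ancestors.get(term_b, set())
            and term_b not in ancestors.get(term_a, set()))


def multifunctional_genes(annotations, ancestors):
    """Genes annotated with at least one pair of distinct processes."""
    result = set()
    for gene, terms in annotations.items():
        ordered = sorted(terms)
        # Check every unordered pair of this gene's annotations.
        if any(are_distinct(a, b, ancestors)
               for i, a in enumerate(ordered)
               for b in ordered[i + 1:]):
            result.add(gene)
    return result


# Toy hierarchy: term -> set of its ancestor terms.
ancestors = {"mitotic cell cycle": {"cell cycle"}}

# Toy annotations: gene -> set of biological process terms.
annotations = {
    "g1": {"cell cycle", "mitotic cell cycle"},  # nested terms, not distinct
    "g2": {"cell cycle", "translation"},         # two unrelated processes
    "g3": {"translation"},                       # a single process
}

multi = multifunctional_genes(annotations, ancestors)
others = set(annotations) - multi
print(sorted(multi), sorted(others))
```

The two resulting groups, the multifunctional genes and the remaining annotated genes, are the sets that would then be compared on properties such as expression breadth, conservation, or essentiality.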

Overall, the results presented in this thesis lead to a better understanding of the complex functional roles that proteins play within the cell.

Publication
PhD Thesis
Yuri Pritykin
Assistant Professor of Computer Science and Genomics