Download e-book Protein-protein Interactions and Networks: Identification, Computer Analysis, and Prediction

Protein-protein Interactions and Networks: Identification, Computer Analysis, and Prediction (Computational Biology) [Anna Panchenko, Teresa M. Przytycka] on.

In our development, we have found that the residual mechanism is able to drastically simplify the training process, and largely decreases the epochs of parameter updates for the model to converge. A convolution layer serves as the first encoding layer to extract local features from the input sequence. On top of that, a residual GRU layer takes in the preserved local features, whose outputs are passed to another convolution layer.

Repeating these two components in the network structure performs an automatic multi-granular feature aggregation on the protein sequence, while preserving the sequential and contextualized information at each granularity of the selected features. Note that the dimensionality of the last hidden states need not equal that of the previous hidden states. A high-level embedding of the entire protein sequence is obtained from the global average-pooling Lin et al.
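The pooling step can be illustrated with a minimal sketch in plain Python. It assumes the last layer emits one fixed-size vector per sequence position; the function name and the toy values are hypothetical and are not taken from the original implementation.

```python
from typing import List


def global_average_pool(hidden_states: List[List[float]]) -> List[float]:
    """Average equal-length vectors over the sequence axis.

    The result has the dimensionality of a single hidden state, no matter
    how many positions the sequence has.
    """
    if not hidden_states:
        raise ValueError("need at least one hidden state")
    dim = len(hidden_states[0])
    if any(len(h) != dim for h in hidden_states):
        raise ValueError("all hidden states must have the same dimensionality")
    n = len(hidden_states)
    return [sum(h[j] for h in hidden_states) / n for j in range(dim)]


# Three hypothetical 2-dimensional hidden states become one 2-dimensional embedding.
states = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
embedding = global_average_pool(states)
```

Because the average runs over positions, proteins of different lengths map to embeddings of the same size, which is what allows two sequence embeddings to be compared directly afterwards.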

Each embedding vector is a concatenation of two sub-embeddings, i. The second part, a_ph, represents the electrostatic and hydrophobic similarity among amino acids. The 20 amino acids can be clustered into 7 classes based on the dipoles and volumes of their side chains to reflect this property. Thus, a_ph is a one-hot encoding based on the classification defined by Shen et al. Both sequence embeddings are combined using element-wise multiplication, i. This is a commonly used operation to infer the relation of sequence embeddings Hashemifar et al.
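As a rough illustration of the class-based one-hot part, the sketch below encodes each residue by its class membership. The seven groups are written out here as an assumption, recalled from the commonly cited seven-class grouping attributed to Shen et al., and should be checked against that paper; the names `SHEN_CLASSES`, `one_hot_class` and `encode_sequence` are hypothetical.

```python
# Assumed seven-class grouping of the 20 standard amino acids
# (by side-chain dipole and volume); verify against Shen et al. before use.
SHEN_CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]

# Map each one-letter residue code to the index of its class.
CLASS_INDEX = {aa: i for i, group in enumerate(SHEN_CLASSES) for aa in group}


def one_hot_class(residue: str) -> list:
    """Return a 7-dimensional one-hot vector for a single residue.

    Raises KeyError for non-standard residues such as 'X' or 'U'.
    """
    vec = [0.0] * len(SHEN_CLASSES)
    vec[CLASS_INDEX[residue.upper()]] = 1.0
    return vec


def encode_sequence(sequence: str) -> list:
    """Encode a protein sequence as a list of 7-dimensional one-hot vectors."""
    return [one_hot_class(residue) for residue in sequence]


encoded = encode_sequence("MKV")  # one vector per residue
```

In the architecture described above, each such vector would be concatenated with the first sub-embedding before being fed to the encoder; that first part is not sketched here.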

Note that some works use the concatenation of sequence embeddings Sun et al. The entire learning architecture is trained to optimize one of the following two types of losses, depending on the PPI prediction problem. Cross-entropy loss is optimized for the two classification problems, i. The learning objective is to minimize the following cross-entropy loss, where c_p is a one-hot indicator for the class label of protein pair p. Mean squared loss is optimized for the binding affinity estimation task. We present the experimental evaluation of the proposed framework on three PPI prediction tasks, i.
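The two objectives can be sketched in their standard textbook forms. The exact equations are not reproduced in this text, so the averaging over pairs and the clipping constant `eps` below are assumptions, and the function names are hypothetical.

```python
import math


def cross_entropy(one_hot_labels, predicted_probs, eps=1e-12):
    """Mean cross-entropy over protein pairs.

    one_hot_labels[p] plays the role of c_p, the one-hot class indicator of
    pair p; predicted_probs[p] is the predicted class distribution for p.
    Probabilities are clipped at eps (an assumption) to avoid log(0).
    """
    total = 0.0
    for c_p, s_p in zip(one_hot_labels, predicted_probs):
        total -= sum(c * math.log(max(s, eps)) for c, s in zip(c_p, s_p))
    return total / len(one_hot_labels)


def mean_squared_error(targets, predictions):
    """Mean squared error, as used for the binding affinity estimation task."""
    return sum((t - y) ** 2 for t, y in zip(targets, predictions)) / len(targets)


# Binary classification: one interacting pair predicted with probability 0.5.
ce = cross_entropy([[1, 0]], [[0.5, 0.5]])

# Regression: two hypothetical affinity targets and their predictions.
mse = mean_squared_error([1.0, 2.0], [1.0, 4.0])
```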

The experiments are conducted on the following datasets. Guo et al. Each dataset contains a balanced number of positive and negative samples. Among these resources, the Yeast dataset is a widely used benchmark by most state-of-the-art methods Hashemifar et al. We use the full protein sequences in our model, which are obtained from the UniProt Consortium et al.

The negative cases are generated by randomly pairing the proteins without evidence of interaction, and filtered by their sub-cellular locations. In other words, non-interactive pairs residing in the same location are excluded. In addition, we combine the data for Caenorhabditis elegans, Escherichia coli and Drosophila melanogaster as the multi-species dataset.
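The negative sampling rule described above could be sketched as follows. This is an illustration under stated assumptions rather than the procedure used to build the datasets: the protein identifiers, the `location` mapping, and the `max_tries` safeguard are all hypothetical.

```python
import random


def sample_negatives(proteins, positives, location, n, seed=0, max_tries=100000):
    """Randomly pair proteins to build up to n negative (non-interacting) cases.

    A candidate pair is rejected if it is a known interaction, if it was
    already drawn, or if both proteins share a sub-cellular location.
    """
    rng = random.Random(seed)
    known = {frozenset(pair) for pair in positives}
    negatives = set()
    tries = 0
    while len(negatives) < n and tries < max_tries:
        tries += 1
        a, b = rng.sample(proteins, 2)  # two distinct proteins
        pair = frozenset((a, b))
        if pair in known or pair in negatives:
            continue
        if location[a] == location[b]:
            continue  # same location: excluded from the negative set
        negatives.add(pair)
    return sorted(tuple(sorted(pair)) for pair in negatives)


# Hypothetical toy data.
proteins = ["P1", "P2", "P3", "P4"]
location = {"P1": "nucleus", "P2": "nucleus", "P3": "cytoplasm", "P4": "membrane"}
positives = [("P1", "P3")]
negatives = sample_negatives(proteins, positives, location, n=4)
```

In this toy example only four pairs satisfy the constraints, so asking for four negatives enumerates all of them; P1 and P2 are never paired because both are annotated to the nucleus.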

There are seven types of interactions: activation, binding, catalysis, expression, inhibition, post-translational modification (ptmod) and reaction.


We download all interaction pairs for Homo sapiens from database version In this process, we randomly sample instances of different interaction types to ensure a balanced class distribution. We use these two datasets for the PPI type prediction task. It contains binding affinity changes upon mutation of protein sub-units within a protein complex. The binding affinity is measured by the equilibrium dissociation constant K_d, which reflects the strength of biomolecular interactions.

A smaller K_d value indicates a higher binding affinity. Each protein complex contains single or multiple amino acid substitutions. The sequence of the protein complex is retrieved from the Protein Data Bank (PDB) Berman et al.

We manually replace the mutated amino acids. For duplicate entries, we take the average K_d. The final dataset contains the binding affinities of mutant protein complexes, along with wild-types.

Binary PPI prediction is the primary task targeted by a handful of previous works Hashemifar et al. The objective of these works is to identify whether a given pair of proteins interacts or not based on their sequences. The number of occurrences for the RCNN units i. We set the hidden state size to be 50, and the RCNN output size to be This configuration ensures that the RCNN compresses the selected features into a reasonably small vector sequence before the features are aggregated by the last global average-pooling.

We zero-pad short sequences to the longest sequence length in the dataset. This is a widely adopted technique for sequence modeling in NLP Chen et al.
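A minimal sketch of this padding step is shown below. It assumes that sequences are lists of integer residue indices and that padding is appended at the end; neither detail is specified in the text, and the function name is hypothetical.

```python
def zero_pad(sequences, pad_value=0):
    """Right-pad every sequence with pad_value up to the longest length."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]


# Three hypothetical index-encoded sequences of different lengths.
batch = zero_pad([[3, 1, 4], [1], [5, 9]])
```

After padding, all sequences in the dataset have the same length, so they can be stacked into a single rectangular batch.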


Note that the configuration of embedding pre-training is discussed in Section 4. All model variants are trained until convergence in each fold of the cross-validation (CV).


Evaluation protocol. Following the settings in previous works Hashemifar et al.


Under the k-fold CV setting, the data are divided equally into k non-overlapping subsets, and each subset serves once as the test set while the remaining subsets are used for training, so as to ensure an unbiased evaluation. We aggregate six metrics on the test cases of each fold, i. For all these metrics, higher values indicate better performance. The P-values are adjusted by the Benjamini-Hochberg procedure Benjamini and Hochberg, to control the false discovery rate for multiple hypothesis testing.
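Both the fold partition and the P-value adjustment can be sketched with the standard library. This is a generic illustration, not the evaluation code behind the reported results; the shuffling seed and the function names are assumptions.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k non-overlapping folds.

    Fold sizes differ by at most one when n_samples is not divisible by k.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


folds = list(k_fold_indices(n_samples=10, k=5))
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each sample appears in exactly one test fold, so every prediction that enters the aggregated metrics comes from a model that did not see that sample during training.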

We implement these two models following the descriptions in their papers. However, these two implementations can only achieve This shows the superiority of deep-learning-based techniques in encapsulating various types of information of a protein pair, such as amino acid composition and their co-occurrences, and automatically extracting the robust ones for the learning objectives. That said, DPPI requires an extensive effort in data pre-processing, specifically in constructing the protein profile for each sequence.

These two frameworks show that CNN can already leverage the significant features from primary protein sequences. Evaluation of binary PPI prediction on the Yeast dataset based on 5-fold cross-validation. We report the mean and SD for the test sets. This indicates that preserving the sequential and contextualized features of the protein sequences is as crucial as incorporating the local features. Next, we evaluate whether the improved accuracy of PIPR is statistically significant.


We also report the 5-fold CV performance of PIPR on variants of the multi-species dataset, where proteins are excluded based on different thresholds of sequence identity. Evaluation of binary PPI prediction on variants of multi-species C. The objective of this task is to predict the interaction type of two interacting proteins. To the best of our knowledge, far fewer efforts have addressed multi-class PPI prediction than binary prediction. Zhu et al. However, none of their implementations is publicly available.


Different from the categories of interaction types used above, we aim at predicting the interaction types annotated by the STRING database. We train several statistical learning algorithms on the widely employed AC and CTD features for protein characterization as our baselines. R algorithm Zhu et al. All approaches are evaluated on the two datasets by fold CV, using the same partition scheme for a more unbiased evaluation James et al. We carry forward the model configurations from the last experiment to evaluate the performance of the frameworks under controlled variables.

For baseline models, we examine three different ways of combining the feature vectors of the two input proteins, i. The Manhattan difference consistently obtains better performance, considering the small values of the input features and the asymmetry of the captured protein relations. Among all the baselines using explicit features, the CTD-based models perform better than the AC-based ones. CTD descriptors seek to cover both continuous and discontinuous interaction information Yang et al. The best of these baselines, using Random Forest, achieves satisfactory results by more than doubling the accuracy of the zero rule on the smaller SHS27k dataset.
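The following sketch shows three candidate ways to combine two feature vectors. Only the Manhattan difference is named in the passage above; element-wise multiplication and concatenation are assumed here because they are the combination operators mentioned earlier in the text, and all function names are hypothetical.

```python
def _check_same_length(u, v):
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")


def manhattan_difference(u, v):
    """Element-wise absolute difference of two feature vectors."""
    _check_same_length(u, v)
    return [abs(a - b) for a, b in zip(u, v)]


def elementwise_product(u, v):
    """Element-wise multiplication of two feature vectors."""
    _check_same_length(u, v)
    return [a * b for a, b in zip(u, v)]


def concatenate(u, v):
    """Concatenation, which doubles the dimensionality of the pair features."""
    return list(u) + list(v)


# Hypothetical feature vectors for two proteins.
u = [1.0, 2.0, 0.0]
v = [3.0, 2.0, 4.0]
pair_diff = manhattan_difference(u, v)
pair_prod = elementwise_product(u, v)
pair_cat = concatenate(u, v)
```

One practical difference is visible in the toy values: a product collapses to zero wherever either input feature is zero, whereas the absolute difference still carries information at that position.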

However, on the larger SHSk dataset, the accuracy of these explicit-feature-based models is notably impaired. We hypothesize that such predefined explicit features are not representative enough to distinguish the PPI types. On the other hand, the deep-learning-based approaches do not need to explicitly utilize these features, and perform consistently well in both settings.

The raw sequence information is sufficient for these approaches to drastically outperform the Random Forest by at least 5. This implies that the local interacting features are relatively more deterministic than contextualized and sequential features on this task.
