Introduction

Predicting protein structure from sequence is one of the most intensively studied problems in bioinformatics. It is a very useful application because experimental techniques such as X-ray crystallography are time-consuming. The fundamental question is how to predict the 3-D shape of a protein from its amino-acid sequence. In this article we look at how protein structure and function can be predicted from the amino-acid sequence.

The protein folding problem

According to Anfinsen's hypothesis, the 3-D structure of a protein is determined solely by its amino-acid sequence. A strong argument against this hypothesis is Levinthal's paradox: a polypeptide chain has so many possible conformations that it could never find its native structure by random search in a biologically reasonable time, yet real proteins fold quickly. Several resolutions of Levinthal's paradox have been proposed:

1. The problems that theoretical methods prove to be computationally hard are not what nature is actually trying to optimize.

2. Evolution may have selected proteins which fold easily.

3. Proteins may well fold into locally, rather than globally, optimal structures.
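Levinthal's paradox rests on a simple counting argument. The sketch below reproduces it with illustrative numbers that are assumptions, not measurements: a 100-residue chain, three backbone conformations per residue, and 10^13 conformations sampled per second.

```python
# Back-of-the-envelope Levinthal estimate. All inputs are illustrative
# assumptions chosen for the argument, not measured values.
RESIDUES = 100                  # assumed chain length
CONFORMATIONS_PER_RESIDUE = 3   # assumed backbone states per residue
SAMPLES_PER_SECOND = 1e13       # assumed conformational sampling rate
SECONDS_PER_YEAR = 3.156e7      # approximately 365.25 days


def exhaustive_search_years(residues: int, states: int, rate: float) -> float:
    """Years needed to visit every conformation once by blind enumeration."""
    total_conformations = states ** residues  # exact integer count
    return total_conformations / rate / SECONDS_PER_YEAR


if __name__ == "__main__":
    years = exhaustive_search_years(
        RESIDUES, CONFORMATIONS_PER_RESIDUE, SAMPLES_PER_SECOND
    )
    print(f"{CONFORMATIONS_PER_RESIDUE ** RESIDUES:.2e} conformations")
    print(f"about {years:.1e} years of exhaustive search")
```

With these inputs the estimate is on the order of 10^27 years, far longer than the age of the universe, which is why folding cannot proceed by exhaustive random search.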

To summarize, predicting protein structure from sequence is difficult. However, from the growing database of experimentally determined protein structures, some heuristics are emerging:

1. The number of unique protein folds is quite limited.

2. Many proteins share the same fold despite having no detectable sequence similarity.

3. ‘Neutral’ mutations, which change the sequence without altering the protein structure, are likely.

Protein identification and characterization

Some of the tools that help identify proteins and characterize their physical properties are:

1. AA CompIdent.

2. TagIdent, PeptIdent and MultiIdent.

3. PROPSEARCH.

4. PepSea.

5. PepMapper, Mascot and PeptideSearch.

6. FindPept.

1. AA CompIdent:

This tool identifies a protein by its amino-acid composition. It compares the composition of an unknown protein with those of known proteins and reports the known proteins whose composition is closest.
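Composition-based identification can be sketched as ranking database entries by how far their composition lies from the unknown one. This is a minimal sketch, not AA CompIdent's actual scoring: the sum of squared percentage differences is an assumption, and the names `composition`, `rank_by_composition` and the toy `DATABASE` entries are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence: str) -> dict[str, float]:
    """Percentage of each standard residue in a sequence (other letters ignored)."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def composition_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Sum of squared percentage differences; 0.0 means identical composition."""
    return sum((a[aa] - b[aa]) ** 2 for aa in AMINO_ACIDS)


def rank_by_composition(
    unknown: dict[str, float], database: dict[str, str]
) -> list[tuple[str, float]]:
    """Rank database entries from closest to furthest composition."""
    scores = [
        (name, composition_distance(unknown, composition(seq)))
        for name, seq in database.items()
    ]
    return sorted(scores, key=lambda item: item[1])


# Hypothetical toy entries standing in for a real sequence database.
DATABASE = {
    "toy_A": "MKKLLPTAAAGLLLLAAQPAMA",
    "toy_B": "MSTNPKPQRKTKRNTNRRPQDVKFPGG",
    "toy_C": "GGGSGGGSGGGSWWWYYY",
}

if __name__ == "__main__":
    # In practice the unknown composition would come from amino-acid analysis.
    unknown = composition("MSTNPKPQRKTKRNTNRRPQDVKFPGG")
    for name, score in rank_by_composition(unknown, DATABASE):
        print(f"{name}: {score:.1f}")
```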

2. TagIdent, PeptIdent and MultiIdent:

TagIdent generates a list of proteins close to a given isoelectric point (pI) and molecular weight (Mw).

PeptIdent identifies proteins using peptide mass fingerprinting data together with pI and Mw.

MultiIdent identifies proteins using pI, Mw, amino-acid composition, sequence tags and peptide mass fingerprinting data.
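The pI/Mw search that TagIdent performs can be pictured as a window filter over a protein list. A minimal sketch under stated assumptions: the window sizes (±0.5 pI units, ±10 % Mw), the `Protein` class, `tag_ident_like`, and all pI and Mw values below are hypothetical and made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Protein:
    name: str
    pi: float  # isoelectric point
    mw: float  # molecular weight in daltons


def tag_ident_like(
    proteins: list[Protein],
    target_pi: float,
    target_mw: float,
    pi_window: float = 0.5,
    mw_tolerance_pct: float = 10.0,
) -> list[Protein]:
    """Keep proteins inside both the pI and Mw windows, closest Mw first."""
    mw_delta = target_mw * mw_tolerance_pct / 100.0
    hits = [
        p
        for p in proteins
        if abs(p.pi - target_pi) <= pi_window and abs(p.mw - target_mw) <= mw_delta
    ]
    return sorted(hits, key=lambda p: abs(p.mw - target_mw))


# Hypothetical entries; the pI and Mw values are invented for illustration.
PROTEINS = [
    Protein("hypothetical_1", 5.2, 25000.0),
    Protein("hypothetical_2", 6.1, 31000.0),
    Protein("hypothetical_3", 6.3, 29500.0),
    Protein("hypothetical_4", 9.8, 30000.0),
]

if __name__ == "__main__":
    for p in tag_ident_like(PROTEINS, target_pi=6.0, target_mw=30000.0):
        print(p.name, p.pi, p.mw)
```

PeptIdent and MultiIdent can be thought of as adding further criteria (peptide masses, composition, sequence tags) on top of such a filter.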

3. PROPSEARCH:

This is a tool for finding the putative family of a protein. It uses amino-acid composition as its input. In addition, other properties such as molecular weight, content of bulky residues, content of small residues, average hydrophobicity, average charge and content of selected dipeptide groups are calculated from the sequence.
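The idea of turning a sequence into a vector of properties and comparing vectors can be sketched as follows. This is not PROPSEARCH's actual property set or weighting: the Kyte-Doolittle hydropathy scale is swapped in for hydrophobicity, the small/bulky residue groups and the selected dipeptides (`KK`, `RR`, `DE`) are illustrative assumptions, and an unweighted Euclidean distance is assumed for the comparison.

```python
import math

# Kyte-Doolittle hydropathy scale (swapped in; PROPSEARCH may use another scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Illustrative residue groupings (assumptions, not PROPSEARCH's definitions).
SMALL = set("AGSCT")
BULKY = set("FHKRWY")


def property_vector(
    sequence: str, dipeptides: tuple[str, ...] = ("KK", "RR", "DE")
) -> dict[str, float]:
    """Composition-derived properties of a sequence of the 20 standard residues."""
    seq = sequence.upper()
    n = len(seq)
    pairs = [seq[i:i + 2] for i in range(n - 1)]
    features = {
        "small_fraction": sum(aa in SMALL for aa in seq) / n,
        "bulky_fraction": sum(aa in BULKY for aa in seq) / n,
        "mean_hydrophobicity": sum(KYTE_DOOLITTLE[aa] for aa in seq) / n,
        # Net charge per residue, counting only K/R (+1) and D/E (-1).
        "mean_charge": (sum(aa in "KR" for aa in seq)
                        - sum(aa in "DE" for aa in seq)) / n,
    }
    for dp in dipeptides:
        features[f"dipeptide_{dp}"] = pairs.count(dp) / max(len(pairs), 1)
    return features


def property_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Unweighted Euclidean distance between two property vectors."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


if __name__ == "__main__":
    query = property_vector("MKKLLPTAAAGLLLLAAQPAMA")     # hypothetical query
    candidate = property_vector("MKRLLPTSAAGLVLLAAEPAMA")  # hypothetical entry
    print(f"distance: {property_distance(query, candidate):.3f}")
```

A smaller distance suggests a closer overall property profile and therefore a more likely shared family; a real tool would weight the properties rather than treat them equally.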

4. PepSea:

It is a tool for protein identification by peptide mapping or peptide sequencing.
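Peptide mapping compares observed peptides with the peptides expected from digesting each candidate protein. The sketch below computes such a theoretical map, assuming trypsin as the protease (cleavage after K or R, but not before P, with no missed cleavages) and unmodified peptides. The function names are hypothetical, and the masses are neutral monoisotopic masses, not the protonated ions a mass spectrometer reports.

```python
import re

# Monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def trypsin_digest(sequence: str) -> list[str]:
    """Cleave after K or R unless the next residue is P; no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence.upper()) if p]


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def theoretical_map(sequence: str) -> list[tuple[str, float]]:
    """Peptides and masses expected from a complete tryptic digest."""
    return [(p, peptide_mass(p)) for p in trypsin_digest(sequence)]


if __name__ == "__main__":
    # Hypothetical sequence used only to demonstrate the digest.
    for peptide, mass in theoretical_map("MAKPGRCKEWSTLRGVK"):
        print(f"{peptide:>10} {mass:10.4f}")
```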

5. PepMapper, Mascot and PeptideSearch:

PepMapper takes peptide masses as its key input.

Mascot accepts a peptide mass fingerprint, a sequence query or an MS/MS ion search as input.

PeptideSearch uses a list of peptide masses, a peptide sequence tag or an amino-acid sequence as input.
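Peptide mass fingerprinting, the common thread of these tools, can be sketched as counting how many observed peak masses match each candidate protein's theoretical peptide masses within a tolerance. This is only the basic idea: Mascot, for example, uses a probability-based score rather than a plain match count. The 50 ppm tolerance, the function names and all masses below are hypothetical.

```python
def count_matches(
    observed: list[float], theoretical: list[float], tol_ppm: float = 50.0
) -> int:
    """Count observed masses lying within tol_ppm of at least one theoretical mass."""
    return sum(
        1
        for m in observed
        if any(abs(m - t) / t * 1e6 <= tol_ppm for t in theoretical)
    )


def rank_proteins(
    observed: list[float], database: dict[str, list[float]], tol_ppm: float = 50.0
) -> list[tuple[str, int]]:
    """Rank candidate proteins by number of matched peptide masses, best first."""
    scores = {
        name: count_matches(observed, masses, tol_ppm)
        for name, masses in database.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical theoretical peptide masses (Da); real values would come from
# an in-silico digest of each database sequence.
DATABASE = {
    "protein_X": [842.51, 1045.56, 1179.60, 2211.10],
    "protein_Y": [900.45, 1045.60, 1500.70],
}
OBSERVED = [842.52, 1045.56, 2211.12]  # hypothetical peak list

if __name__ == "__main__":
    for name, matched in rank_proteins(OBSERVED, DATABASE):
        print(f"{name}: {matched} of {len(OBSERVED)} peaks matched")
```

A relative (ppm) tolerance is used because mass-measurement error typically grows with mass.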

6. FindPept:

It is used to identify peptides that result from unspecific cleavage of proteins. It takes into account artifactual chemical modifications, post-translational modifications and protease autolytic cleavage.
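Searching for unspecific cleavage products means considering every subsequence of the protein, not only the fragments a protease would produce. The sketch below does this for one observed mass and optionally applies one modification. It is a simplified illustration, not FindPept's algorithm: only methionine oxidation is modeled (as an assumed example of an artifactual modification), protease autolysis is not handled, and the 0.02 Da tolerance and function names are hypothetical.

```python
# Monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
# Modifications considered: name -> (residue that must be present, mass shift in Da).
MODIFICATIONS = {
    "none": ("", 0.0),
    "Met oxidation": ("M", 15.99491),
}


def unspecific_matches(
    sequence: str, observed_mass: float, tol_da: float = 0.02
) -> list[tuple[int, int, str, str]]:
    """All subsequences (1-based start, end) whose mass, optionally with one
    modification, lies within tol_da of observed_mass."""
    seq = sequence.upper()
    hits = []
    for start in range(len(seq)):
        mass = WATER
        for end in range(start, len(seq)):
            mass += MONO[seq[end]]  # extend the peptide by one residue
            peptide = seq[start:end + 1]
            for name, (residue, delta) in MODIFICATIONS.items():
                if residue and residue not in peptide:
                    continue  # modification needs a residue this peptide lacks
                if abs(mass + delta - observed_mass) <= tol_da:
                    hits.append((start + 1, end + 1, peptide, name))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence and observed mass.
    for hit in unspecific_matches("GAMKSTWLE", 236.083):
        print(hit)
```

The nested loops check every start and end position, so the cost grows with the square of the sequence length; that is acceptable for a single protein but is why real tools restrict the search.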

Conclusion:

These are some of the tools for protein identification and characterization. The next step is primary structure analysis and prediction.

By admin
