ProFAT - Extended Help

Overview

ProFAT is a tool for the functional annotation of protein sequences that combines information from sequence similarity, structural similarity and primary annotation.  This information is combined in order to give the user results of maximal biological relevance.

Keywords

The first step in a ProFAT search is the construction of a relevant keyword list for submission with the sequence.  The keyword list should reflect what the user would find interesting in the annotation of similar proteins (for example, in the GenBank records).  If "preprocessing" is selected, ProFAT applies basic modifications to the list so that the user does not have to enter the same keyword several times in different styles.  For example, if the user enters "GTPase", ProFAT will also detect "gtpase" and "GTPASE".
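
The case handling described above can be sketched in Python as follows.  This is a minimal illustration and not ProFAT's actual implementation: the function names expand_keyword and contains_keyword are hypothetical, and the sketch assumes that preprocessing amounts to ignoring case when matching.

```python
import re

def expand_keyword(keyword):
    """Return the case variants named in the text: as entered, lower case, upper case."""
    variants = [keyword, keyword.lower(), keyword.upper()]
    # Keep the original order and drop duplicates (e.g. a keyword already in lower case).
    return list(dict.fromkeys(variants))

def contains_keyword(text, keyword, preprocessing=True):
    """Report whether text mentions keyword; case is ignored only with preprocessing on."""
    flags = re.IGNORECASE if preprocessing else 0
    return re.search(re.escape(keyword), text, flags) is not None

print(expand_keyword("GTPase"))
print(contains_keyword("A small gtpase of the Ras family", "GTPase"))
print(contains_keyword("A small gtpase of the Ras family", "GTPase", preprocessing=False))
```

With preprocessing switched off in this sketch, only the exact spelling entered by the user is matched.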

Modules

ProFAT consists of two "Core" modules and two "Domain Prediction" modules.  Because proteins are modular, both sequence similarity and structural similarity searches are greatly enhanced by submitting the protein's "modules", i.e. domains.  For this reason ProFAT searches start with a domain search step based on RPS-BLAST, which splits the sequence into these modules.  If this search fails to find the required modules then "Domain Prediction" can be performed to attempt to predict the modular structure of the protein.

Annotation Engine

The Annotation Engine performs annotation and text mining on the hits of a PSI-BLAST sequence similarity search against the NCBI non-redundant database.  The generalised schema is below:

PSI-BLAST on protein Sequence
|
Annotation of Hits with GenBank Records
|
Obtaining PUBMED / MEDLINE Abstracts for Hits
|
Text Mining of Hits with User's Keyword List
|
Results

The Annotation Engine provides three different outputs described below:

Mined: Contains only the PSI-BLAST hits that contain the user defined keywords in the annotation.
RAW: Provides the raw PSI-BLAST output without annotation.
Full: Contains the PSI-BLAST results with Full Annotation.
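
The relationship between the three outputs can be sketched in Python.  This is a simplified illustration under stated assumptions, not the Annotation Engine itself: the function build_outputs, the dictionary keys "alignment" and "annotation", and the two placeholder hits are all hypothetical, and keyword matching is assumed to be a case-insensitive substring test.

```python
def build_outputs(hits, keywords):
    """Derive the Mined, RAW and Full outputs from a list of annotated hits."""
    lowered = [keyword.lower() for keyword in keywords]
    # RAW: the search output only, without annotation.
    raw = [hit["alignment"] for hit in hits]
    # Full: every hit together with its annotation.
    full = [(hit["alignment"], hit["annotation"]) for hit in hits]
    # Mined: only the hits whose annotation mentions at least one keyword.
    mined = [pair for pair in full if any(k in pair[1].lower() for k in lowered)]
    return {"Mined": mined, "RAW": raw, "Full": full}

# Placeholder hits for illustration only; these are not real search results.
hits = [
    {"alignment": "hit_1 alignment text", "annotation": "putative GTPase, Ras family"},
    {"alignment": "hit_2 alignment text", "annotation": "hypothetical protein"},
]
outputs = build_outputs(hits, ["gtpase"])
print(len(outputs["RAW"]), len(outputs["Full"]), len(outputs["Mined"]))
```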


Threading

Threading is a method of determining similarity between proteins based on structure.  ProFAT implements Threading via Threader 3.5 (Jones et al, 1999), which is available for free use by academic users.  Threading is optimised by the use of various filters to remove regions of low complexity (SEG) and coiled coils (Coils2), and to predict the secondary structure (PSI-PRED).  Primary annotation is also obtained from the Threading results and is subject to text mining.  If both Threading and the Annotation Engine are used on the same run then "combined results" are generated, where the user can see at a glance all the results containing a given keyword.

A Threading run generates three different outputs as described below:

Threader: Provides the Output from Threader annotated with primary annotation from the PDB.
Annotated: Provides the Output from Threader annotated with primary annotation from GenBank and the PDB.
Mined: Provides the Output from Threader where the user defined keywords have been detected.


Combined Results

Combined Results are provided when both the Annotation Engine and Threading are run on the same sequence.  These results are ordered by Keyword and provide both the Annotated Threader output and the PSI-BLAST output on a single page.
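
The ordering by keyword can be pictured with a short Python sketch.  It is illustrative only: the function combine_by_keyword, the "id" and "annotation" keys and the placeholder hits are hypothetical, and a hit is assumed to belong to a keyword when its annotation contains that keyword, ignoring case.

```python
def combine_by_keyword(psiblast_hits, threader_hits, keywords):
    """Group hits from both searches under each keyword found in their annotation."""
    def matching(hits, keyword):
        needle = keyword.lower()
        return [hit["id"] for hit in hits if needle in hit["annotation"].lower()]

    combined = {}
    for keyword in keywords:
        combined[keyword] = {
            "PSI-BLAST": matching(psiblast_hits, keyword),
            "Threader": matching(threader_hits, keyword),
        }
    return combined

# Placeholder hits for illustration only; these are not real search results.
psiblast_hits = [
    {"id": "seq_hit_1", "annotation": "small GTPase"},
    {"id": "seq_hit_2", "annotation": "protein kinase"},
]
threader_hits = [{"id": "fold_hit_1", "annotation": "GTPase domain structure"}]

combined = combine_by_keyword(psiblast_hits, threader_hits, ["GTPase", "kinase"])
for keyword, results in combined.items():
    print(keyword, results)
```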

Domain Prediction

Domain Prediction is invoked if no domains are detected in the submitted sequence.  Domain Prediction aims to go deeper into the domain search results using the context-specific information in the keyword list provided by the user.  The schema of Domain Prediction is provided below:

RPS-BLAST on protein Sequence
|
Obtain Consensus sequence of Domains detected in Sequence
|
BLAST consensus sequence against the non-redundant database
|
Annotation of Hits with GenBank Records
|
Obtaining PUBMED / MEDLINE Abstracts for Hits
|
Text Mining of Hits with User's Keyword List
|
Scoring of domains via quantity of user selected keywords present
|
Results


Domains detected via this method may then be resubmitted to ProFAT for further analysis.
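
The scoring step in the schema above can be sketched in Python.  The exact scoring scheme used by ProFAT is not described here, so this sketch makes an assumption: a domain's score is the number of distinct user keywords found, ignoring case, in the texts retrieved for its hits.  The names score_domain and rank_domains and the placeholder domains are hypothetical.

```python
def score_domain(texts, keywords):
    """Count how many distinct keywords occur in the texts gathered for one domain."""
    combined_text = " ".join(texts).lower()
    return sum(1 for keyword in keywords if keyword.lower() in combined_text)

def rank_domains(domain_texts, keywords):
    """Return (domain, score) pairs ordered from highest to lowest score."""
    scores = {name: score_domain(texts, keywords) for name, texts in domain_texts.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Placeholder domains and texts for illustration only.
domain_texts = {
    "domain_A": ["small GTPase involved in signal transduction", "binds GTP"],
    "domain_B": ["hypothetical protein"],
}
ranking = rank_domains(domain_texts, ["GTPase", "signal transduction", "kinase"])
print(ranking)
```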

HMMerThread

HMMerThread is a method for the detection of protein domains via the combination of sequence and structure based approaches.  HMMerThread can be used where there is no idea as to the context of the protein in question, and may predict a domain which can then be used to assign a context and therefore generate a keyword list for further analysis.  HMMerThread first performs an HMMer domain search on the protein sequence using the PFAM HMM domain database.  Domain hits are then mapped to PDB structures containing these functional domains via the PDBMAP database.  If these structures are also present in the Threading Database then they are displayed for the user to select.  Once a domain is selected it is processed as for Threading and is sent to the Threader.  If the Threader scores the sequence as structurally similar to the domain detected via sequence similarity, it is displayed in the results.
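
The mapping step, from domain hits to the structures offered to the user, can be sketched in Python.  This covers only the lookup and filtering described above and not the HMMer search or the Threader run.  The function selectable_structures and all identifiers in the example are hypothetical placeholders; the sketch assumes the PDBMAP data can be treated as a mapping from domain name to structure identifiers, and the Threading Database as a set of structure identifiers.

```python
def selectable_structures(domain_hits, pdbmap, threading_db):
    """For each domain hit, keep the mapped structures that are also in the Threading Database."""
    selectable = {}
    for domain in domain_hits:
        # Structures known to contain this domain, restricted to those available for threading.
        structures = [s for s in pdbmap.get(domain, []) if s in threading_db]
        if structures:
            selectable[domain] = structures
    return selectable

# Placeholder identifiers for illustration only.
domain_hits = ["domain_A", "domain_B"]
pdbmap = {"domain_A": ["structure_1", "structure_2"], "domain_B": ["structure_3"]}
threading_db = {"structure_2", "structure_4"}

print(selectable_structures(domain_hits, pdbmap, threading_db))
```

In this sketch a domain with no structure in the Threading Database is simply omitted, so it would not be offered for selection.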

