eSLDB contains annotations extracted from the SwissProt database by an automated tool that parses the SUBCELLULAR LOCALIZATION section of the COMMENT field. Words that refer directly or implicitly to one of 17 classes are taken into account. Entries annotated as probable, possible or by similarity are not considered experimental annotations.
The SwissProt annotation for subcellular localization of eukaryotes can be grouped into 17 major classes:
- Endoplasmic reticulum
- Cell wall
- Secretory pathway
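The extraction step described above can be sketched as follows. This is a minimal illustration, not the tool used to build eSLDB: the keyword table is hypothetical, covers only the three classes named in the list, and the trigger words (for example "secreted" as an implicit reference to the secretory pathway) are assumptions. Statements carrying one of the three non-experimental qualifiers are skipped.

```python
import re

# Hypothetical keyword table: only the three classes named in the list are
# shown, and the trigger words are illustrative assumptions.
KEYWORDS = {
    "endoplasmic reticulum": "Endoplasmic reticulum",
    "cell wall": "Cell wall",
    "secreted": "Secretory pathway",
}

# Qualifiers that mark an annotation as non-experimental.
NON_EXPERIMENTAL = ("probable", "possible", "by similarity")


def experimental_classes(comment):
    """Return the classes found in statements that carry no non-experimental qualifier."""
    classes = set()
    # Treat each sentence or semicolon-separated clause as one statement.
    for statement in re.split(r"[.;]", comment.lower()):
        if any(qualifier in statement for qualifier in NON_EXPERIMENTAL):
            continue
        for keyword, cls in KEYWORDS.items():
            if keyword in statement:
                classes.add(cls)
    return classes
```

For instance, a comment reading "Secreted. Endoplasmic reticulum (By similarity)." would yield only the secretory pathway class, because the second statement is qualified.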
Annotation by Homology
Proteins sharing high sequence identity usually have the same subcellular localization.
Each sequence is aligned with the experimentally annotated proteins belonging to the same kingdom (Metazoa, Fungi or Viridiplantae). The annotation of the best-scoring match (or matches, in case of a tie) with an E-value lower than 10^-4, if any, is then transferred to the query sequence.
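The transfer rule can be sketched as follows. This is an illustration under stated assumptions, not the eSLDB code: the `Hit` record and its field names are hypothetical, the alignment program is not specified, and the hits are assumed to come already restricted to experimentally annotated proteins of the same kingdom.

```python
from typing import List, NamedTuple, Set

E_VALUE_CUTOFF = 1e-4  # threshold stated in the text


class Hit(NamedTuple):
    """Hypothetical record for one alignment hit."""
    subject_id: str
    score: float
    e_value: float
    localization: str


def transfer_annotation(hits: List[Hit]) -> Set[str]:
    """Return the localization(s) of the best-scoring significant hit(s)."""
    significant = [h for h in hits if h.e_value < E_VALUE_CUTOFF]
    if not significant:
        return set()  # nothing to transfer
    best_score = max(h.score for h in significant)
    # Ties are kept, so more than one annotation may be transferred.
    return {h.localization for h in significant if h.score == best_score}
```

An empty result means that no match passed the threshold, so the sequence has no annotation by homology.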
A pipeline of machine-learning predictors has been used to predict the localization of all remaining proteins. We used Spep (Fariselli et al., 2003) and ENSEMBLE (Martelli et al., 2003) to discriminate membrane proteins, then BaCelLo (Pierleoni et al., 2006) to assign a localization to soluble proteins. These are among the best available methods.
To achieve good reliability, the 16 original classes are reduced to 6 macro-classes:
- Secretory pathway
The decision-tree structure of the prediction, derived from the original output of the methods used, is also reported. A color code is used: the more intense the color of a prediction, the greater its reliability.
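The two-stage structure of the prediction can be sketched as follows. This is only an outline of the control flow: `is_membrane` and `localize_soluble` are hypothetical stand-ins for the membrane-discrimination step and for the soluble-protein predictor, not interfaces to the real programs, and the labels "membrane" and "soluble" are placeholders rather than the actual macro-class names.

```python
from typing import Callable, List


def predict_localization(
    sequence: str,
    is_membrane: Callable[[str], bool],
    localize_soluble: Callable[[str], str],
) -> List[str]:
    """Return the decision path, from the first branch to the final label."""
    # Stage 1: separate membrane proteins from soluble ones.
    if is_membrane(sequence):
        return ["membrane"]
    # Stage 2: only soluble proteins reach the localization predictor.
    return ["soluble", localize_soluble(sequence)]
```

Returning the whole path rather than only the final label mirrors the idea of reporting the decision tree alongside the prediction.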
How to search the Database
- First, choose the organism to search.
- Choose a method to query the database:
  - Search by name: enter one or more protein codes separated by a “ ; ” character. Protein codes are taken from the database that provides the sequences (usually Ensembl).
  - Search by annotation: choose any localization and the way it was determined. You can combine multiple selections with the logical operators OR and AND.
  - Search by sequence: enter a protein sequence in RAW FORMAT to find it in the database.
- Choose how many proteins to display in the result page(s).
- Choose “Display” to view the results or “Download” to save them locally.
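The query options above can be mimicked offline, for example to prepare input or to filter a downloaded result set. The sketch below is not the site's implementation: the function names are hypothetical, and raw format is assumed to mean a bare residue string without a header line.

```python
def parse_codes(query):
    """Split a ';'-separated list of protein codes, ignoring surrounding spaces."""
    return [code.strip() for code in query.split(";") if code.strip()]


def matches(annotations, wanted, operator="OR"):
    """Check a protein's set of localizations against the selections."""
    if operator == "AND":
        return all(w in annotations for w in wanted)
    if operator == "OR":
        return any(w in annotations for w in wanted)
    raise ValueError("operator must be 'AND' or 'OR'")


def to_raw_sequence(text):
    """Drop FASTA-style header lines and all whitespace, leaving bare residues."""
    lines = [ln for ln in text.splitlines() if not ln.startswith(">")]
    return "".join("".join(lines).split()).upper()
```

With OR, a protein is selected if it carries at least one of the chosen localizations; with AND, it must carry all of them.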
Fariselli, P., Finocchiaro, G. and Casadio, R. (2003) SPEPlip: the detection of signal peptide and lipoprotein cleavage sites. Bioinformatics, 19, 2498-2499.
Martelli, P.L., Fariselli, P. and Casadio, R. (2003) An ENSEMBLE machine learning approach for the prediction of all-alpha membrane proteins. Bioinformatics, 19, i205-i211.
Pierleoni, A., Martelli, P.L., Fariselli, P. and Casadio, R. (2006) BaCelLo: a Balanced subCellular Localization predictor. Bioinformatics, 22, e408-e416.