The Sequence Data Base

Recently, a reliable data set of experimentally annotated proteins, both with and without signal peptides, has become available [7]. Its authors derived this cleaned set from the information in the SWISSPROT data base [7]. Table 1 reports the number of sequences for each organism type in the resulting data base, split into the proteins that contain signal peptides (Positive Set) and the complementary proteins that do not (Negative Set).
Table 1: Number of sequences in the data base

Set Type       Eukaryotes   Gram-negative   Gram-positive   All
Positive Set   1158         301             132             1591
Negative Set   1142         297             129             1568
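
As a quick consistency check on Table 1, the short sketch below transcribes the counts and confirms that each row sums to its reported "All" value. It also computes the share of signal-peptide proteins per organism type. The dictionary layout and the names `TABLE_1`, `REPORTED_TOTALS`, `row_totals`, and `positive_fraction` are illustrative assumptions, not part of the original data set or of [7].

```python
# Counts transcribed from Table 1. The layout and names are illustrative only.
TABLE_1 = {
    "positive": {"eukaryotes": 1158, "gram_negative": 301, "gram_positive": 132},
    "negative": {"eukaryotes": 1142, "gram_negative": 297, "gram_positive": 129},
}

# Values from the "All" column of Table 1.
REPORTED_TOTALS = {"positive": 1591, "negative": 1568}


def row_totals(table):
    """Sum the per-organism counts of each set (Positive, Negative)."""
    return {name: sum(counts.values()) for name, counts in table.items()}


def positive_fraction(table, organism):
    """Fraction of sequences of one organism type that carry a signal peptide."""
    pos = table["positive"][organism]
    neg = table["negative"][organism]
    return pos / (pos + neg)


if __name__ == "__main__":
    # The per-organism counts should add up to the reported totals.
    assert row_totals(TABLE_1) == REPORTED_TOTALS
    for organism in TABLE_1["positive"]:
        print(organism, round(positive_fraction(TABLE_1, organism), 3))
```

The two sets are close to balanced for every organism type, which is worth knowing before using them to train or evaluate a classifier.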