
Combining SignalNet with CleavageNet

In large-scale analyses such as genome (proteome) annotation, predicting the presence or absence of this sorting signal is often more useful than locating the exact cleavage site (which nevertheless remains relevant), both for classifying the protein location and for more detailed downstream processing of the protein sequence. We therefore define a filtering procedure based on the notion that a signal peptide should be assigned only when strong predictions are obtained for a large number of adjacent residues in the first part of the protein sequence. We introduce the function
\begin{displaymath}
A_S(i)=\frac{1}{2D+1}\sum_{j=-D}^{D}O_S(i+j) \; ,
\end{displaymath} (6)

which computes the average value of the network outputs $O_S$ in the symmetric neighborhood $[i-D,i+D]$, and the function
\begin{displaymath}
C_S(i)=\frac{1}{i}\sum_{j=1}^{i}O_S(j) \; ,
\end{displaymath} (7)

which computes the cumulative average of the network outputs over the first $i$ residues. To predict a signal peptide we then require that the position of the maximal score
\begin{displaymath}
P_S= \max_{i \in [1,65]} \{ A_S(i) \geq S \quad \mathrm{AND} \quad C_S(i) \geq L\}
\end{displaymath} (8)

must lie in a given interval $[P_{min},P_{max}]$. The values of $D$, $P_{min}$, $P_{max}$, $S$ and $L$ are determined using the training sets. With this criterion we obtain a significant improvement over the original SignalNet performance, as shown in Table 5, which reports the results obtained for the different sets. The improvement is particularly evident in the correlation coefficient values (compare with Table 3).
Table 5: Signal peptide detection accuracy

Measure   Gram positive   Gram negative   Eukaryotes   All
$Q_2$     0.97            0.97            0.95         0.95
$C$       0.93            0.93            0.90         0.91
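
The filtering procedure of Eqs. (6)-(8) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: positions are 1-based, network outputs falling outside the sequence contribute zero to the window average of Eq. (6), and Eq. (8) is read as "the largest position $i$ in $[1,65]$ at which both thresholds are met". The function names and the numerical values of $D$, $S$, $L$, $P_{min}$ and $P_{max}$ in the usage lines are hypothetical; the text only states that these parameters are fitted on the training sets.

```python
from typing import Optional, Sequence


def window_average(outputs: Sequence[float], i: int, D: int) -> float:
    """A_S(i) of Eq. (6): mean network output over positions [i-D, i+D].

    Positions are 1-based. Assumption: positions outside the sequence
    contribute 0, and the sum is always divided by 2D+1.
    """
    total = 0.0
    for pos in range(i - D, i + D + 1):
        if 1 <= pos <= len(outputs):
            total += outputs[pos - 1]
    return total / (2 * D + 1)


def cumulative_average(outputs: Sequence[float], i: int) -> float:
    """C_S(i) of Eq. (7): mean network output over the first i residues."""
    return sum(outputs[:i]) / i


def filter_position(outputs: Sequence[float], D: int, S: float, L: float,
                    max_pos: int = 65) -> Optional[int]:
    """P_S of Eq. (8), under the assumed reading: the largest position i
    in [1, max_pos] with A_S(i) >= S and C_S(i) >= L, or None if no
    position satisfies both conditions."""
    best = None
    for i in range(1, min(max_pos, len(outputs)) + 1):
        if (window_average(outputs, i, D) >= S
                and cumulative_average(outputs, i) >= L):
            best = i
    return best


def has_signal_peptide(outputs: Sequence[float], D: int, S: float, L: float,
                       p_min: int, p_max: int) -> bool:
    """Predict a signal peptide when P_S exists and lies in [p_min, p_max]."""
    p = filter_position(outputs, D, S, L)
    return p is not None and p_min <= p <= p_max


# Toy example with made-up outputs: 20 strongly predicted residues
# followed by 40 weakly predicted ones. All thresholds are illustrative.
toy_outputs = [0.9] * 20 + [0.1] * 40
toy_position = filter_position(toy_outputs, D=2, S=0.8, L=0.8)
toy_call = has_signal_peptide(toy_outputs, D=2, S=0.8, L=0.8,
                              p_min=10, p_max=40)
```

Requiring both conditions means that an isolated high-scoring window is not enough: the cumulative average must also remain above $L$, so the strong predictions have to extend back to the start of the sequence.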


2003-06-12