Next: SPEP analysis Up: SPEP: a Signal Peptide Previous: The Sequence Data Base

Neural Network System

In this paper we use standard feed-forward neural networks trained with the back-propagation learning algorithm [2]. A thorough search of the parameter space yielded six different neural network architectures, which are reported below. All evaluations were carried out with a cross-validation procedure, taking care to eliminate detectable sequence identity between each learning set and the corresponding testing set. This was done by running all-against-all ($N\times N$) BLAST searches [1] and applying a transitive closure algorithm to the resulting sets to identify sequence clusters.

We implemented two types of neural networks: SignalNet, which predicts the signal peptide by associating to each residue the probability of belonging to it, and CleavageNet, which predicts the position of the cleavage site. Both types have a single output neuron, but they differ in the number of hidden and input units. In particular, we allowed asymmetric sliding windows, since there is more information on the left side of the window than on the right side.

The input layers account for the sliding windows and the residue encoding. Each residue is encoded as a 21-element binary input vector: the first 20 positions represent the residue types, while the 21st element codes for empty positions, which occur when the sliding window is located at the N-terminus. Since SignalNet and CleavageNet each have a single output neuron, a threshold is needed to classify a given residue as belonging to the signal peptide or not. If $O_S(i)$ is the SignalNet output for position $i$, we have
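The input encoding described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SPEP implementation: the ordering of the 20 residue types, the window sizes `left` and `right`, and the function names are hypothetical. The sketch also marks window positions past the C-terminus as empty, which the text does not specify, and it assumes sequences contain only the 20 standard one-letter codes.

```python
# Hypothetical ordering of the 20 residue types (the text does not fix one).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}
EMPTY = 20  # index of the 21st element: window position outside the sequence


def encode_window(seq, i, left, right):
    """Encode the asymmetric window seq[i-left .. i+right] centred on residue i.

    Each window position becomes a 21-element binary vector; positions that
    fall outside the sequence set only the EMPTY element (assumption: this is
    applied at both termini). Returns a flat list of length 21 * (left+1+right).
    """
    vec = []
    for j in range(i - left, i + right + 1):
        one_hot = [0] * 21
        if 0 <= j < len(seq):
            one_hot[AA_INDEX[seq[j]]] = 1
        else:
            one_hot[EMPTY] = 1
        vec.extend(one_hot)
    return vec


def encode_sequence(seq, left, right):
    """One input vector per residue, as fed to a per-residue network."""
    return [encode_window(seq, i, left, right) for i in range(len(seq))]
```

For example, `encode_sequence("MKKLL", left=3, right=1)` yields five vectors of length 105 (21 × 5); in the first vector, the three window positions before the N-terminus have only their 21st element set. Choosing `left > right` gives the asymmetric window mentioned in the text.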
\begin{equation}
O_S(i) \geq \theta \quad \Rightarrow \quad i \in \text{signal peptide} \; . \tag{3}
\end{equation}

In this paper the decision threshold is set to $\theta = 0.5$. Analogously, for the CleavageNet output $O_C$ we define
\begin{equation}
O_C(i) \geq 0.5 \quad \Rightarrow \quad i \ \text{is a cleavage site} \; . \tag{4}
\end{equation}
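Equations (3) and (4) amount to applying the same inclusive threshold to each network's per-residue outputs. The following is a minimal sketch with hypothetical function names and made-up output values used only for illustration; positions are 0-based here, whereas the text does not state an indexing convention.

```python
def classify(outputs, theta=0.5):
    """Apply O(i) >= theta to every position (Eqs. 3 and 4)."""
    return [o >= theta for o in outputs]


def positive_positions(outputs, theta=0.5):
    """Return the 0-based positions i whose output satisfies O(i) >= theta."""
    return [i for i, flag in enumerate(classify(outputs, theta)) if flag]


# Hypothetical network outputs for a six-residue fragment (not real SPEP data).
o_s = [0.91, 0.88, 0.73, 0.50, 0.42, 0.10]  # SignalNet, O_S(i)
o_c = [0.02, 0.05, 0.12, 0.64, 0.20, 0.01]  # CleavageNet, O_C(i)

print(positive_positions(o_s))  # [0, 1, 2, 3]
print(positive_positions(o_c))  # [3]
```

Because the comparison is inclusive, an output of exactly 0.5 (position 3 of `o_s`) is assigned to the positive class.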

The best final architectures are listed here.


2003-06-12