- Match: specifies how the words entered in the Search
Field are used for protein retrieval:
- All: all of the words in the Search Field
must be matched in the retrieved protein HTML pages.
- Any: at least one of the words in the Search Field must be
matched in the retrieved protein HTML pages.
- Boolean: an explicit Boolean search that uses the operators
AND, OR and NOT, together with parentheses, to define the query.
Examples of correct expressions are "Hydrolase AND Ligase" and
"Hydrolase NOT Ligase".
Note that the operator NOT has the meaning of "without".
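The three Match modes can be illustrated with a minimal Python sketch. It is not the search engine's actual implementation: the function names, the case-insensitive word matching, and the left-to-right evaluation of operators with equal precedence are all assumptions made for illustration. NOT is treated as a binary "without" operator, as described above.

```python
import re

OPERATORS = ("AND", "OR", "NOT")

def _parse_term(tokens, page_words):
    """Parse a single word or a parenthesized sub-expression (hypothetical helper)."""
    if not tokens:
        raise ValueError("unexpected end of query")
    head = tokens[0]
    if head == "(":
        value, rest = _parse_expr(tokens[1:], page_words)
        if not rest or rest[0] != ")":
            raise ValueError("missing closing parenthesis")
        return value, rest[1:]
    if head == ")" or head in OPERATORS:
        raise ValueError(f"unexpected token: {head}")
    return head.lower() in page_words, tokens[1:]

def _parse_expr(tokens, page_words):
    """Evaluate terms joined by AND/OR/NOT, left to right (assumed precedence)."""
    value, tokens = _parse_term(tokens, page_words)
    while tokens and tokens[0] in OPERATORS:
        op = tokens[0]
        right, tokens = _parse_term(tokens[1:], page_words)
        if op == "AND":
            value = value and right
        elif op == "OR":
            value = value or right
        else:  # NOT means "without": left side matches, right side does not
            value = value and not right
    return value, tokens

def matches(query, page_text, mode="all"):
    """Return True if page_text satisfies query under the given Match mode."""
    # Keep '_' and '-' inside words so keywords such as 'single_chain' survive.
    page_words = {w.lower() for w in re.findall(r"[\w-]+", page_text)}
    if mode == "all":
        return all(w.lower() in page_words for w in query.split())
    if mode == "any":
        return any(w.lower() in page_words for w in query.split())
    if mode == "boolean":
        tokens = re.findall(r"\(|\)|[^\s()]+", query)
        value, rest = _parse_expr(tokens, page_words)
        if rest:
            raise ValueError(f"unexpected token: {rest[0]}")
        return value
    raise ValueError(f"unknown mode: {mode}")
```

For example, `matches("Hydrolase NOT Ligase", page, mode="boolean")` is true for a page that contains "Hydrolase" and does not contain "Ligase".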
- Format: specifies how much detail per protein is
shown in the result list:
- Long: displays a graphical symbol (indicating the query
matching level), the titles, the relevant excerpted text, the
date of last modification and the size of the matching pages.
- Short: displays just a graphical symbol (indicating the
query matching level) and the page titles.
- Matches per page: specifies how many results are
displayed on a single HTML page.
- Search Field: contains the words used by the search engine
for protein retrieval. Some special words
can be used to refine the search.
These 'keywords' relate to data derived from the PDB analysis
that is computed by the internal Zenpatches engine:
- 'all_chain' is used to specify a prediction computed
on an entire protein. The computation is usually performed
on both the whole protein and its separated chains.
- 'single_chain', 'monomer' and '1-mer' are equivalent
keywords representing a subset of the 'all_chain' set.
They are used to
specify predictions computed on PDB files containing
only one predictable chain. A "not predictable chain" is
either a short peptide (single amino acids,
inhibitors, peptide hormones...) or a nucleotide chain
(RNA, DNA, RNA-DNA hybrids).
- 'multi_chain' is a keyword representing a subset of
the 'all_chain' set. It is used to specify predictions
computed on PDB files containing more than one predictable
chain (see 'single_chain' for further explanations). Note
that a two-chain protein with one predicted chain
and one not predicted chain (due to DSSP problems) is classified
as 'multi_chain', even though there is only one predictable
chain. This does not apply to short peptides or nucleotides.
- The 'multi_chain' ensemble is further divided into subsets
according to the number of predictable chains in the PDB
file, which gives further keywords (the two wordings in each pair are equivalent):
'dimer','2-mer', 'trimer','3-mer', 'tetramer','4-mer',
'pentamer','5-mer', 'examer','6-mer', 'eptamer','7-mer',
'octamer','8-mer', 'ennamer','9-mer', 'decamer','10-mer',
'undecamer','11-mer', 'dodecamer','12-mer', followed by the
single keywords '13-mer', '14-mer', ..., 'N-mer'. Note
that this is not strictly a degree of oligomerization or multimerization,
because it is related only to the number of chains in the
PDB file, and that is often related only to the crystallization
- 'homomer' indicates a subset of the 'multi_chain' set
containing proteins whose chains are all the same. To be classified
as a 'homomer', the entire protein must have all its chains
identical. Note that this subdivision is orthogonal to
the multimerization one.
- 'heteromer' indicates a subset of the 'multi_chain' set
containing proteins in which at least one chain differs
from the others. Note that this subdivision is orthogonal
to the multimerization one.
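The chain-count keywords described above can be sketched as a small Python function. This is only an illustration under stated assumptions, not the Zenpatches engine's code: the function name `oligomer_keywords` is hypothetical, the input is assumed to be the list of chains that count toward the classification, and two chains are assumed to be "the same" when their sequences are equal.

```python
# Named keywords for 2 to 12 chains, spelled exactly as listed above.
NAMED_MULTIMERS = {
    2: "dimer", 3: "trimer", 4: "tetramer", 5: "pentamer",
    6: "examer", 7: "eptamer", 8: "octamer", 9: "ennamer",
    10: "decamer", 11: "undecamer", 12: "dodecamer",
}

def oligomer_keywords(chain_sequences):
    """Return the search keywords for a PDB entry, given its counted chains.

    chain_sequences: list of sequence strings, one per counted chain
    (assumption: chain identity is decided by sequence equality).
    """
    n = len(chain_sequences)
    if n == 0:
        raise ValueError("at least one chain is required")

    keywords = ["all_chain"]
    if n == 1:
        # 'single_chain', 'monomer' and '1-mer' are equivalent keywords.
        keywords += ["single_chain", "monomer", "1-mer"]
        return keywords

    keywords.append("multi_chain")
    if n in NAMED_MULTIMERS:
        keywords.append(NAMED_MULTIMERS[n])
    keywords.append(f"{n}-mer")  # from 13 chains on, only 'N-mer' exists

    # Orthogonal subdivision: all chains the same, or at least one different.
    all_same = len(set(chain_sequences)) == 1
    keywords.append("homomer" if all_same else "heteromer")
    return keywords
```

A query such as "dimer AND homomer" in Boolean mode would then select entries whose keyword list contains both terms.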
- Last, 'short peptide chains', 'nucleotide chains',
'dssp problem chains' and 'no
crystal proteins' are
stored in the database and marked respectively with:
'nucleotides','DNA-RNA', 'dssp_errors', 'no_coordinates'.
Note that the 'no_coordinates' ensemble
is a subset of the 'all_chain' set, whereas 'dssp_errors'
is an ensemble orthogonal to the 'single_chain',
'multi_chain' and multimerization ones. In addition, the
'nucleotides', 'DNA-RNA' and 'dssp_errors' subsets also
cover space outside the 'all_chain' space, being related
both to the computation done on each separate chain
and to the one done on the whole protein (that is the
For a more intuitive representation,
see the scheme below (note that the size of each
ensemble in the scheme is not proportional to
its protein number).
Note that DSSP analysis problems (always related
to PDB file problems) can produce a "not predictable chain",
but in this case the chain is further examined
to determine whether the entire protein is a 'single_chain'
protein or a