Search help page

  • Match: It indicates how the words entered in the Search Field are used to retrieve proteins:
    • All: All of the words in the Search field are matched in the retrieved protein html pages.
    • Any: At least one of the words in the Search field is matched in the retrieved protein html pages.
    • Boolean: It is an explicit Boolean search using AND/OR/NOT and parentheses to define the query. Examples of correct expressions are: "Hydrolase AND Ligase", "Hydrolase NOT Ligase". Note that the operator NOT has the meaning of "without".
  • Format: It indicates how much detail per protein will be shown in the result list:
    • Long: It displays a graphical symbol (indicating the query matching level), the titles, the relevant excerpted text, the date of last modification and the size of the matching pages.
    • Short: It displays just a graphical symbol (indicating the query matching level) and the page titles.

  • Matches per page: It indicates how many results will be displayed on a single html page.
  • Search Field: It contains the words used by the search engine to retrieve proteins. Some special words can be used to refine the search. These 'keywords' relate to data derived from the PDB analysis computed by the internal Zenpatches engine:
    • 'all_chain' is used to specify a prediction computed on an entire protein. The computation is usually performed on both the whole protein and its separate chains.
    • 'single_chain', 'monomer', '1-mer' are equivalent keywords. They represent a subset of the 'all_chain' set and are used to specify predictions computed on PDB files containing only one predictable chain. A "not predictable chain" is either a short peptide (single amino acids, inhibitors, peptide hormones...) or a nucleotide chain (RNA, DNA, RNA-DNA hybrids).
    • 'multi_chain' is a keyword representing a subset of the 'all_chain' set. It is used to specify predictions computed on PDB files containing more than one predictable chain (see 'single_chain' for further explanations). Note that a two-chain protein with one predicted chain and one chain that could not be predicted because of dssp problems is classified as 'multi_chain', even though only one chain was actually predicted. This does not apply to short peptides or nucleotides, which are never counted.
    • The 'multi_chain' set is further divided into subsets according to the number of predictable chains in the PDB file. The corresponding keywords come in pairs (both wordings are equivalent): 'dimer','2-mer', 'trimer','3-mer', 'tetramer','4-mer', 'pentamer','5-mer', 'examer','6-mer', 'eptamer','7-mer', 'octamer','8-mer', 'ennamer','9-mer', 'decamer','10-mer', 'undecamer','11-mer', 'dodecamer','12-mer'. Beyond that only the numeric form exists: '13-mer', '14-mer', ...., 'N-mer'.
      Note that this is not strictly a degree of oligomerization-multimerization, because it depends only on the number of chains in the PDB file, which often reflects only the crystallization cell.
    • 'homomer' indicates the subset of the 'multi_chain' set containing proteins whose chains are all the same. To be classified as a 'homomer', all the chains of the entire protein must be equal. Note that this subdivision is orthogonal to the multimerization one.
    • 'heteromer' indicates the subset of the 'multi_chain' set containing proteins with at least one chain that differs from the others. Note that this subdivision is orthogonal to the multimerization one.
    • Finally, 'short peptide chains', 'nucleotide chains', 'dssp problem chains' and 'no crystal proteins' are stored in the database and marked respectively with the keywords 'short_peptides', 'nucleotides', 'DNA-RNA', 'dssp_errors' and 'no_coordinates'.
      Note that the 'no_coordinates' set is a subset of the 'all_chain' set, whereas 'dssp_errors' is orthogonal to the 'single_chain', 'multi_chain' and multimerization sets. In addition, the 'short_peptides', 'nucleotides', 'DNA-RNA' and 'dssp_errors' subsets also extend outside the 'all_chain' space, because they relate both to the computation done on each separate chain and to the one done on the whole protein (that is, the 'all_chain' space).
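
The keyword rules above can be illustrated with a short Python sketch. It is not the Zenpatches code: the `Chain` class, the `kind` labels and the function `all_chain_keywords` are hypothetical names introduced only for this illustration. It covers only the whole-protein ('all_chain') computation and leaves out 'no_coordinates' and 'DNA-RNA'. It also assumes that chains with dssp problems count towards the chain number, and that chains are compared by their sequence to decide between 'homomer' and 'heteromer'.

```python
from dataclasses import dataclass

# Named keywords for 2 to 12 chains, spelled as in the keyword list above.
NAMED_SIZES = {
    2: "dimer", 3: "trimer", 4: "tetramer", 5: "pentamer", 6: "examer",
    7: "eptamer", 8: "octamer", 9: "ennamer", 10: "decamer",
    11: "undecamer", 12: "dodecamer",
}


@dataclass(frozen=True)
class Chain:
    """Hypothetical description of one chain of a PDB file."""
    sequence: str
    # One of: "protein", "short_peptide", "nucleotide", "dssp_error"
    kind: str = "protein"


def all_chain_keywords(chains):
    """Return the set of search keywords for a whole-protein prediction."""
    keywords = {"all_chain"}

    # Short peptides and nucleotides are never counted. Chains with dssp
    # problems are counted (assumption based on the 'multi_chain' rule).
    counted = [c for c in chains if c.kind in ("protein", "dssp_error")]
    n = len(counted)

    if n == 1:
        keywords.update({"single_chain", "monomer", "1-mer"})
    elif n > 1:
        keywords.update({"multi_chain", f"{n}-mer"})
        if n in NAMED_SIZES:
            keywords.add(NAMED_SIZES[n])
        # Orthogonal subdivision: are all counted chains the same?
        distinct = {c.sequence for c in counted}
        keywords.add("homomer" if len(distinct) == 1 else "heteromer")

    # Marker keywords for chains that could not be predicted.
    markers = {
        "short_peptide": "short_peptides",
        "nucleotide": "nucleotides",
        "dssp_error": "dssp_errors",
    }
    for chain in chains:
        if chain.kind in markers:
            keywords.add(markers[chain.kind])
    return keywords
```

Under these assumptions, a file with two identical protein chains would receive 'multi_chain', 'dimer', '2-mer' and 'homomer', while a file with one protein chain bound to a nucleotide chain would receive 'single_chain' and 'nucleotides'.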

For a more intuitive representation see the scheme below (note that the size of each set in the scheme is not proportional to the number of proteins it contains).

Note that dssp analysis problems (always related to PDB file problems) can produce a "not predictable chain". In this case, however, the chain is still taken into account when determining whether the entire protein is a 'single_chain' protein or a 'multi_chain' one (see the 'multi_chain' keyword above).
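
The three Match modes described at the top of this page (All, Any, Boolean) can be sketched in Python as follows. This is only an illustration, not the search engine's implementation: the function names are hypothetical, a page is reduced to a list of its words, matching is assumed to be case-insensitive on whole words, operators are assumed to be written in upper case, and Boolean operators are evaluated left to right without precedence. NOT is treated as a binary "without" operator, as stated above. Queries are assumed to be well formed.

```python
import re


def _word_set(page_words):
    """Lower-case the words of a page for case-insensitive matching."""
    return {w.lower() for w in page_words}


def matches_all(query, page_words):
    """'All' mode: every query word must occur in the page."""
    words = _word_set(page_words)
    return all(term.lower() in words for term in query.split())


def matches_any(query, page_words):
    """'Any' mode: at least one query word must occur in the page."""
    words = _word_set(page_words)
    return any(term.lower() in words for term in query.split())


def matches_boolean(query, page_words):
    """'Boolean' mode: AND/OR/NOT with parentheses, read left to right."""
    words = _word_set(page_words)
    # Split the query into parentheses and words (operators included).
    tokens = re.findall(r"\(|\)|[^\s()]+", query)
    pos = 0

    def parse_term():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_expr()
            pos += 1  # skip the closing ")"
            return value
        return token.lower() in words

    def parse_expr():
        nonlocal pos
        value = parse_term()
        while pos < len(tokens) and tokens[pos] in ("AND", "OR", "NOT"):
            operator = tokens[pos]
            pos += 1
            right = parse_term()
            if operator == "AND":
                value = value and right
            elif operator == "OR":
                value = value or right
            else:  # NOT means "without": left side present, right side absent
                value = value and not right
        return value

    return parse_expr()
```

With a page reduced to the words `["Hydrolase", "multi_chain", "dimer"]`, the query "Hydrolase NOT Ligase" matches in Boolean mode, whereas "Hydrolase AND Ligase" does not. Keywords such as 'multi_chain' or 'dimer' are handled like any other word, so a query like "Hydrolase AND dimer" combines a free-text word with a keyword.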