<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>nsp-protein-4-7</ui>
   <ji>prot-1</ji>
   <fm>
      <dochead>NSP Primer</dochead>
      <bibl>
         <title>
            <p>Structure from Sequence: Profile-Based Threading and "Rosetta"</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Petsko</snm>
               <mi>A</mi>
               <fnm>Gregory</fnm>
            </au>
            <au id="A2">
               <snm>Ringe</snm>
               <fnm>Dagmar</fnm>
            </au>
         </aug>
         <source>Protein Structure and Function</source>
         <pubdate>2003</pubdate>
         <volume>4</volume>
         <issue>From Sequence to Function</issue>
         <fpage>7</fpage>
         <lpage>7</lpage>
      </bibl>
      <history>
         <pub>
            <date>
               <day>20</day>
               <month>5</month>
               <year>2003</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2004</year>
         <collab>New Science Press Ltd</collab>
      </cpyrt>
   </fm>
   <bdy>
      <sec num="4-18">
         <st>
            <p>Profile-based threading tries to predict the structure of a sequence even if no sequence homologs are known</p>
         </st>
         <p>The most important method developed so far for identifying a protein fold from sequence information alone, in the absence of any apparent sequence identity to any other protein, is profile-based threading. In this method, a computer program forces the sequence to adopt every known protein fold in turn, and for each fold it calculates a scoring function that measures how suitable the sequence is for that fold (<figr fid="F4_25">Figure 4-25</figr>).</p>
         <fig id="F4_25">
            <title>
               <p>Figure 4-25</p>
            </title>
            <caption>
               <p>
                  <b>The method of profile-based threading</b>
               </p>
            </caption>
            <text>
               <p>A sequence of unknown structure is forced to adopt all known protein domain folds, and scored for its suitability for each fold. The z-score relates the score for the query sequence to the average score for a set of random sequences with the same amino-acid composition and sequence length, expressed in units of the standard deviation of those random scores. A very high z-score indicates that the sequence almost certainly adopts that fold. Sequences can be submitted online for threading at the PSIPRED server <url>http://bioinf.cs.ucl.ac.uk/psipred/index.html</url>.</p>
            </text>
            <graphic file="nsp-protein-4-7-4_25"/>
         </fig>
         <p>The scoring function provides a quantitative measure of how well the sequence fits the fold. The method is based on the assumption that the three-dimensional structures of proteins have characteristics that are at least semi-quantitatively predictable, and that these characteristics reflect both the physicochemical properties of strings of amino acids in sequences and the limitations on the types of interactions allowed within a folded polypeptide chain. Does, for example, forcing the sequence to adopt particular secondary structures and intra-protein interactions place hydrophobic residues on the inside and helix-forming residues in helical segments? If so, the score will be relatively high.</p>
         <p>Experience with profile-based threading has shown that a high score, indicating a good fit to a particular fold, can generally be trusted. A low score, on the other hand, indicates only that a fit was not found; it does not necessarily mean that the sequence cannot adopt that fold. Thus, if the method fails to find any fold with a significantly high score, nothing has been learned about the sequence. Despite this limitation, profile-based threading is a powerful method that has identified the general fold for many sequences. It cannot provide fine details of the structure, however, because at such low levels of sequence identity to the reference fold the local interactions and side-chain conformations will not necessarily be the same.</p>
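         <p>The z-score described for Figure 4-25 can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the toy scoring function <it>fold_score</it>, the set of residues treated as hydrophobic, and the representation of a fold as a list of buried positions are hypothetical and are not taken from any real threading program. Only the normalization step, which compares the query score with scores for shuffled sequences of the same composition and length, follows the description in the text.</p>
         <p><![CDATA[
```python
import random
import statistics

# Hypothetical choice of residues treated as hydrophobic (one-letter codes).
HYDROPHOBIC = set("AVLIMFWC")


def fold_score(sequence, buried_positions):
    """Toy scoring function: count hydrophobic residues at buried positions."""
    return sum(1 for i in buried_positions if sequence[i] in HYDROPHOBIC)


def z_score(sequence, buried_positions, n_random=1000, seed=0):
    """Compare the query score with scores for shuffled sequences.

    Shuffling keeps the amino-acid composition and the sequence length fixed.
    """
    rng = random.Random(seed)
    query = fold_score(sequence, buried_positions)
    residues = list(sequence)
    random_scores = []
    for _ in range(n_random):
        rng.shuffle(residues)
        random_scores.append(fold_score(residues, buried_positions))
    mean = statistics.mean(random_scores)
    spread = statistics.pstdev(random_scores)
    if spread == 0:
        raise ValueError("random scores have zero spread; z-score undefined")
    return (query - mean) / spread
```
]]></p>
         <p>Calling <it>z_score</it> once per fold in a library and sorting the results would give the kind of ranking shown in the figure; a sequence whose hydrophobic residues all fall at the buried positions of a fold scores well above the shuffled average for that fold.</p>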
      </sec>
      <sec num="4-19">
         <st>
            <p>The Rosetta method attempts to predict protein structure from sequence without the aid of a homologous sequence or structure</p>
         </st>
         <p>Ideally, one would like to be able to compute the correct structure for any protein from sequence information alone, even in the absence of homology. Ongoing efforts to achieve this "holy grail" of structure prediction have met with mixed success. Periodically these methods are tested against proteins of known but unpublished structure in a formal competition called CASP (Critical Assessment of Techniques for Protein Structure Prediction). Perhaps the most promising method at the moment is Rosetta.</p>
         <p>One of the fundamental assumptions underlying Rosetta is that the distribution of conformations sampled by a given short segment of the sequence is reasonably well approximated by the distribution of structures adopted by that sequence and closely related sequences in known protein structures. Fragment libraries for short segments of the chain are therefore extracted from the protein structure database. At no point is knowledge of the overall native structure used to select fragments or fix segments of the structure. The conformational space defined by these fragments is then searched using a Monte Carlo procedure with an energy function that favors compact structures with paired strands and buried hydrophobic residues.</p>
         <p>A total of 1,000 independent simulations are carried out for each query sequence, and the resulting structures are clustered. The cluster centers are then rank-ordered according to the size of the clusters they represent, and the centers of the largest clusters are designated the highest-confidence models. Before clustering, most structures produced by Rosetta are incorrect (that is, good structures account for less than 10% of the conformations produced); for this reason, the conformations generated by Rosetta are referred to as decoys (<figr fid="F4_26">Figure 4-26</figr>). The problem of discriminating between good and bad decoys in Rosetta populations is still under investigation. Nevertheless, in some test calculations, the best cluster center has been shown to agree fairly well with the overall fold of the protein (<figr fid="F4_27">Figure 4-27</figr>).</p>
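         <p>The Monte Carlo search over fragment-defined conformations described above can be sketched in Python as follows. This is a generic Metropolis procedure under stated assumptions, not the Rosetta implementation: the representation of a conformation as a list of (phi, psi) torsion pairs, the layout of the fragment library, the temperature and the number of steps are all hypothetical, and <it>toy_energy</it> is a stand-in that merely rewards helical torsion angles rather than compactness, strand pairing or hydrophobic burial.</p>
         <p><![CDATA[
```python
import math
import random


def toy_energy(conformation):
    """Hypothetical stand-in energy (NOT the Rosetta energy function).

    Squared distance of each torsion pair from idealized helical angles.
    """
    return sum((phi + 60.0) ** 2 + (psi + 45.0) ** 2
               for phi, psi in conformation)


def fragment_monte_carlo(start, fragment_library, energy,
                         steps=2000, temperature=1.0, seed=0):
    """Metropolis search over conformations assembled from fragments.

    start            -- list of (phi, psi) pairs, one per residue
    fragment_library -- dict: window start index -> list of fragments,
                        each a list of (phi, psi) pairs that fits within
                        the chain when placed at that index
    energy           -- callable scoring a conformation; lower is better
    Returns the lowest-energy conformation seen and its energy.
    """
    rng = random.Random(seed)
    current = list(start)
    current_e = energy(current)
    best, best_e = list(current), current_e
    positions = sorted(fragment_library)
    for _ in range(steps):
        pos = rng.choice(positions)
        fragment = rng.choice(fragment_library[pos])
        trial = list(current)
        # Fragment insertion move: overwrite one window of torsion angles.
        trial[pos:pos + len(fragment)] = fragment
        trial_e = energy(trial)
        delta = trial_e - current_e
        # Metropolis criterion: accept downhill moves always, uphill
        # moves with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = list(current), current_e
    return best, best_e
```
]]></p>
         <p>Running the function many times with different seeds would produce a population of independent final conformations, analogous to the decoys that are subsequently clustered.</p>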
         <fig id="F4_26">
            <title>
               <p>Figure 4-26</p>
            </title>
            <caption>
               <p>
                  <b>Some decoy structures produced by the Rosetta method</b>
               </p>
            </caption>
            <text>
               <p>The structure at the center is the target, the experimentally determined structure of a homeodomain. The other structures are generated by the Monte Carlo approach in Rosetta, using only the sequence of the protein. Although some of the structures are quite far from the true structure, others are close enough for the fold to be recognizable. RMSD is the root mean square deviation in &#945;-carbon positions between the computed structure and the experimentally determined structure. (Taken from Simons, K.T. <it>et al.</it>: <it>J. Mol. Biol.</it> 1997, <b>268</b>:209&#8211;225.)</p>
            </text>
            <graphic file="nsp-protein-4-7-4_26"/>
         </fig>
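         <p>The root mean square deviation quoted in Figure 4-26 can be computed as in the following minimal Python sketch. It assumes that the two structures contain the same &#945;-carbons in the same order and have already been optimally superimposed; the superposition step itself is not shown, and the function name and coordinate format are illustrative choices.</p>
         <p><![CDATA[
```python
import math


def ca_rmsd(model, target):
    """RMSD between two equal-length lists of (x, y, z) alpha-carbon positions.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(model) != len(target):
        raise ValueError("structures must have the same number of alpha-carbons")
    if not model:
        raise ValueError("at least one alpha-carbon is required")
    # Sum of squared distances between corresponding atoms.
    total = sum(math.dist(a, b) ** 2 for a, b in zip(model, target))
    return math.sqrt(total / len(model))
```
]]></p>
         <p>A value of zero means that the corresponding &#945;-carbons coincide exactly; larger values indicate a computed structure further from the experimental one.</p>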
         <fig id="F4_27">
            <title>
               <p>Figure 4-27</p>
            </title>
            <caption>
               <p>
                  <b>Examples of the best-center cluster found by Rosetta for a number of different test proteins</b>
               </p>
            </caption>
            <text>
               <p>The level of agreement with the known native structure varies, but in many cases the overall fold is predicted well enough to be recognizable. Note, however, that the relative positions of the secondary structure elements are almost always shifted at least somewhat from their true values. Graphics kindly provided by Richard Bonneau and David Baker. (Adapted from Bonneau, R. <it>et al.</it>: <it>Proteins</it> 2001, <b>45(S5)</b>:119&#8211;126.)</p>
            </text>
            <graphic file="nsp-protein-4-7-4_27"/>
         </fig>
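         <p>The selection of cluster centers and their ranking by cluster size can be illustrated with a simple greedy procedure, sketched below in Python. This is one common way of clustering by neighbor count and is offered as an assumption, not as the procedure actually used for Figure 4-27; the distance function and the cutoff are supplied by the caller and are hypothetical parameters (in practice the distance would typically be an RMSD between decoys).</p>
         <p><![CDATA[
```python
def rank_cluster_centers(decoys, distance, cutoff):
    """Greedy clustering: repeatedly take the decoy with the most neighbors.

    Returns a list of (center, members) pairs, largest cluster first.
    """
    remaining = list(range(len(decoys)))
    clusters = []
    while remaining:
        # Neighbors of each remaining decoy (every decoy neighbors itself).
        neighbors = {
            i: [j for j in remaining
                if distance(decoys[i], decoys[j]) <= cutoff]
            for i in remaining
        }
        # The decoy with the most neighbors becomes the next cluster center.
        center = max(remaining, key=lambda i: len(neighbors[i]))
        members = neighbors[center]
        clusters.append((decoys[center], [decoys[j] for j in members]))
        # Remove the whole cluster before looking for the next center.
        assigned = set(members)
        remaining = [i for i in remaining if i not in assigned]
    return clusters
```
]]></p>
         <p>Because each round removes the most populated neighborhood, the clusters come out in order of non-increasing size, so the first center returned corresponds to the highest-confidence model in the sense used in the text.</p>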
         <p>Both the Rosetta method and profile-based threading suffer from some of the same limitations that beset homology modeling. False positives and false negatives are both significant problems: failure to generate a model does not mean that one cannot be generated, nor that the structure is novel, and generation of a model does not mean that it is right, either overall or, more usually, in detail. At best one should look to these methods, at least for the present, for a rough indication of fold class and secondary structure topology. It is also important to remember that all methods of model building based on a preexisting structure, whether found by sequence homology or by threading, suffer from massive feedback and bias. The structure obtained will always look like the input structure, because the computational tools for refining the model are unable to generate the kinds of shifts in secondary structure position and local tertiary structure conformation that are likely to exist between two proteins when their overall sequence identity is low (see <xfigr fid="F4_19" art="nsp-protein-4-5">Figure 4-19</xfigr>). <it>Ab initio</it> methods like Rosetta at least do not suffer from this problem, whatever their other limitations.</p>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p><b>Figure 4-26</b> Some decoy structures produced by the Rosetta method. Reprinted from <it>J. Mol. Biol.</it>, Volume <b>268</b>, Simons, K.T., Kooperberg, C., Huang, E. and Baker, D.: <b>Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions.</b> Pages 209-225, &#169; 1997, with permission from Elsevier.</p>
            <p><b>Figure 4-27</b> Examples of the best-center cluster found by Rosetta for a number of different test proteins. Kindly provided by Richard Bonneau and David Baker. Bonneau, R., Tsai, J., Ruczinski, I., Chivian, D., Rohl, C., Strauss, C.E. and Baker, D.: <b>Rosetta in CASP4: Progress in <it>ab initio</it> protein structure prediction.</b> <it>Proteins</it> 2001, <b>45(S5)</b>:119-126. Copyright &#169; 2001 Wiley-Liss Inc. Reproduced with permission of John Wiley &amp; Sons, Inc.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
                <p>Rosetta in CASP4: Progress in <it>ab initio</it> protein structure prediction.</p>
            </title>
            <aug>
               <au>
                  <snm>Bonneau</snm>
                  <fnm>R</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proteins</source>
            <pubdate>2001</pubdate>
            <volume>45</volume>
            <issue>S5</issue>
            <fpage>119</fpage>
            <lpage>126</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1002/prot.1170</pubid>
                  <pubid idtype="pmpid" link="fulltext">11835488</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>A method to identify protein sequences that fold into a known three-dimensional structure.</p>
            </title>
            <aug>
               <au>
                  <snm>Bowie</snm>
                  <fnm>JU</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>1991</pubdate>
            <volume>253</volume>
            <fpage>164</fpage>
            <lpage>170</lpage>
            <xrefbib>
               <pubid idtype="pmpid">1853201</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Factors limiting the performance of prediction-based fold recognition methods.</p>
            </title>
            <aug>
               <au>
                  <snm>de la Cruz</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>JM</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>1999</pubdate>
            <volume>8</volume>
            <fpage>750</fpage>
            <lpage>759</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">10211821</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Protein fold recognition using sequence-derived predictions.</p>
            </title>
            <aug>
               <au>
                  <snm>Fischer</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Eisenberg</snm>
                  <fnm>D</fnm>
               </au>
            </aug>
            <source>Protein Sci</source>
            <pubdate>1996</pubdate>
            <volume>5</volume>
            <fpage>947</fpage>
            <lpage>955</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8732766</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Protein fold recognition by sequence threading: tools and assessment techniques.</p>
            </title>
            <aug>
               <au>
                  <snm>Miller</snm>
                  <fnm>RT</fnm>
               </au>
               <etal/>
            </aug>
            <source>FASEB J</source>
            <pubdate>1996</pubdate>
            <volume>10</volume>
            <fpage>171</fpage>
            <lpage>178</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">8566539</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions.</p>
            </title>
            <aug>
               <au>
                  <snm>Simons</snm>
                  <fnm>KT</fnm>
               </au>
               <etal/>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>1997</pubdate>
            <volume>268</volume>
            <fpage>209</fpage>
            <lpage>225</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.1997.0959</pubid>
                  <pubid idtype="pmpid" link="fulltext">9149153</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Prospects for <it>ab initio</it> protein structural genomics.</p>
            </title>
            <aug>
               <au>
                  <snm>Simons</snm>
                  <fnm>KT</fnm>
               </au>
               <etal/>
            </aug>
            <source>J Mol Biol</source>
            <pubdate>2001</pubdate>
            <volume>306</volume>
            <fpage>1191</fpage>
            <lpage>1199</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1006/jmbi.2000.4459</pubid>
                  <pubid idtype="pmpid" link="fulltext">11237627</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>URL for threading website:</p>
            </title>
            <note>
               <url>http://bioinf.cs.ucl.ac.uk/psipred/index.html</url>
            </note>
         </bibl>
         <bibl id="B9">
            <title>
               <p>URL for CASP:</p>
            </title>
            <note>
               <url>http://moult.carb.nist.gov/casp</url>
            </note>
         </bibl>
      </refgrp>
   </bm>
</art>
