http://www.drugfuture.com/chemdata/bacilysin.html
Take this page as an example. I want to extract the information marked in red. What would the extraction template look like?
Substance name (English name): Bacilysin
CAS Registry Number: 29393-20-2
CAS Name: [1R-(1α,2β,6α)]-N-L-Alanyl-3-(5-oxo-7-oxabicyclo[4.1.0]hept-2-yl)-L-alanine
Literature References: Antibiotic produced by the soil bacillus NCTC 7197: Gilliver et al., in Antibiotics vol. I, Florey et al., Eds. (Oxford, 1949) p 458. Production by Bacillus subtilis and purification: Rogers et al., Biochem. J. 97, 573 (1965). Identity with tetaine: K. Kaminski, T. Sololowska, J. Antibiot. 26, 184 (1973). Identity with bacillin: K. Atsumi et al., ibid. 28, 77 (1975). Improved isoln: Walker, Abraham, Biochem. J. 118, 557 (1970). Structural study: Rogers et al., ibid. 97, 579 (1965). Final structure: Walker, Abraham, ibid. 118, 563 (1970).
Pharmacological activity (Keywords): White amorphous powder. Freely sol in water; sol in 80% alc; sparingly sol in abs alc. Stable in aq soln at 100° for 5 min at pH 7; becomes inactive at pH 2 or pH 9.
Molecular Formula: C12H18N2O5
Molecular Weight: 270.28


  1.   

    HTML source:
     
     
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        <link href="/css/css.css" rel="stylesheet" type="text/css" />
        <style type="text/css">
            a{color:#0055FF;}
            a:visit{color:#0055FF;}
            a:visited{color:#0055FF;}
        </style>
        <title>
            Bacilysin</title>
        <meta content="Bacilysin,[1R-(1alpha,2beta,6alpha)]-N-L-Alanyl-3-(5-oxo-7-oxabicyclo[4.1.0]hept-2-yl)-L-alanine,alpha-[(2-amino-1-oxopropyl)amino]-5-oxo-7-oxabicyclo[4.1.0]heptane-2-propanoic acid,alpha-(2-aminopropionamido)-5-oxo-7-oxabicyclo[4.1.0]heptane-2-propionic acid, stereoisomer,bacillin,tetaine" name="keywords" />
    </head>
    <body>
        <table align="center" width="760">
            <tr>
                <td>
                    <table cellspacing="0" cellpadding="0" border="0" align="left"><tr><td><image src="structure/Bacilysin.gif" alt="Bacilysin" /></td></tr><tr><td><a href="stremf/Bacilysin.emf" target="_blank"><span style="font-family:Arial; font-size:13px">Structural Formula Vector Image</span></a></td></tr><tr><td><script type="text/javascript"><!--
    google_ad_client = "pub-1490375427745779";
    /* 336x280 cds */
    google_ad_slot = "2199354553";
    google_ad_width = 336;
    google_ad_height = 280;
    //-->
    </script>
    <script type="text/javascript"
    src="http://pagead2.googlesyndication.com/pagead/show_ads.js"> 
    </script><script type="text/javascript"><!--
    google_ad_client = "pub-1490375427745779";
    /* 336x280 cds */
    google_ad_slot = "2199354553";
    google_ad_width = 336;
    google_ad_height = 280;
    //-->
    </script>
    <script type="text/javascript"
    src="http://pagead2.googlesyndication.com/pagead/show_ads.js"> 
    </script></td></tr><tr><td><div align="left" style="margin-left: 0pt;"><span style='font-family:Arial; font-size:13px; color:#000000'><b>Title:</b>  Bacilysin</span></div>
    <div align="left" style="margin-left: 0pt;"><span style='font-family:Arial; font-size:13px; color:#000000'><b>CAS Registry Number:</b>  29393-20-2</span></div>
    <div align="left" style="margin-left: 0pt;"><span style='font-family:Arial; font-size:13px; color:#000000'><b>CAS Name:</b>  [1<i>R</i>-(1</span><span style='font-family:Symbol; font-size:13px; color:#000000'>a</span><span style='font-family:Arial; font-size:13px; color:#000000'>,2</span><span style='font-family:Symbol; font-size:13px; color:#000000'>b</span><span style='font-family:Arial; font-size:13px; color:#000000'>,6</span><span style='font-family:Symbol; font-size:13px; color:#000000'>a</span><span style='font-family:Arial; font-size:13px; color:#000000'>)]-<i>N-</i></span><span style='font-family:Arial; font-size:11px; color:#000000'>L</span><span style='font-family:Arial; font-size:13px; color:#000000'>-Alanyl-3-(5-oxo-7-oxabicyclo[4.1.0]hept-2-yl)-</span><span style='font-family:Arial; font-size:11px; color:#000000'>L</span><span style='font-family:Arial; font-size:13px; color:#000000'>-alanine</span></div>
    <div align="left" style="margin-left: 0pt;"><span style='font-family:Arial; font-size:13px; color:#000000'><b>Additional Names:</b>  </span><span style='font-family:Symbol; font-size:13px; color:#000000'>a</span><span style='font-family:Arial; font-size:13px; color:#000000'>-[(2-amino-1-oxopropyl)amino]-5-oxo-7-oxabicyclo[4.1.0]heptane-2-propanoic acid;  </span><span style='font-family:Symbol; font-size:13px; color:#000000'>a</span><span style='font-family:Arial; font-size:13px; color:#000000'>-(2-aminopropionamido)-5-oxo-7-oxabicyclo[4.1.0]heptane-2-propionic acid, stereoisomer;  bacillin;  tetaine</span></div>
    <div align="left" style="margin-left: 0pt;"><span style='font-family:Arial; font-size:13px; color:#000000'><b>Molecular Formula:</b>  C</span><span style='font-family:Arial; font-size:11px; color:#000000'>12</span><span style='font-family:Arial; font-size:13px; color:#000000'>H</span><span style='font-family:Arial; font-size:11px; color:#000000'>18</span><span style='font-family:Arial; font-size:13px; color:#000000'>N</span><span style='font-family:Arial; font-size:11px; color:#000000'>2</span><span style='font-family:Arial; font-size:13px; color:#000000'>O</span><span style='font-family:Arial; font-size:11px; color:#000000'>5</span></div>
    <div align="left" style="margin-left: 0pt;"><span style='font-family:Arial; font-size:13px; color:#000000'><b>Molecular Weight:</b>  270.28</span></div>
    <div align="left" style="margin-left: 0pt;"><span style='font-family:Arial; font-size:13px; color:#000000'><b>Percent Composition:</b>  C 53.33%, H 6.71%, N 10.36%, O 29.60%</span></div>
    <div align="left" style="margin-left: 0pt;"><span style='font-family:Arial; font-size:13px; color:#000000'><b>Literature References:</b>  Antibiotic produced by the soil bacillus NCTC 7197:  Gilliver <i>et al.,</i> in <i>Antibiotics</i> <b>vol. I,</b> Florey <i>et al.,</i> Eds. (Oxford, 1949) p 458.  Production by <i>Bacillus subtilis</i> and purification:  Rogers <i>et al.,</i> <i>Biochem. J.</i> <b>97,</b> 573 (1965).  Identity with tetaine:  K. Kaminski, T. Sololowska, <i>J. Antibiot.</i> <b>26,</b> 184 (1973).  Identity with bacillin:  K. Atsumi <i>et al.,</i> <i>ibid.</i> <b>28,</b> 77 (1975).  Improved isoln:  Walker, Abraham, <i>Biochem. J.</i> <b>118,</b> 557 (1970).  Structural study:  Rogers <i>et al.,</i> <i>ibid.</i> <b>97,</b> 579 (1965).  Final structure:  Walker, Abraham, <i>ibid.</i> <b>118,</b> 563 (1970).</span></div>
    <div align="left" style="margin-left: 0pt;"><span style='font-family:Arial; font-size:13px; color:#000000'><b>Properties:</b>  White amorphous powder.  Freely sol in water; sol in 80% alc; sparingly sol in abs alc.  Stable in aq soln at 100° for 5 min at pH 7; becomes inactive at pH 2 or pH 9. </span></div></td></tr><tr><td><script type="text/javascript"><!--
    google_ad_client = "pub-1490375427745779";
    /* 728x90, cds */
    google_ad_slot = "4626937294";
    google_ad_width = 728;
    google_ad_height = 90;
    //-->
    </script>
    <script type="text/javascript"
    src="http://pagead2.googlesyndication.com/pagead/show_ads.js"> 
    </script></td></tr></table>
                </td>
            </tr>
            <tr>
                <td>
                    <br /><span style="font-family:Arial; font-size:13px; color:#000000"><b>Other Monographs:</b></span><br /><table cellspacing="1" cellpadding="1" border="0" align="left"><tr><td><a href="Allopregnane-3beta-17alpha-21-triol-11-20-dione.html"><span style="font-family:Arial; font-size:13px">Allopregnane-3&#946;,17&#945;,21-triol-11,20-dione</span></a></td><td><a href="Hamamelitannin.html"><span style="font-family:Arial; font-size:13px">Hamamelitannin</span></a></td><td><a href="Mibefradil.html"><span style="font-family:Arial; font-size:13px">Mibefradil</span></a></td><td><a href="Resin-Ipomea.html"><span style="font-family:Arial; font-size:13px">Resin Ipomea</span></a></td></tr><tr><td><a href="Propyl-Chlorocarbonate.html"><span style="font-family:Arial; font-size:13px">Propyl Chlorocarbonate</span></a></td><td><a href="Potassium-Cyanide.html"><span style="font-family:Arial; font-size:13px">Potassium Cyanide</span></a></td><td><a href="Denopamine.html"><span style="font-family:Arial; font-size:13px">Denopamine</span></a></td><td><a href="Ethyl-Propionate.html"><span style="font-family:Arial; font-size:13px">Ethyl Propionate</span></a></td></tr><tr><td><a href="Fluspirilene.html"><span style="font-family:Arial; font-size:13px">Fluspirilene</span></a></td><td><a href="White-Pine.html"><span style="font-family:Arial; font-size:13px">White Pine</span></a></td><td><a href="Bezafibrate.html"><span style="font-family:Arial; font-size:13px">Bezafibrate</span></a></td><td><a href="Amotriphene.html"><span style="font-family:Arial; font-size:13px">Amotriphene</span></a></td></tr><tr><td><a href="Gibberellic-Acid.html"><span style="font-family:Arial; font-size:13px">Gibberellic Acid</span></a></td><td><a href="Periodyl.html"><span style="font-family:Arial; font-size:13px">Periodyl</span></a></td><td><a href="Probucol.html"><span style="font-family:Arial; font-size:13px">Probucol</span></a></td><td><a href="2-4-Dinitroaniline.html"><span 
style="font-family:Arial; font-size:13px">2,4-Dinitroaniline</span></a></td></tr></table>
                </td>
            </tr>
            <tr>
                <td>
                    &copy;2010 <a href="http://www.drugfuture.com" target="_blank">DrugFuture</a>-><a href="/chemdata" target="_blank">Chemical Index Database</a><script src="http://s39.cnzz.com/stat.php?id=134747&web_id=134747&show=pic" language="JavaScript" charset="gb2312"></script>
                </td>
            </tr>
        </table>
    </body>
    </html>
      

  2.   

    Using jsoup (jar: http://jsoup.org/packages/jsoup-1.5.2.jar):
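    import java.io.IOException;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class Extract {
        public static void main(final String[] args) {
            try {
                // Fetch and parse the page
                Document document = Jsoup.connect("http://www.drugfuture.com/chemdata/bacilysin.html").get();
                // Select the field <div>s; on this page every "Label: value" div sits inside the layout table
                Elements elements = document.select("table:eq(0) div");
                for (Element element : elements) {
                    String text = element.text();
                    int colon = text.indexOf(':');
                    // Print only the value after the first colon, e.g. "Bacilysin" from "Title: Bacilysin"
                    System.out.println(colon >= 0 ? text.substring(colon + 1).trim() : text);
                }
            } catch (IOException e) {
                System.err.println(e);
            }
        }
    }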
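The loop in reply 2 prints every field, including ones the question did not ask for (Additional Names, Percent Composition). One way to express the "extraction template" from the question is a lookup table from the page's bold labels to the field names you want, applied to each div's text. This is a minimal sketch, assuming every field div's text has the form "Label: value" as in the HTML source above; the class FieldTemplate, the method apply and the output field names are hypothetical. Inside reply 2's loop you would call FieldTemplate.apply(element.text(), record) instead of printing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldTemplate {
    // Hypothetical extraction template: bold label on the page -> field name wanted in the output.
    // Labels not listed here (e.g. "Additional Names", "Percent Composition") are skipped.
    static final Map<String, String> TEMPLATE = new LinkedHashMap<String, String>();
    static {
        TEMPLATE.put("Title", "Substance name");
        TEMPLATE.put("CAS Registry Number", "CAS Registry Number");
        TEMPLATE.put("CAS Name", "CAS Name");
        TEMPLATE.put("Literature References", "Literature References");
        TEMPLATE.put("Properties", "Pharmacological activity");
        TEMPLATE.put("Molecular Formula", "Molecular Formula");
        TEMPLATE.put("Molecular Weight", "Molecular Weight");
    }

    // Applies the template to the text of one field <div>, e.g. "Title: Bacilysin".
    // Splits on the FIRST colon only, so values that contain colons stay intact.
    // Returns true if the label was recognised and the value stored in 'out'.
    static boolean apply(String divText, Map<String, String> out) {
        int colon = divText.indexOf(':');
        if (colon < 0) {
            return false;
        }
        String field = TEMPLATE.get(divText.substring(0, colon).trim());
        if (field == null) {
            return false;
        }
        out.put(field, divText.substring(colon + 1).trim());
        return true;
    }

    public static void main(String[] args) {
        // Sample strings in the form element.text() returns for the field <div>s
        String[] divTexts = {
            "Title: Bacilysin",
            "CAS Registry Number: 29393-20-2",
            "Percent Composition: C 53.33%, H 6.71%, N 10.36%, O 29.60%",
            "Molecular Weight: 270.28"
        };
        Map<String, String> record = new LinkedHashMap<String, String>();
        for (String t : divTexts) {
            apply(t, record);
        }
        // prints {Substance name=Bacilysin, CAS Registry Number=29393-20-2, Molecular Weight=270.28}
        System.out.println(record);
    }
}
```

Keeping the template as data means adding or renaming a field is a one-line change, and the same loop works for the other monograph pages as long as they use the same labels.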
      

  3.   


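    I added jsoup.jar to the project, but it looks empty: unlike the other libraries, it has no "+" in front of it to expand it, and the program doesn't seem to find the package. The errors are "Jsoup cannot be resolved" and "Document cannot be resolved to a type".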
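Both errors come from the compiler, so as far as the IDE is concerned the jar is not on the project's build path. A quick way to tell a broken jar apart from a classpath problem is to compile and run a small class outside the IDE. This is a minimal sketch; ClasspathCheck and isLoadable are hypothetical names. Run it as java -cp jsoup-1.5.2.jar:. ClasspathCheck (use ; instead of : on Windows). If it prints true, the jar itself is fine and the IDE's build-path entry is most likely what needs fixing.

```java
public class ClasspathCheck {
    // Returns true if the named class can be found on the current classpath.
    // initialize=false loads the class without running its static initializers.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The classpath the JVM was actually started with
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        System.out.println("org.jsoup.Jsoup loadable: " + isLoadable("org.jsoup.Jsoup"));
    }
}
```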
      

  4.   

    How could it be empty? Listing the archive with sfk shows all the classes are there:
    sfk larc jsoup-1.5.2.jar 
    jsoup-1.5.2.jar//META-INF/MANIFEST.MF
    jsoup-1.5.2.jar//org/jsoup/Connection$Base.class
    jsoup-1.5.2.jar//org/jsoup/Connection$KeyVal.class
    jsoup-1.5.2.jar//org/jsoup/Connection$Method.class
    jsoup-1.5.2.jar//org/jsoup/Connection$Request.class
    jsoup-1.5.2.jar//org/jsoup/Connection$Response.class
    jsoup-1.5.2.jar//org/jsoup/Connection.class
    jsoup-1.5.2.jar//org/jsoup/examples/ListLinks.class
    jsoup-1.5.2.jar//org/jsoup/helper/DataUtil.class
    jsoup-1.5.2.jar//org/jsoup/helper/HttpConnection$1.class
    jsoup-1.5.2.jar//org/jsoup/helper/HttpConnection$Base.class
    jsoup-1.5.2.jar//org/jsoup/helper/HttpConnection$KeyVal.class
    jsoup-1.5.2.jar//org/jsoup/helper/HttpConnection$Request.class
    jsoup-1.5.2.jar//org/jsoup/helper/HttpConnection$Response.class
    jsoup-1.5.2.jar//org/jsoup/helper/HttpConnection.class
    jsoup-1.5.2.jar//org/jsoup/helper/StringUtil.class
    jsoup-1.5.2.jar//org/jsoup/helper/Validate.class
    jsoup-1.5.2.jar//org/jsoup/Jsoup.class
    jsoup-1.5.2.jar//org/jsoup/nodes/Attribute.class
    jsoup-1.5.2.jar//org/jsoup/nodes/Attributes$1.class
    jsoup-1.5.2.jar//org/jsoup/nodes/Attributes$Dataset$DatasetIterator.class
    jsoup-1.5.2.jar//org/jsoup/nodes/Attributes$Dataset$EntrySet.class
    jsoup-1.5.2.jar//org/jsoup/nodes/Attributes$Dataset.class
    jsoup-1.5.2.jar//org/jsoup/nodes/Attributes.class
    jsoup-1.5.2.jar//org/jsoup/nodes/Comment.class
    jsoup-1.5.2.jar//org/jsoup/nodes/DataNode.class
    jsoup-1.5.2.jar//org/jsoup/nodes/Document$OutputSettings.class
    jsoup-1.5.2.jar//org/jsoup/nodes/Document.class
    jsoup-1.5.2.jar//org/jsoup/nodes/Element.class
    jsoup-1.5.2.jar//org/jsoup/nodes/Entities$EscapeMode.class
    jsoup-1.5.2.jar//org/jsoup/nodes/Entities.class
    jsoup-1.5.2.jar//org/jsoup/nodes/Node$OuterHtmlVisitor.class
    jsoup-1.5.2.jar//org/jsoup/nodes/Node.class
    jsoup-1.5.2.jar//org/jsoup/nodes/TextNode.class
    jsoup-1.5.2.jar//org/jsoup/nodes/XmlDeclaration.class
    jsoup-1.5.2.jar//org/jsoup/parser/Parser.class
    jsoup-1.5.2.jar//org/jsoup/parser/Tag.class
    jsoup-1.5.2.jar//org/jsoup/parser/TokenQueue.class
    jsoup-1.5.2.jar//org/jsoup/safety/Cleaner$ElementMeta.class
    jsoup-1.5.2.jar//org/jsoup/safety/Cleaner.class
    jsoup-1.5.2.jar//org/jsoup/safety/Whitelist$AttributeKey.class
    jsoup-1.5.2.jar//org/jsoup/safety/Whitelist$AttributeValue.class
    jsoup-1.5.2.jar//org/jsoup/safety/Whitelist$Protocol.class
    jsoup-1.5.2.jar//org/jsoup/safety/Whitelist$TagName.class
    jsoup-1.5.2.jar//org/jsoup/safety/Whitelist$TypedValue.class
    jsoup-1.5.2.jar//org/jsoup/safety/Whitelist.class
    jsoup-1.5.2.jar//org/jsoup/select/Collector$Accumulator.class
    jsoup-1.5.2.jar//org/jsoup/select/Collector.class
    jsoup-1.5.2.jar//org/jsoup/select/CombiningEvaluator$And.class
    jsoup-1.5.2.jar//org/jsoup/select/CombiningEvaluator$Or.class
    jsoup-1.5.2.jar//org/jsoup/select/CombiningEvaluator.class
    jsoup-1.5.2.jar//org/jsoup/select/Elements.class
    jsoup-1.5.2.jar//org/jsoup/select/Evaluator$AllElements.class
    jsoup-1.5.2.jar//org/jsoup/select/Evaluator$Attribute.class
    jsoup-1.5.2.jar//org/jsoup/select/Evaluator$AttributeKeyPair.class
    jsoup-1.5.2.jar//org/jsoup/select/Evaluator$AttributeStarting.class
    jsoup-1.5.2.jar//org/jsoup/select/Evaluator$AttributeWithValue.class
    jsoup-1.5.2.jar//org/jsoup/select/Evaluator$AttributeWithValueContaining.class
    jsoup-1.5.2.jar//org/jsoup/select/Evaluator$AttributeWithValueEnding.class
    jsoup-1.5.2.jar//org/jsoup/select/Evaluator$AttributeWithValueMatching.class
    jsoup-1.5.2.jar//org/jsoup/select/Evaluator$AttributeWithValueNot.class
    jsoup-1.5.2.jar//org/jsoup/select/Evaluator$AttributeWithValueStarting.class
    jsoup-1.5.2.jar//org/jsoup/select/Evaluator$Class.class
    jsoup-1.5.2.jar//org/jsoup/select/Evaluator$ContainsOwnText.class
    jsoup-1.5.2.jar//org/jsoup/select/Evaluator$ContainsText.class
    jsoup-1.5.2.jar//org/jsoup/select/Evaluator$Id.class
    jsoup-1.5.2.jar//org/jsoup/select/Evaluator$IndexEquals.class
    jsoup-1.5.2.jar//org/jsoup/select/Evaluator$IndexEvaluator.class
    jsoup-1.5.2.jar//org/jsoup/select/Evaluator$IndexGreaterThan.class
    jsoup-1.5.2.jar//org/jsoup/select/Evaluator$IndexLessThan.class
    jsoup-1.5.2.jar//org/jsoup/select/Evaluator$Matches.class
    jsoup-1.5.2.jar//org/jsoup/select/Evaluator$MatchesOwn.class
    jsoup-1.5.2.jar//org/jsoup/select/Evaluator$Tag.class
    jsoup-1.5.2.jar//org/jsoup/select/Evaluator.class
    jsoup-1.5.2.jar//org/jsoup/select/NodeTraversor.class
    jsoup-1.5.2.jar//org/jsoup/select/NodeVisitor.class
    jsoup-1.5.2.jar//org/jsoup/select/QueryParser.class
    jsoup-1.5.2.jar//org/jsoup/select/Selector$SelectorParseException.class
    jsoup-1.5.2.jar//org/jsoup/select/Selector.class
    jsoup-1.5.2.jar//org/jsoup/select/StructuralEvaluator$Has.class
    jsoup-1.5.2.jar//org/jsoup/select/StructuralEvaluator$ImmediateParent.class
    jsoup-1.5.2.jar//org/jsoup/select/StructuralEvaluator$ImmediatePreviousSibling.class
    jsoup-1.5.2.jar//org/jsoup/select/StructuralEvaluator$Not.class
    jsoup-1.5.2.jar//org/jsoup/select/StructuralEvaluator$Parent.class
    jsoup-1.5.2.jar//org/jsoup/select/StructuralEvaluator$PreviousSibling.class
    jsoup-1.5.2.jar//org/jsoup/select/StructuralEvaluator$Root.class
    jsoup-1.5.2.jar//org/jsoup/select/StructuralEvaluator.class
    jsoup-1.5.2.jar//META-INF/maven/org.jsoup/jsoup/pom.xml
    jsoup-1.5.2.jar//META-INF/maven/org.jsoup/jsoup/pom.properties
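The same check can be done with the JDK's own java.util.jar API, without sfk. This is a minimal sketch; JarList and classEntries are hypothetical names, and it assumes the jar path is passed as the first command-line argument (defaulting to jsoup-1.5.2.jar in the current directory). If the output includes org/jsoup/Jsoup.class, the downloaded jar is intact.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarList {
    // Returns the names of all .class entries whose path starts with the given prefix (e.g. "org/jsoup/").
    static List<String> classEntries(String jarPath, String prefix) throws IOException {
        List<String> names = new ArrayList<String>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.startsWith(prefix) && name.endsWith(".class")) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "jsoup-1.5.2.jar";
        List<String> names = classEntries(path, "org/jsoup/");
        for (String n : names) {
            System.out.println(n);
        }
        System.out.println(names.size() + " classes under org/jsoup/");
    }
}
```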
      

  5.   

    Change the selector string passed to the select method.
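For example, "table:eq(0) div" matches every field div on the page. To match only the fields you want, the selector can test each div's bold label with jsoup's :has and :containsOwn pseudo-selectors (the 1.5.2 jar listed above contains StructuralEvaluator$Has and Evaluator$ContainsOwnText, which appear to implement them). This is a minimal sketch that builds such a selector string; FieldSelector and forLabels are hypothetical names, and it assumes the labels contain no parentheses or commas, which would break the selector syntax.

```java
import java.util.Arrays;
import java.util.List;

public class FieldSelector {
    // Builds a comma-separated jsoup selector group that matches only the <div>s whose
    // <b> element's own text contains "Label:", e.g. "div:has(b:containsOwn(Title:))".
    // Note: :containsOwn is a case-insensitive substring match.
    static String forLabels(List<String> labels) {
        StringBuilder sb = new StringBuilder();
        for (String label : labels) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append("div:has(b:containsOwn(").append(label).append(":))");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints div:has(b:containsOwn(Title:)), div:has(b:containsOwn(CAS Registry Number:)), div:has(b:containsOwn(Molecular Weight:))
        System.out.println(forLabels(Arrays.asList("Title", "CAS Registry Number", "Molecular Weight")));
    }
}
```

The resulting string would replace "table:eq(0) div" in reply 2's code, i.e. document.select(FieldSelector.forLabels(...)), and the rest of the loop stays the same.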