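HTML and XML have dedicated DOM APIs. For HTML with nested tags in particular, avoid trying to extract content with regular expressions, especially PHP's. Nested tags require recursive regex matching, and even if PHP offered the balancing groups that some other languages' regex engines provide, it would still be best not to use them.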
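
As a rough illustration of the DOM approach recommended above, here is a minimal sketch using PHP's bundled `DOMDocument` and `DOMXPath` classes. The sample markup, the variable names, and the XPath expressions are assumptions made for this example (shortened from the kind of markup discussed in this thread), not code from the original posts.

```php
<?php
// Hypothetical sample markup: two wrappers, each holding a title and a content block.
$html = <<<'HTML'
<span class="1">
    <h3 class="title">first title</h3>
    <div class="con">
        <span>alpha</span><p>beta</p>
    </div>
</span>
<div class="2">
    <h3 class="title">second title</h3>
    <div class="con">
        <span>gamma</span><p>delta</p>
    </div>
</div>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$items = [];

// Find every title, then the content block that follows it as a sibling.
foreach ($xpath->query('//h3[@class="title"]') as $h3) {
    $con = $xpath->query('following-sibling::div[@class="con"][1]', $h3)->item(0);
    if ($con === null) {
        continue; // no content block next to this title
    }

    // Build the inner HTML by serializing each child node of the content block.
    $inner = '';
    foreach ($con->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }

    $items[] = [
        'title'   => trim($h3->textContent),
        'content' => trim($inner),
    ];
}

print_r($items);
```

Because the parser tracks nesting itself, an extra `<div>` inside the content block would not confuse this lookup the way it can confuse a lazy regex match.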

Solutions »

  1.   

    <(span|div)\s+class=\"\d\">\s+<h3\s+class=\"title\">(.*?)<\/h3>\s+<div\s+class=\"con\">\s*(.*?)\s*<\/div>\s*<\/(span|div)>
      

  2.   

    $s =<<< TXT
    <span class="1">
          <h3 class="title">testtesttesttesttesttestt</h3>
          <div class="con">
                <span>testtesttest</span><p>testtesttesttesttesttesttesttesttest</p>
          </div>
    </span>
    <div class="2">
          <h3 class="title">testtesttesttesttesttestt</h3>
          <div class="con">
                <span>testtesttest</span><p>testtesttesttesttesttesttesttesttest</p>
          </div>
    </div>
    <div class="3">
          <h3 class="title">testtesttesttesttesttestt</h3>
          <div class="con">
                <span>testtesttest</span><p>testtesttesttesttesttesttesttesttest</p>
          </div>
    </div>
    <span class="4">
          <h3 class="title">testtesttesttesttesttestt</h3>
          <div class="con">
                <span>testtesttest</span><p>testtesttesttesttesttesttesttesttest</p>
          </div>
    </span>
    TXT;

    Option 1:

    include 'phpquery.php';
    $doc = phpQuery::newDocument($s);
    echo $doc->find('.1')->html();
    echo pq('.2')->html();

    Output:

          <h3 class="title">testtesttesttesttesttestt</h3>
          <div class="con">
                <span>testtesttest</span><p>testtesttesttesttesttesttesttesttest</p>
          </div>      <h3 class="title">testtesttesttesttesttestt</h3>
          <div class="con">
                <span>testtesttest</span><p>testtesttesttesttesttesttesttesttest</p>
          </div>

    Option 2:

    include 'html_document.php';
    $p = new html_document( $s, 0);
    foreach($p->find('.\d') as $v) {
        echo "$v->innerHTML\n";
    }

    Output:

          <h3 class="title">testtesttesttesttesttestt</h3>
          <div class="con">
                <span>testtesttest</span><p>testtesttesttesttesttesttesttesttest</p>
          </div>
          <h3 class="title">testtesttesttesttesttestt</h3>
          <div class="con">
                <span>testtesttest</span><p>testtesttesttesttesttesttesttesttest</p>
          </div>
          <h3 class="title">testtesttesttesttesttestt</h3>
          <div class="con">
                <span>testtesttest</span><p>testtesttesttesttesttesttesttesttest</p>
          </div>
          <h3 class="title">testtesttesttesttesttestt</h3>
          <div class="con">
                <span>testtesttest</span><p>testtesttesttesttesttesttesttesttest</p>
          </div>
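
For comparison, the pattern from reply 1 can be exercised with `preg_match_all`. This is only a sketch: the pattern is copied unchanged, while the `~` delimiters, the `s` modifier (so `.` can cross line breaks), the shortened sample data, and the variable names are assumptions added for this example.

```php
<?php
// Hypothetical, shortened sample data in the same shape as the test input above.
$s = <<<'TXT'
<span class="1">
    <h3 class="title">first title</h3>
    <div class="con">
        <span>alpha</span><p>beta</p>
    </div>
</span>
<div class="2">
    <h3 class="title">second title</h3>
    <div class="con">
        <span>gamma</span><p>delta</p>
    </div>
</div>
TXT;

// Reply 1's pattern, wrapped in ~ delimiters; the trailing "s" modifier is an addition.
$pattern = '~<(span|div)\s+class=\"\d\">\s+<h3\s+class=\"title\">(.*?)<\/h3>\s+<div\s+class=\"con\">\s*(.*?)\s*<\/div>\s*<\/(span|div)>~s';

// PREG_SET_ORDER groups the captures per match: [2] is the title, [3] the content.
preg_match_all($pattern, $s, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[2], "\n", $m[3], "\n\n";
}
```

This works only while the content block contains no nested `<div>`; once it does, the lazy `(.*?)` can stop at the wrong closing tag, which is the failure mode the first post warns about.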