The HTML I've fetched contains many tbody elements whose ids all start with "flt", like this:
<tbody data="2010-6-20 7:30:00|2010-6-20 9:55:00|CTU|PEK|3U|L|1440||5|7|28" id="flt1">.....(other HTML code)</tbody>
<tbody data="2010-6-20 7:30:00|2010-6-20 9:55:00|CTU|PEK|3U|L|1440||5|7|28" id="flt2">.....(other HTML code)</tbody>
<tbody data="2010-6-20 7:30:00|2010-6-20 9:55:00|CTU|PEK|3U|L|1440||5|7|28" id="flt3">.....(other HTML code)</tbody>
<tbody data="2010-6-20 7:30:00|2010-6-20 9:55:00|CTU|PEK|3U|L|1440||5|7|28" id="flt4">.....(other HTML code)</tbody>
<tbody data="2010-6-20 7:30:00|2010-6-20 9:55:00|CTU|PEK|3U|L|1440||5|7|28" id="flt5">.....(other HTML code)</tbody>
How can I extract these tbody elements and store them in an array? For example:
Array
(
[1] => '<tbody data="2010-6-20 7:30:00|2010-6-20 9:55:00|CTU|PEK|3U|L|1440||5|7|28" id="flt1">.....(other HTML code)</tbody>'
)
Array
(
[2] => '<tbody data="2010-6-20 7:30:00|2010-6-20 9:55:00|CTU|PEK|3U|L|1440||5|7|28" id="flt2">.....(other HTML code)</tbody>'
)
Array
(
[3] => '<tbody data="2010-6-20 7:30:00|2010-6-20 9:55:00|CTU|PEK|3U|L|1440||5|7|28" id="flt3">.....(other HTML code)</tbody>'
)
Array
(
[4] => '<tbody data="2010-6-20 7:30:00|2010-6-20 9:55:00|CTU|PEK|3U|L|1440||5|7|28" id="flt4">.....(other HTML code)</tbody>'
)
Array
(
[5] => '<tbody data="2010-6-20 7:30:00|2010-6-20 9:55:00|CTU|PEK|3U|L|1440||5|7|28" id="flt5">.....(other HTML code)</tbody>'
)
$str = <<<html
<tbody data="2010-6-20 7:30:00|2010-6-20 9:55:00|CTU|PEK|3U|L|1440||5|7|28" id="flt1">.....(other HTML code)</tbody>
<tbody data="2010-6-20 7:30:00|2010-6-20 9:55:00|CTU|PEK|3U|L|1440||5|7|28" id="flt2">.....(other HTML code)</tbody>
<tbody data="2010-6-20 7:30:00|2010-6-20 9:55:00|CTU|PEK|3U|L|1440||5|7|28" id="flt3">.....(other HTML code)</tbody>
<tbody data="2010-6-20 7:30:00|2010-6-20 9:55:00|CTU|PEK|3U|L|1440||5|7|28" id="flt4">.....(other HTML code)</tbody>
<tbody data="2010-6-20 7:30:00|2010-6-20 9:55:00|CTU|PEK|3U|L|1440||5|7|28" id="flt5">.....(other HTML code)</tbody>
html;
// Match each complete block, from the opening <tbody ...> through the closing </tbody>.
// The "s" flag lets "." span line breaks; "i" makes the match case-insensitive.
$pattern = "/<tbody\s+data=\".*?\"\s+id=\"flt\d+\">.*?<\/tbody>/is";
preg_match_all($pattern, $str, $aMatch);
// $aMatch[0] holds the full matches, tags included.
print_r($aMatch[0]);
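The question asked for keys 1 to 5 rather than 0 to 4. A minimal sketch of one way to get that, assuming every tbody follows the data-then-id layout of the sample: capture the digits after "flt" and use them as array keys. The sample string and the `$blocks` name below are made up for illustration.

```php
<?php
// Hypothetical sample; the real page has far more markup inside each tbody.
$str = '<tbody data="a|b" id="flt1"><tr><td>one</td></tr></tbody>'
     . "\n"
     . '<tbody data="c|d" id="flt2"><tr><td>two</td></tr></tbody>';

// Group 0 is the whole block; group 1 is the number that follows "flt".
preg_match_all('~<tbody\s+data="[^"]*"\s+id="flt(\d+)">.*?</tbody>~is', $str, $m);

// Re-key the full matches by their flt number.
// The guard avoids calling array_combine() on empty arrays.
$blocks = $m[0] ? array_combine($m[1], $m[0]) : array();

print_r($blocks);
```

If two blocks ever shared the same flt number, the later one would overwrite the earlier one, so this only makes sense when the ids are unique.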
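That was just one of them:
<tbody data="2010-6-20 7:30:00|2010-6-20 9:55:00|CTU|PEK|3U|L|1440||5|7|28" id="flt1">.....(other HTML code)</tbody>
The page contains many tbody elements like this, so I'd like to extract all of them into an array first,
and then call the method you just gave on each one.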
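A rough sketch of that "collect first, then process each one" flow, assuming the blocks are already in an array. `parse_tbody_data` is a hypothetical helper, not something from this thread, and the second sample block uses invented values. What each position in the pipe-separated data attribute means isn't stated here, so the sketch only splits it.

```php
<?php
// Blocks as they would come out of preg_match_all(...)[0]; sample values only.
$blocks = array(
    '<tbody data="2010-6-20 7:30:00|2010-6-20 9:55:00|CTU|PEK|3U|L|1440||5|7|28" id="flt1"><tr></tr></tbody>',
    '<tbody id="flt2" data="2010-6-20 8:00:00|2010-6-20 10:30:00|CTU|PEK|CA|Y|1440||5|7|28"><tr></tr></tbody>',
);

// Hypothetical helper: return the pipe-separated data attribute as a list,
// or null when the block has no data attribute.
function parse_tbody_data($block)
{
    if (!preg_match('~<tbody\b[^>]*\bdata="([^"]*)"~i', $block, $m)) {
        return null;
    }
    return explode('|', $m[1]);
}

$rows = array();
foreach ($blocks as $block) {
    $fields = parse_tbody_data($block);
    if ($fields !== null) {
        $rows[] = $fields;
    }
}
print_r($rows);
```

Empty positions (the `||` in the sample) come back as empty strings, so the positions of the remaining fields stay stable.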
Awesome, thanks a lot! But what comes back now is: Array ( [0] => .....(other HTML code) [1] => .....(other HTML code) [2] => .....(other HTML code) [3] => .....(other HTML code) [4] => .....(other HTML code) ). How do I make each array value also include its own opening <tbody data="2010-6-20 7:30:00|2010-6-20 9:55:00|CTU|PEK|3U|L|1440||5|7|28" id="flt1"> in front and the closing </tbody> behind? In other words, what I want is:
Array (
[0] => <tbody data="2010-6-20 7:30:00|2010-6-20 9:55:00|CTU|PEK|3U|L|1440||5|7|28" id="flt1">.....(other HTML code)</tbody>
[1] => <tbody data="2010-6-20 7:30:00|2010-6-20 9:55:00|CTU|PEK|3U|L|1440||5|7|28" id="flt2">.....(other HTML code)</tbody>
[2] => <tbody data="2010-6-20 7:30:00|2010-6-20 9:55:00|CTU|PEK|3U|L|1440||5|7|28" id="flt3">.....(other HTML code)</tbody>
[3] => <tbody data="2010-6-20 7:30:00|2010-6-20 9:55:00|CTU|PEK|3U|L|1440||5|7|28" id="flt4">.....(other HTML code)</tbody>
[4] => <tbody data="2010-6-20 7:30:00|2010-6-20 9:55:00|CTU|PEK|3U|L|1440||5|7|28" id="flt5">.....(other HTML code)</tbody>
)
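For what it's worth, the difference usually comes down to which index of the matches array is read. With preg_match_all in its default mode, index 0 holds the entire matched text and index 1 holds only what the first pair of parentheses captured. A small sketch with a made-up string:

```php
<?php
$str = '<tbody data="x" id="flt1">AAA</tbody><tbody data="y" id="flt2">BBB</tbody>';

preg_match_all('~<tbody\b[^>]*>(.*?)</tbody>~is', $str, $m);

print_r($m[1]); // inner content only: AAA, BBB
print_r($m[0]); // whole blocks, <tbody ...> and </tbody> included
```

So reading `$m[0]` instead of `$m[1]` is enough; the pattern itself does not need to change.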
$html='<tbody data="2010-6-20 21:15:00|2010-6-20 23:20:00|CTU|NAY|KN|M|1080||5|7|1,2,32,30,28"
id="flt30">
<tr>
<td>
<p>21:15 双流国际机场</p>
<p>23:20 南苑机场</p>
</td>
<td>
<p class="pubFlights_kn">中国联航</p>
<p class="searchresult_fltlist_airline">KN2284</p>
</td>
<td align="middle"><span mod_jmpinfo_page="fltDomestic_planeType.asp?CraftType=738" mod="jmpInfo" cdm="jpi_flighttypeli" class="base_txtdiv">738</span></td><td align="middle">50/40</td><td align="middle"><p><span mod_jmpinfo_content="说明|经济舱全价:1440<BR/>M :经济舱" mod_jmpinfo_page="default_normal.asp" mod="jmpInfo" cdm="jpi_discountdetailli" class="base_txtdiv">7.5折/M</span></p>
<p><span mod_jmpinfo_content="退改签规定|同等舱位免费更改。|需收取票面价10%的退票费。|不得签转。||" mod_jmpinfo_page="flight_policy_tab" mod="jmpInfo" cdm="jpi_refundinfoli" class="base_txtdiv">退改签</span></p></td><td><p><strong class="base_price01">¥1080</strong>经济舱</p>
<p><a class="searchresult_fltlist_down" onclick="FlightUI.showMore(this,GetShowAllSubclassParameter(\'KN2284 \',\'flt30\'))" title="查看所有价格" href="javascript:void(0);" id="flt_a30" cdm="btn_allpriceli">查看所有价格</a></p>
</td>
<td><div cdm="jpi_noticeli" style="overflow: hidden; width: 124px;">
</div>
</td>
<td align="right" width="145">
<span class="searchresult_fltlist_savemoney" mod_jmpinfo_content="说明|" mod_jmpinfo_page="default_normal" mod="jmpInfo" style="visibility: hidden;" cdm="jpi_explanationli"> </span>
<input type="button" style="float: right;" class="base_btn11" onclick="SelectFlight(\'KN2284\', \'M\', \'1080\', \'NormalPrice\', \'\',\'False\')" value="预订" cdm="btn_orderli">
</td>
</tr>
</tbody>
<tbody data="2010-6-21 21:15:00|2010-6-22 23:20:00|CTU|NAY|KN|M|1080||5|7|1,2,32,30,28"
id="flt31">
<tr>
<td>
<p>21:15 双流国际机场1</p>
<p>23:20 南苑机场1</p>
</td>
<td>
<p class="pubFlights_kn">中国联航1</p>
<p class="searchresult_fltlist_airline">KN9527</p>
</td>
<td align="middle"><span mod_jmpinfo_page="fltDomestic_planeType.asp?CraftType=738" mod="jmpInfo" cdm="jpi_flighttypeli" class="base_txtdiv">888</span></td><td align="middle">60/50</td><td align="middle"><p><span mod_jmpinfo_content="说明|经济舱全价:1440<BR/>M :经济舱" mod_jmpinfo_page="default_normal.asp" mod="jmpInfo" cdm="jpi_discountdetailli" class="base_txtdiv">6.5折/M</span></p>
<p><span mod_jmpinfo_content="退改签规定|同等舱位免费更改。|需收取票面价10%的退票费。|不得签转。||" mod_jmpinfo_page="flight_policy_tab" mod="jmpInfo" cdm="jpi_refundinfoli" class="base_txtdiv">退改签1</span></p></td><td><p><strong class="base_price01">¥1080</strong>经济舱</p>
<p><a class="searchresult_fltlist_down" onclick="FlightUI.showMore(this,GetShowAllSubclassParameter(\'KN2284 \',\'flt30\'))" title="查看所有价格" href="javascript:void(0);" id="flt_a30" cdm="btn_allpriceli">查看所有价格</a></p>
</td>
<td><div cdm="jpi_noticeli" style="overflow: hidden; width: 124px;">
</div>
</td>
<td align="right" width="145">
<span class="searchresult_fltlist_savemoney" mod_jmpinfo_content="说明|" mod_jmpinfo_page="default_normal" mod="jmpInfo" style="visibility: hidden;" cdm="jpi_explanationli"> </span>
<input type="button" style="float: right;" class="base_btn11" onclick="SelectFlight(\'KN2284\', \'M\', \'1080\', \'NormalPrice\', \'\',\'False\')" value="预订" cdm="btn_orderli">
</td>
</tr>
</tbody>';
// Normalize the markup: strip whitespace right after ">" and right before "<",
// then remove CRLF line breaks and tabs, so the patterns below can match tags back to back.
$html = preg_replace('/>\s+/', '>', $html);
$html = preg_replace('/\s+</', '<', $html);
$html = str_replace("\r\n", '', $html);
$html = str_replace("\t", '', $html);
// Each pattern runs over the whole page; rows are lined up by match index.
$result = array();
// The data attribute. \s* tolerates the line break between data="..." and id="..." in the sample.
preg_match_all('~<tbody data="(.*?)"\s*id="flt\d+"><tr>~is', $html, $data);
foreach ($data[1] as $key => $item) {
    $result[$key]["data"] = $item;
}
// Departure and arrival (time plus airport).
preg_match_all('~"\s*id="flt\d+"><tr><td><p>(.*?)</p><p>(.*?)</p></td><td><p class="pubFlights_kn">~is', $html, $data);
foreach ($data[1] as $key => $item) {
    $result[$key]["from"] = $item;
}
foreach ($data[2] as $key => $item) {
    $result[$key]["to"] = $item;
}
// Airline name and flight number.
preg_match_all('~<td><p class="pubFlights_kn">(.*?)</p><p class="searchresult_fltlist_airline">(.*?)</p></td><td align="middle">~is', $html, $data);
foreach ($data[1] as $key => $item) {
    $result[$key]["flyname"] = $item;
}
foreach ($data[2] as $key => $item) {
    $result[$key]["flyno"] = $item;
}
// Aircraft type.
preg_match_all('~mod="jmpInfo" cdm="jpi_flighttypeli" class="base_txtdiv">(.*?)</span></td><td align="middle">~is', $html, $data);
foreach ($data[1] as $key => $item) {
    $result[$key]["flytype"] = $item;
}
// The cell that follows the aircraft type (50/40 in the sample).
preg_match_all('~</span></td><td align="middle">(.*?)</td><td align="middle"><p>~is', $html, $data);
foreach ($data[1] as $key => $item) {
    $result[$key]["flybi"] = $item;
}
// Discount and cabin class.
preg_match_all('~mod="jmpInfo" cdm="jpi_discountdetailli" class="base_txtdiv">(.*?)</span></p><p>~is', $html, $data);
foreach ($data[1] as $key => $item) {
    $result[$key]["discountdetail"] = $item;
}
// Refund/change link text.
preg_match_all('~mod="jmpInfo" cdm="jpi_refundinfoli" class="base_txtdiv">(.*?)</span></p></td>~is', $html, $data);
foreach ($data[1] as $key => $item) {
    $result[$key]["refundinfo"] = $item;
}
print_r($result);
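One thing to watch with the code above: each field is matched across the whole page and rows are lined up by match index, so if one flight lacks a field, every later value for that field slides up a row. A possible alternative, sketched under the assumption that the markup is already whitespace-normalized, is to split the page into tbody blocks first and match the fields inside each block. The sample HTML is trimmed and made up, and `$rowPattern` is just an illustrative name.

```php
<?php
// Trimmed, already whitespace-normalized sample in the shape shown above.
$html = '<tbody data="d1" id="flt30"><tr><td><p>21:15 A</p><p>23:20 B</p></td>'
      . '<td><p class="pubFlights_kn">Air One</p>'
      . '<p class="searchresult_fltlist_airline">KN2284</p></td></tr></tbody>'
      . '<tbody data="d2" id="flt31"><tr><td><p>08:00 C</p><p>10:05 D</p></td>'
      . '<td><p class="pubFlights_kn">Air Two</p>'
      . '<p class="searchresult_fltlist_airline">KN9527</p></td></tr></tbody>';

// Named groups keep the field names next to the markup they come from.
$rowPattern = '~<td><p>(?P<from>.*?)</p><p>(?P<to>.*?)</p></td>'
            . '<td><p class="pubFlights_\w+">(?P<flyname>.*?)</p>'
            . '<p class="searchresult_fltlist_airline">(?P<flyno>.*?)</p>~is';

$result = array();
preg_match_all('~<tbody\b[^>]*>.*?</tbody>~is', $html, $blocks);
foreach ($blocks[0] as $block) {
    // Match fields inside this block only, so a missing field cannot
    // shift the values of later flights into the wrong row.
    if (preg_match($rowPattern, $block, $m)) {
        $result[] = array(
            'from'    => $m['from'],
            'to'      => $m['to'],
            'flyname' => $m['flyname'],
            'flyno'   => $m['flyno'],
        );
    }
}
print_r($result);
```

Blocks that don't match the row pattern are simply skipped here; logging them instead would make layout changes on the page easier to spot.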
Here is the code I use to fetch the HTML:
$opts = array(
    'http' => array(
        'method' => "GET",
        'header' => "Host: flights.ctrip.com\r\n" .
                    "Accept-language: zh-cn\r\n" .
                    "User-Agent: mozilla/5.0 (windows; u; windows nt 5.1; zh-cn; rv:1.9.2.3) gecko/20100401 firefox/3.6.3\r\n" .
                    "Accept: */*\r\n"
    )
);
$context = stream_context_create($opts);
//$url = "http://flights.ctrip.com/Domestic/showfarefirst.aspx?DCity1=CTU&ACity1=BJS&DDate1=2010-06-20&DDate2=2010-6-22&passengerQuantity=1&SendTicketCity=成都&Airline=All&PassengerType=ADU&SearchType=D&RouteIndex=1";
$url = "http://flights.ctrip.com/Domestic/ShowFareFirst.aspx?DCity1=CTU&ACity1=BJS&DCityName1=%B3%C9%B6%BC&ACityName1=%B1%B1%BE%A9&DDate1=2010-6-20&ClassType=&PassengerQuantity=1&SendTicketCity=%B3%C9%B6%BC&Airline=&PassengerType=ADU&";
$html = file_get_contents($url, false, $context);
print_r($html);
Then I appended your code from earlier, unchanged (the whitespace clean-up followed by the series of preg_match_all calls, ending in print_r($result);), but it fails to extract any of the data from the tbody elements.
This line is not the same as in the sample you gave:
<tbody data="2010-6-21 21:15:00|2010-6-22 23:20:00|CTU|NAY|KN|M|1080||5|7|1,2,32,30,28"
id="flt31">
Is it the order? It looks like the only difference is the order of id and data.
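If the attribute order is the only thing that differs, one option is a pattern that doesn't care about order. The sketch below uses two lookaheads, one per attribute, each scanning the opening tag on its own. It assumes no attribute value in the tbody tag contains a ">" character; the sample values are made up.

```php
<?php
// Same two attributes, written in both orders (sample values).
$html = '<tbody data="d1" id="flt30"><tr></tr></tbody>'
      . '<tbody id="flt31" data="d2"><tr></tr></tbody>';

// Each lookahead scans the opening tag independently, so it does not
// matter which attribute comes first. The \s before the name keeps "id"
// from matching the tail of some other attribute name.
$pattern = '~<tbody(?=[^>]*\sid="flt(\d+)")(?=[^>]*\sdata="([^"]*)")[^>]*>~i';

preg_match_all($pattern, $html, $m, PREG_SET_ORDER);
foreach ($m as $match) {
    // $match[1] is the flt number, $match[2] the data attribute.
    echo $match[1], ' => ', $match[2], "\n";
}
```

With PREG_SET_ORDER each element of `$m` is one tbody with its own captures, which avoids lining up parallel arrays by index.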
$opts = array(
    'http' => array(
        'method' => "GET",
        'header' => "Host: flights.ctrip.com\r\n" .
                    "Accept-language: zh-cn\r\n" .
                    "User-Agent: mozilla/5.0 (windows; u; windows nt 5.1; zh-cn; rv:1.9.2.3) gecko/20100401 firefox/3.6.3\r\n" .
                    "Accept: */*\r\n"
    )
);
$context = stream_context_create($opts);
$url = "http://flights.ctrip.com/Domestic/ShowFareFirst.aspx?DCity1=CTU&ACity1=BJS&DCityName1=%B3%C9%B6%BC&ACityName1=%B1%B1%BE%A9&DDate1=2010-6-20&ClassType=&PassengerQuantity=1&SendTicketCity=%B3%C9%B6%BC&Airline=&PassengerType=ADU&";
$html = file_get_contents($url, false, $context);
// Normalize whitespace: drop it next to tags, then collapse every remaining run
// (line breaks and tabs included) into a single space.
$html = preg_replace('/>\s+/', '>', $html);
$html = preg_replace('/\s+</', '<', $html);
$html = preg_replace('/\s+/', ' ', $html);
//echo "<pre>";
$result = array();
// On the live page the id attribute comes before data.
preg_match_all('~<tbody id="flt\d+" data="(.*?)"><tr><td><p>~is', $html, $data);
foreach ($data[1] as $key => $item) {
    $result[$key]["data"] = $item;
}
// Departure and arrival (time plus airport).
preg_match_all('~ data="(.*?)"><tr><td><p>(.*?)</p><p>(.*?)</p></td><td><p class="~is', $html, $data);
foreach ($data[2] as $key => $item) {
    $result[$key]["from"] = $item;
}
foreach ($data[3] as $key => $item) {
    $result[$key]["to"] = $item;
}
// Airline name and flight number. The class suffix varies by airline
// (pubFlights_3u, pubFlights_kn, ...), so match any suffix instead of only 3u.
preg_match_all('~</p></td><td><p class="pubFlights_[^"]+">(.*?)</p><p class="searchresult_fltlist_airline">(.*?)</p></td><td align="middle">~is', $html, $data);
foreach ($data[1] as $key => $item) {
    $result[$key]["flyname"] = $item;
}
foreach ($data[2] as $key => $item) {
    $result[$key]["flyno"] = $item;
}
// Aircraft type. "." and "?" are escaped so they match literally.
preg_match_all('~ mod_jmpinfo_page="fltDomestic_planeType\.asp\?CraftType=\d+">(.*?)</span></td><td align="middle">~is', $html, $data);
foreach ($data[1] as $key => $item) {
    $result[$key]["flytype"] = $item;
}
// The cell that follows the aircraft type.
preg_match_all('~</span></td><td align="middle">(.*?)</td><td align="middle"><p><span class="base_txtdiv"~is', $html, $data);
foreach ($data[1] as $key => $item) {
    $result[$key]["flybi"] = $item;
}
// Discount and cabin class (group 2; group 1 is the tooltip text).
preg_match_all('~span class="base_txtdiv" cdm="jpi_discountdetailli" mod="jmpInfo" mod_jmpinfo_page="default_normal.asp" mod_jmpinfo_content="(.*?)">(.*?)</span></p><p><span class="base_txtdiv" cdm="jpi_refundinfoli"~is', $html, $data);
foreach ($data[2] as $key => $item) {
    $result[$key]["discountdetail"] = $item;
}
// Refund/change link text.
preg_match_all('~</span></p><p><span class="base_txtdiv" cdm="jpi_refundinfoli"[^>]+>(.*?)</span></p></td><td><p>~is', $html, $data);
foreach ($data[1] as $key => $item) {
    $result[$key]["refundinfo"] = $item;
}
print_r($result);
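Since the regex version broke as soon as the attribute order changed, it may be worth considering a different technique altogether: parsing the page with PHP's DOM extension and selecting the elements with XPath. This is not what the code above does; it is a sketch of an alternative, with a made-up ASCII sample wrapped in a table element. A real page would also need its character encoding handled, which is left out here.

```php
<?php
// Sample fragment wrapped in <table> so the parser sees valid nesting.
$html = '<table>'
      . '<tbody id="flt30" data="d1"><tr><td><p>21:15 A</p><p>23:20 B</p></td></tr></tbody>'
      . '<tbody data="d2" id="flt31"><tr><td><p>08:00 C</p><p>10:05 D</p></td></tr></tbody>'
      . '<tbody id="other"><tr><td><p>x</p><p>y</p></td></tr></tbody>'
      . '</table>';

// Collect parser warnings internally instead of printing them;
// real-world HTML is rarely valid.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$result = array();
// Select every tbody whose id starts with "flt", whatever the attribute order.
foreach ($xpath->query('//tbody[starts-with(@id, "flt")]') as $tbody) {
    $ps = $tbody->getElementsByTagName('p');
    $result[] = array(
        'id'   => $tbody->getAttribute('id'),
        'data' => $tbody->getAttribute('data'),
        'from' => $ps->item(0)->textContent,
        'to'   => $ps->item(1)->textContent,
    );
}
print_r($result);
```

Attribute order, extra whitespace and line breaks inside tags no longer matter with this approach, and `$doc->saveHTML($tbody)` can return the whole block as a string if that is still needed.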