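Perhaps a detour could take some network load off the server.
The server parses the HTML, collects the image tag information, and sends the gathered text data back to the client.
The client loads the images itself; reading the width and height properties of each loaded image object (naturalWidth and naturalHeight in current browsers) is enough to get the image's original size.
There is plenty of upside, but one likely snag is hotlink protection.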
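The server half of that split could look roughly like the sketch below. It uses PHP's built-in DOMDocument rather than the simple_html_dom parser that appears in the code in this thread, and the function name collectImageSources, the sample markup, and the JSON keys are all made up for illustration. It only gathers src strings and encodes them as JSON, so no image bytes pass through the server.

```php
<?php
// Hypothetical helper: returns every non-empty img src found in an HTML string.
function collectImageSources(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid; keep libxml's parse warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = array();
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        if ($src !== '') {
            $sources[] = $src;
        }
    }
    return $sources;
}

// Sample markup standing in for a fetched page.
$page = '<html><body><img src="/a.png"><img alt="no src">'
      . '<img src="http://example.com/b.jpg"></body></html>';
$images = collectImageSources($page);

// Only text goes back to the client; the browser loads the images itself.
echo json_encode(array('images' => $images, 'images_count' => count($images)));
```

The client then receives a plain list of URLs and can measure each image once its browser has loaded it.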
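Debugging is great fun.
PHP fetches the resources
JavaScript gets the image width and height
Launching the image-reading processes (138 of them): 1.3 seconds
Records in the result file: 7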
http://s.huffpost.com/images/v/logos/v4/tagline.gif
http://s.huffpost.com/images/v/logos/v4/homepage.gif?v9
http://i.huffpost.com/gen/559399/thumbs/r-OLBERMANN-huge.jpg
http://s.huffpost.com/images/facebook_promo_connect.png?3
http://images.huffingtonpost.com/2012-04-04-michaeljfoxmarlo2SECOND.jpg
http://images.huffingtonpost.com/2012-04-05-Screenshot20120405at9.40.24AM.jpg
http://i.huffpost.com/gen/557914/thumbs/s-SCORSESE-large300.jpg
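The original loop becomes:
foreach($html->find('img') as $element) {
    // urlencode so that a src containing ? or & survives as a single query value
    tenor("tenorcall.php?v=" . urlencode($element->src));
}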
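tenorcall.php:
function ranger($url){
    // curl setup: ask only for roughly the first 32 KB of the file
    $headers = array( "Range: bytes=0-32768" );
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    $data = curl_exec($curl);
    curl_close($curl); // close the handle before returning
    return $data;
}

$raw = ranger($_GET['v']);
// decoding can fail on truncated or non-image data, so check before measuring
$im = is_string($raw) ? @imagecreatefromstring($raw) : false;
if ($im !== false) {
    $width = imagesx($im);
    $height = imagesy($im);
    if($width>=200||$height>=200){
        // record images whose width or height is at least 200 pixels
        file_put_contents('tenorcall.txt', $_GET['v'].PHP_EOL, FILE_APPEND );
    }
}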
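Decoding a partial download with imagecreatefromstring may fail for images larger than the requested range, because the pixel data is cut off. A possible alternative, sketched here, is getimagesizefromstring (PHP 5.4+), which for common formats usually needs only the header bytes. The helper name sizeFromPrefix is hypothetical, and the hand-built GIF header merely stands in for the first bytes of a real download.

```php
<?php
// Hypothetical helper: width and height from the leading bytes of an image,
// or null when the bytes are not recognised as an image.
function sizeFromPrefix(string $bytes): ?array
{
    $info = @getimagesizefromstring($bytes);
    if ($info === false) {
        return null;
    }
    return array('width' => $info[0], 'height' => $info[1]);
}

// First 13 bytes of a GIF: signature, little-endian width and height,
// then flags, background colour index and aspect ratio.
$gifHeader = 'GIF89a' . pack('v', 320) . pack('v', 240) . "\x00\x00\x00";

$size = sizeFromPrefix($gifHeader);
if ($size !== null && ($size['width'] >= 200 || $size['height'] >= 200)) {
    echo "large enough: {$size['width']}x{$size['height']}\n";
}
```

Because no pixel data is decoded, this route also avoids the GD memory cost of building an image resource for every candidate.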
/**
 * Function: tenor
 * Purpose: request a URL without waiting for the response
 * Parameter: $page, the page script to run
 * Returns: nothing
 **/
if(! function_exists('tenor')):
function tenor($page) {
    $host = $_SERVER["HTTP_HOST"];
    $fp = fsockopen($host, 80, $errno, $errstr);
    if(!$fp) {
        echo "$errstr ($errno)<br>\n";
    } else {
        // HTTP header lines end with CRLF
        fputs($fp,"GET /$page HTTP/1.0\r\nHost: $host\r\n\r\n");
        fclose($fp);
    }
}
endif;
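The code is still the original code; it did not shrink, it actually grew.
But because it runs concurrently, it is noticeably faster.
Note that the tenor function does not run reliably on some web servers (IIS 6, for example); the cause is unknown.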
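With 138 images fetched concurrently, does that use up 138 connections? Do I need to edit php.ini to raise the connection limit? And what are the CPU and memory costs? Thanks.
To dream1206, yiwusuo, amani11: I took another look at his Add feature. It seems that after a URL is submitted, one image comes back first (within 1-3 seconds), and the remaining image information comes back later (after 7-9 seconds). It is probably the approach you described: PHP only collects all the image URLs, and JS checks the image sizes, perhaps even sending concurrent Ajax requests to a second PHP page that measures each image and returns the data. Either way, concurrency is unavoidable. Between concurrency in JS and concurrency directly in PHP, which consumes fewer resources? Thanks.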
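On the connection question: requests that run at the same time normally each hold their own connection, so starting 138 at once can mean up to 138 open simultaneously. One way to cap that, sketched under assumptions, is to work through the list in fixed-size batches. The names fetchInBatches and $fetchBatch are hypothetical; $fetchBatch stands for whatever function performs one concurrent round (a curl_multi loop, for instance) and is replaced by a stub here so the example runs without a network.

```php
<?php
// Hypothetical helper: processes $urls in groups of at most $batchSize,
// so no more than $batchSize requests are in flight at any moment.
function fetchInBatches(array $urls, int $batchSize, callable $fetchBatch): array
{
    $results = array();
    foreach (array_chunk($urls, max(1, $batchSize)) as $batch) {
        // $fetchBatch receives a list of URLs and returns url => result pairs.
        foreach ($fetchBatch($batch) as $url => $result) {
            $results[$url] = $result;
        }
    }
    return $results;
}

// Stub standing in for one concurrent round: it records each batch size
// and returns the URL length as a fake "result".
$batchSizes = array();
$stub = function (array $batch) use (&$batchSizes) {
    $batchSizes[] = count($batch);
    $out = array();
    foreach ($batch as $url) {
        $out[$url] = strlen($url);
    }
    return $out;
};

$urls = array();
for ($i = 0; $i < 138; $i++) {
    $urls[] = "http://example.com/img$i.jpg";
}
$results = fetchInBatches($urls, 20, $stub);
echo count($results), " results in ", count($batchSizes), " batches\n";
```

Batching trades some speed for predictability: one slow URL holds up the rest of its batch, but the number of open connections stays bounded.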
$url = 'http://www.huffingtonpost.com';
$html = file_get_html ( $url );
$nodes = array ();
$start = microtime ( true ); // float seconds, so the subtraction below is meaningful
$res = array ();

if ($html->find ( 'img' )) {
    foreach ( $html->find ( 'img' ) as $element ) {
        if (startsWith ( $element->src, "//" )) {
            // protocol-relative URL: only the scheme is missing
            $element->src = "http:" . $element->src;
        } elseif (startsWith ( $element->src, "/" )) {
            $element->src = $url . $element->src;
        }
        if (! startsWith ( $element->src, "http" )) {
            $element->src = $url . "/" . $element->src;
        }
        $nodes [] = $element->src;
    }
}

echo "<pre>";
print_r ( imageDownload ( $nodes, 200, 200 ) );
echo "<h1>", microtime ( true ) - $start, "</h1>";

function imageDownload($nodes, $maxHeight = 0, $maxWidth = 0) {
    $mh = curl_multi_init ();
    $curl_array = array ();
    foreach ( $nodes as $i => $url ) {
        $curl_array [$i] = curl_init ( $url );
        curl_setopt ( $curl_array [$i], CURLOPT_RETURNTRANSFER, true );
        curl_setopt ( $curl_array [$i], CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)' );
        curl_setopt ( $curl_array [$i], CURLOPT_CONNECTTIMEOUT, 5 );
        curl_setopt ( $curl_array [$i], CURLOPT_TIMEOUT, 15 );
        curl_multi_add_handle ( $mh, $curl_array [$i] );
    }
    $running = NULL;
    do {
        usleep ( 10000 );
        curl_multi_exec ( $mh, $running );
    } while ( $running > 0 );

    $res = array ();
    foreach ( $nodes as $i => $url ) {
        $curlErrorCode = curl_errno ( $curl_array [$i] );

        if ($curlErrorCode === 0) {
            $info = curl_getinfo ( $curl_array [$i] );
            if ($info ['content_type'] !== null) {
                $ext = getExtention ( $info ['content_type'] );
                // the temp/ directory must already exist and be writable
                $temp = "temp/img" . md5 ( mt_rand () ) . $ext;
                touch ( $temp );
                $imageContent = curl_multi_getcontent ( $curl_array [$i] );
                file_put_contents ( $temp, $imageContent );
                if ($maxHeight == 0 || $maxWidth == 0) {
                    $res [] = $temp;
                } else {
                    // despite their names, $maxHeight and $maxWidth act as minimum sizes
                    $size = getimagesize ( $temp );
                    // $size[0] is the width, $size[1] the height
                    if ($size !== false && $size [1] >= $maxHeight && $size [0] >= $maxWidth) {
                        $res [] = $temp;
                    } else {
                        unlink ( $temp );
                    }
                }
            }
        }
        curl_multi_remove_handle ( $mh, $curl_array [$i] );
        curl_close ( $curl_array [$i] );
    }
    curl_multi_close ( $mh );
    return $res;
}

function getExtention($type) {
    $type = strtolower ( $type );
    switch ($type) {
        case "image/gif" :
            return ".gif";
        case "image/png" :
            return ".png";
        case "image/jpeg" :
            return ".jpg";
        default :
            return ".img";
    }
}

function startsWith($str, $prefix) {
    $temp = substr ( $str, 0, strlen ( $prefix ) );
    $temp = strtolower ( $temp );
    $prefix = strtolower ( $prefix );
    return ($temp == $prefix);
}
Execution time: 4.8 seconds. However, the line if(in_array($absUrl, $visited))continue; raises an error: Warning: in_array() expects parameter 2 to be array, null. Also, the final image addresses are local cache paths rather than network URLs. I will keep testing and investigating.
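The in_array warning suggests that $visited was never initialised before the loop; starting it as an empty array should avoid it. The sketch below combines that with a slightly more careful URL join that also handles protocol-relative sources such as //host/path. The function resolveImageUrl and the sample values are assumptions for illustration, not the code the warning came from.

```php
<?php
// Hypothetical helper: turns an img src into an absolute URL.
// $base is the page's scheme and host without a trailing slash.
function resolveImageUrl(string $base, string $src): string
{
    if (preg_match('#^https?://#i', $src)) {
        return $src; // already absolute
    }
    if (strpos($src, '//') === 0) {
        // protocol-relative: borrow the scheme from the base URL
        return parse_url($base, PHP_URL_SCHEME) . ':' . $src;
    }
    if (strpos($src, '/') === 0) {
        return $base . $src; // root-relative
    }
    // Simplification: treat other paths as relative to the site root.
    return $base . '/' . $src;
}

$base = 'http://www.huffingtonpost.com';
$sources = array('/images/trans.gif', '//pixel.example.com/p.gif', 'images/trans.gif');

$visited = array(); // must be an array before the first in_array() call
foreach ($sources as $src) {
    $absUrl = resolveImageUrl($base, $src);
    if (in_array($absUrl, $visited)) continue;
    $visited[] = $absUrl;
}
print_r($visited);
```

Deduplicating before downloading matters on pages like this one, where the same spacer image is referenced dozens of times.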
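That way you get the network URLs.
He saves each image to a local file and then reads its size with getimagesize.
He appears to get the concurrency through curl; I am not very familiar with that mechanism.
It is probably file_get_html that raises the error.
file_get_html reads the URL with file_get_contents, which has a fairly low success rate.
You often have to refresh two or three times before any data comes back.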
http://www.planeart.cn/?p=1121
Click Add, choose Pin, and paste in the URL http://www.huffingtonpost.com/
In Chrome's Network panel you can see one request:
GET /pin/create/find_images/?url=http%253A%2F%2Fwww.huffingtonpost.com HTTP/1.1
The response is a JSON object:
images: [http://s.huffpost.com/images/v/logos/v4/homepage.gif?v9,…]
0: "http://s.huffpost.com/images/v/logos/v4/homepage.gif?v9"
1: "http://s.huffpost.com/images/v/logos/v4/tagline.gif"
2: "http://s.huffpost.com/images/splash/t_mini-a.png"
3: "http://s.huffpost.com/images/splash/t_mini-a.png"
4: "http://s.huffpost.com/images/splash/t_mini-a.png"
5: "http://s.huffpost.com/images/splash/t_mini-a.png"
6: "http://s.huffpost.com/images/splash/t_mini-a.png"
7: "http://s.huffpost.com/images/splash/t_mini-a.png"
8: "http://s.huffpost.com/images/splash/t_mini-a.png"
9: "http://s.huffpost.com/images/splash/t_mini-a.png"
10: "http://s.huffpost.com/images/splash/t_mini-a.png"
11: "http://s.huffpost.com/images/splash/t_mini-a.png"
12: "http://s.huffpost.com/images/splash/t_mini-a.png"
13: "http://s.huffpost.com/images/splash/t_mini-a.png"
14: "http://s.huffpost.com/images/splash/t_mini-a.png"
15: "http://s.huffpost.com/images/splash/t_mini-a.png"
16: "http://s.huffpost.com/images/splash/t_mini-a.png"
17: "http://i.huffpost.com/gen/560770/thumbs/r-GSA-LAS-VEGAS-VIDEO-huge.jpg"
18: "http://s.huffpost.com/images/webslice12x12.png"
19: "http://s.huffpost.com/images/v/blog_column.png"
20: "http://s.huffpost.com/contributors/gary-hart/headshot.jpg"
21: "http://www.huffingtonpost.com/images/trans.gif"
22: "http://www.huffingtonpost.com/images/trans.gif"
23: "http://www.huffingtonpost.com/images/trans.gif"
24: "http://images.huffingtonpost.com/2012-04-06-campbellguitar.jpg"
25: "http://www.huffingtonpost.com/images/trans.gif"
26: "http://www.huffingtonpost.com/images/trans.gif"
27: "http://www.huffingtonpost.com/images/trans.gif"
28: "http://www.huffingtonpost.com/images/trans.gif"
29: "http://www.huffingtonpost.com/images/trans.gif"
30: "http://www.huffingtonpost.com/images/trans.gif"
31: "http://images.huffingtonpost.com/2012-04-06-Screenshot20120406at7.09.17PM.jpg"
32: "http://www.huffingtonpost.com/images/trans.gif"
33: "http://www.huffingtonpost.com/images/trans.gif"
34: "http://www.huffingtonpost.com/images/trans.gif"
35: "http://www.huffingtonpost.com/images/trans.gif"
36: "http://www.huffingtonpost.com/images/trans.gif"
37: "http://www.huffingtonpost.com/images/trans.gif"
38: "http://www.huffingtonpost.com/images/trans.gif"
39: "http://www.huffingtonpost.com/images/trans.gif"
40: "http://www.huffingtonpost.com/images/trans.gif"
41: "http://www.huffingtonpost.com/images/trans.gif"
42: "http://www.huffingtonpost.com/images/trans.gif"
43: "http://www.huffingtonpost.com/images/trans.gif"
44: "http://www.huffingtonpost.com/images/trans.gif"
45: "http://www.huffingtonpost.com/images/trans.gif"
46: "http://www.huffingtonpost.com/images/trans.gif"
47: "http://www.huffingtonpost.com/images/trans.gif"
48: "http://www.huffingtonpost.com/images/trans.gif"
49: "http://www.huffingtonpost.com/images/trans.gif"
50: "http://www.huffingtonpost.com/images/trans.gif"
51: "http://www.huffingtonpost.com/images/trans.gif"
52: "http://www.huffingtonpost.com/images/trans.gif"
53: "http://www.huffingtonpost.com/images/trans.gif"
54: "http://www.huffingtonpost.com/images/trans.gif"
55: "http://www.huffingtonpost.com/images/trans.gif"
56: "http://www.huffingtonpost.com/images/trans.gif"
57: "http://www.huffingtonpost.com/images/trans.gif"
58: "http://www.huffingtonpost.com/images/trans.gif"
59: "http://www.huffingtonpost.com/images/trans.gif"
60: "http://www.huffingtonpost.com/images/trans.gif"
61: "http://www.huffingtonpost.com/images/trans.gif"
62: "http://www.huffingtonpost.com/images/trans.gif"
63: "http://www.huffingtonpost.com/images/trans.gif"
64: "http://www.huffingtonpost.com/images/trans.gif"
65: "http://www.huffingtonpost.com/images/trans.gif"
66: "http://www.huffingtonpost.com/images/trans.gif"
67: "http://www.huffingtonpost.com/images/trans.gif"
68: "http://www.huffingtonpost.com/images/trans.gif"
69: "http://www.huffingtonpost.com/images/trans.gif"
70: "http://www.huffingtonpost.com/images/trans.gif"
71: "http://www.huffingtonpost.com/images/trans.gif"
72: "http://www.huffingtonpost.com/images/trans.gif"
73: "http://www.huffingtonpost.com/images/trans.gif"
74: "http://www.huffingtonpost.com/images/trans.gif"
75: "http://s.huffpost.com/images/blank.gif"
76: "http://s.huffpost.com/images/blank.gif"
77: "http://s.huffpost.com/images/blank.gif"
78: "http://s.huffpost.com/images/blank.gif"
79: "http://s.huffpost.com/images/blank.gif"
80: "http://s.huffpost.com/images/blank.gif"
81: "http://s.huffpost.com/images/blank.gif"
82: "http://s.huffpost.com/images/facebook_promo_connect.png?3"
83: "http://s.huffpost.com/images/loader.gif"
84: "http://www.huffingtonpost.com/images/trans.gif"
85: "http://www.huffingtonpost.com/images/trans.gif"
86: "http://www.huffingtonpost.com/images/trans.gif"
87: "http://www.huffingtonpost.com/images/trans.gif"
88: "http://www.huffingtonpost.com/images/trans.gif"
89: "http://www.huffingtonpost.com/images/trans.gif"
90: "http://s.huffpost.com/contributors/gary-hart/headshot.jpg"
91: "http://s.huffpost.com/contributors/mike-campbell/headshot.jpg"
92: "http://s.huffpost.com/contributors/roma-downey/headshot.jpg"
93: "http://s.huffpost.com/contributors/gavin-newsom/headshot.jpg"
94: "http://s.huffpost.com/contributors/sarah-shourd/headshot.jpg"
95: "http://s.huffpost.com/contributors/jacqueline-novogratz/headshot.jpg"
96: "http://s.huffpost.com/contributors/peggy-drexler/headshot.jpg"
97: "http://s.huffpost.com/contributors/mohamed-a-elerian/headshot.jpg"
98: "http://s.huffpost.com/contributors/bill-mckibben/headshot.jpg"
99: "http://s.huffpost.com/contributors/marlo-thomas/headshot.jpg"
100: "http://www.huffingtonpost.com/images/v/something_to_say_button.png"
101: "http://www.huffingtonpost.com/images/trans.gif"
102: "http://www.huffingtonpost.com/images/trans.gif"
103: "http://www.huffingtonpost.com/images/trans.gif"
104: "http://www.huffingtonpost.com/images/trans.gif"
105: "http://www.huffingtonpost.com/images/trans.gif"
106: "http://www.huffingtonpost.com/images/trans.gif"
107: "http://www.huffingtonpost.com/images/trans.gif"
108: "http://www.huffingtonpost.com/images/trans.gif"
109: "http://www.huffingtonpost.com/images/trans.gif"
110: "http://www.huffingtonpost.com/images/trans.gif"
111: "http://www.huffingtonpost.com/images/trans.gif"
112: "http://www.huffingtonpost.com/images/trans.gif"
113: "http://www.huffingtonpost.com/images/trans.gif"
114: "http://www.huffingtonpost.com/images/trans.gif"
115: "http://www.huffingtonpost.com/images/trans.gif"
116: "http://www.huffingtonpost.com/images/trans.gif"
117: "http://www.huffingtonpost.com/images/trans.gif"
118: "http://www.huffingtonpost.com/images/trans.gif"
119: "http://www.huffingtonpost.com/images/trans.gif"
120: "http://www.huffingtonpost.com/images/trans.gif"
121: "http://www.huffingtonpost.com/images/trans.gif"
122: "http://www.huffingtonpost.com/images/trans.gif"
123: "http://www.huffingtonpost.com/images/trans.gif"
124: "http://www.huffingtonpost.com/images/trans.gif"
125: "http://www.huffingtonpost.com/images/trans.gif"
126: "http://www.huffingtonpost.com/images/trans.gif"
127: "http://www.huffingtonpost.com/images/trans.gif"
128: "http://www.huffingtonpost.com/images/trans.gif"
129: "http://www.huffingtonpost.com/images/trans.gif"
130: "http://www.huffingtonpost.com/images/trans.gif"
131: "http://www.huffingtonpost.com/images/trans.gif"
132: "http://www.huffingtonpost.com/images/trans.gif"
133: "http://www.huffingtonpost.com/images/trans.gif"
134: "http://b.scorecardresearch.com/p?c1=2&c2=6723616&c3=&c4=&c5=front&c6=&c15=&cj=1"
135: "http://www.huffingtonpost.com//secure-us.imrworldwide.com/cgi-bin/m?ci=us-703240h&cg=0&cc=1&ts=noscript"
136: "http://vertical-stats.huffpost.com/?-1&&"
137: "http://www.huffingtonpost.com//pixel.quantserve.com/pixel/p-6fTutip1SMLM2.gif?labels=Home"
images_count: 138
redirected: false
status: "success"
title: "Breaking News and Opinion on The Huffington Post"
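type: "text/html; charset=utf-8"
Almost the moment the server responds, the browser starts loading the images. This is visible in Chrome's network monitor: the yellow bar is the request that submits the URL and gets the image resources, and everything after it is image loading, whose speed depends on my own network connection. Because the JS code on http://pinterest.com/ is minified and uses jQuery, it is very hard to trace. What it actually does is simple, though, and anyone could come up with it: iterate over the JSON data, create an img element object for each entry, set its src property, and keep the object. The browser does the rest on its own.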
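Where are these objects kept? In a cookie, or in a history file on the server? Also, how does jQuery get the image widths and heights with multiple threads?
Do you mean the image link data the server returns? There is no need to store it. Just parse the returned data once the Ajax response arrives.
Also, browsers load external resources asynchronously. In other words, whether or not jQuery is used, the images load asynchronously and do not block one another. It works much like the PHP-side code the senior member wrote above.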