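I've run into something rather frustrating lately.
When I fetch an HTML page by reading it as a stream, I only get part of the page source; the result is incomplete.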
At first I used HttpURLConnection + URL. Here is the main code:

HttpURLConnection jconn = null;
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
URL url = new URL(strUrl);
jconn = (HttpURLConnection) url.openConnection();
jconn.setDoOutput(true);
jconn.setDoInput(true);
jconn.connect();
InputStream in = jconn.getInputStream();
byte[] buf = new byte[4096];
int bytesRead = 0;
// copy the response body until end of stream
while ((bytesRead = in.read(buf)) != -1) {
    byteArrayOutputStream.write(buf, 0, bytesRead);
}
// decode the collected bytes as GBK
String strRead = new String(byteArrayOutputStream.toByteArray(), "GBK");

Then I switched to HttpClient + GetMethod. Again, here is the main code:

HttpClient client = new HttpClient();
StringBuffer sb = new StringBuffer();
GetMethod getMethod = new GetMethod(strUrl);
int statusCode;
statusCode = client.executeMethod(getMethod);
if (statusCode == HttpStatus.SC_OK) {
    // coder (defined elsewhere) is the charset used to decode the response body
    BufferedReader bf = new BufferedReader(new InputStreamReader(getMethod.getResponseBodyAsStream(), coder));
    String inputLine = null;
    // read the body line by line until end of stream
    while ((inputLine = bf.readLine()) != null) {
        sb.append(inputLine).append("\n");
    }
    bf.close();
}
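Neither approach works, though. The strangest part is that some pages come back in full, while others are only partially retrieved.
Has anyone else run into this? It has been bugging me for days.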
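One way to narrow this down might be to compare the number of bytes actually read with the Content-Length the server declares, before any charset decoding happens. The sketch below is only an illustration under assumptions: the class and method names (FetchCheck, readAll, compare, fetchReport) are hypothetical, and a server that uses chunked transfer may not send Content-Length at all, in which case the comparison says nothing.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not part of the original post.
class FetchCheck {
    /** Reads the stream until read() returns -1 and returns every byte seen. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Compares the declared Content-Length (-1 when absent) with the bytes received. */
    static String compare(long declared, int received) {
        if (declared < 0) {
            return "no Content-Length header; received " + received + " bytes";
        }
        if (declared == received) {
            return "complete: " + received + " bytes";
        }
        return "mismatch: declared " + declared + ", received " + received;
    }

    /** Fetches strUrl with a plain GET and reports status and byte counts. */
    static String fetchReport(String strUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(strUrl).openConnection();
        try (InputStream in = conn.getInputStream()) {
            byte[] body = readAll(in);
            return "HTTP " + conn.getResponseCode() + ", "
                    + compare(conn.getContentLengthLong(), body.length);
        } finally {
            conn.disconnect();
        }
    }
}
```

If the report shows the byte counts matching while the page still looks cut off, the server itself is probably sending a shorter page for that request; a mismatch would point at the transfer instead.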
For example, this link works fine:
http://sd.118100.cn/user/querytoneboxbaseinfo.do?boxType=1&flag=0&feeType=2&canSplit=0&canUpdate=1&orderBy=6
But this one, on the same domain, does not:
http://sd.118100.cn/user/querytoneboxdetailinfo.do?toneBoxID=1940&toneboxCode=810099997261&toneboxName=新歌快递&price=5.00&desc=&spName=IMusic&downTimes=0&type=1&linenumber=5&feetype=2&=toneBoxValidDay2060-08-31
With it I only get part of the source.
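One visible difference between the two links is that the second carries a raw non-ASCII value (toneboxName=新歌快递) in its query string. Whether that has anything to do with the truncation is not established here, but it is cheap to rule out by percent-encoding the value before building the URL. The following is a minimal sketch; QueryEncoding and encodeParam are hypothetical names, GBK is only an assumption based on the responses being decoded as GBK, and the Charset overload of URLEncoder.encode requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;

// Hypothetical helper, not part of the original post.
class QueryEncoding {
    /** Percent-encodes a single query parameter value using the given charset. */
    static String encodeParam(String value, Charset charset) {
        return URLEncoder.encode(value, charset);
    }

    public static void main(String[] args) {
        // Assumption: the server expects GBK-encoded parameters.
        Charset assumed = Charset.forName("GBK");
        // "\u65b0\u6b4c\u5feb\u9012" is the toneboxName value from the failing link.
        String name = encodeParam("\u65b0\u6b4c\u5feb\u9012", assumed);
        System.out.println("toneboxName=" + name);
    }
}
```

The encoded value would then replace the raw characters in the link; plain ASCII values such as price=5.00 pass through unchanged.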
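What frustrates me most is that the last few lines of the source my program retrieves look like this:

else if((searchform.condition.value).length==0)
{
alert("查询内容不能为空!");
searchform.condition.focus();
}
else
{
searchform.condition.value = convertDBFormat(searchform.condition.value);
searchform.submit();
}
}
else
{
searchform.minprice.value = myTrim(searchform.minprice.value);

So the source simply stops partway through like this; it is not that certain parts of the page fail to come through.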
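To tell a cut-off page from a complete one programmatically, a rough check is to look at the tail of the fetched text and see whether it ends with a closing html tag. This is only a heuristic sketch with hypothetical names (TailCheck, endsWithClosingHtml, tail); a page may legitimately end differently, so a negative result is a hint rather than proof.

```java
import java.util.Locale;

// Hypothetical helper, not part of the original post.
class TailCheck {
    /** True when the text, ignoring surrounding whitespace and case, ends with </html>. */
    static boolean endsWithClosingHtml(String page) {
        return page.trim().toLowerCase(Locale.ROOT).endsWith("</html>");
    }

    /** Returns at most the last maxChars characters, handy for logging where the page stops. */
    static String tail(String page, int maxChars) {
        return page.substring(Math.max(0, page.length() - maxChars));
    }
}
```

Logging tail(strRead, 200) for both links would show right away whether the second response stops inside the script block.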