How do I search for a substring in a UTF-8 string? (character encoding problem)

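I'm fetching web pages with apache httpclient 4.1.1 and using String's indexOf method to check whether they contain keywords I'm interested in. This works for pages encoded in GBK or GB2312, but with UTF-8 pages the search finds nothing, and the fetched Chinese content prints as unreadable garbage. It must be an encoding problem, but I don't know how to solve it. I've searched for a long time and tried all kinds of encoding conversions, but none of them makes the fetched Chinese content print correctly, and the search always returns -1.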

Solutions »


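It mainly comes down to the sender's encoding versus the receiver's decoding.
The web page is the sender and it is encoded in UTF-8, so your stream, as the receiver, must also decode it as UTF-8.
GBK/GB2312-encoded content has to be decoded as GBK/GB2312;
UTF-8-encoded content has to be decoded as UTF-8. So you first need to detect the page's encoding and then decode with the matching charset.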
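To make the encode/decode point above concrete, here is a minimal sketch. The class name DecodeMismatchDemo and the sample text are made up for illustration, and it assumes the JRE ships the GBK charset (full JDKs normally do). It encodes a short Chinese string as UTF-8 bytes, then decodes those bytes once as UTF-8 and once as GBK and runs indexOf on each result.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeMismatchDemo {
    // Decodes the raw bytes with the given charset, then searches for the keyword.
    static int find(byte[] raw, Charset charset, String keyword) {
        String text = new String(raw, charset);
        return text.indexOf(keyword);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source file's own encoding.
        String keyword = "\u7f16\u7801";           // "编码"
        String page = "\u5b57\u7b26" + keyword;    // "字符编码"
        byte[] raw = page.getBytes(StandardCharsets.UTF_8);

        // Decoder matches the encoder: the keyword is found.
        System.out.println(find(raw, StandardCharsets.UTF_8, keyword));
        // Decoder does not match the encoder: the keyword is not found.
        System.out.println(find(raw, Charset.forName("GBK"), keyword));
    }
}
```

Decoded as UTF-8, the keyword is found. Decoded as GBK, the same bytes turn into different characters, so indexOf returns -1, which matches the symptom described in the question.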
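I don't know whether apache httpclient 4.1.1 can get at the following information:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
Alternatively,
use cpDetector to detect the character set first, and then decode with that charset.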
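If the page carries a meta tag like the one above, the charset can also be read from the raw bytes before decoding. What follows is only a rough sketch, not what cpDetector does: the class name MetaCharsetSniffer, the regex and the 4096-byte limit are my own assumptions. It only works for ASCII-compatible encodings such as UTF-8 and GBK, where the markup itself survives a byte-for-byte ISO-8859-1 decode.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaCharsetSniffer {
    // Matches charset=... inside a <meta> tag, with or without quotes around the value.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*?charset\\s*=\\s*[\"']?([A-Za-z0-9_\\-]+)",
            Pattern.CASE_INSENSITIVE);

    // Returns the charset name declared in the first bytes of the page, or the fallback.
    static String sniff(byte[] raw, String fallback) {
        // ISO-8859-1 maps every byte to exactly one char, so the ASCII markup stays readable.
        int len = Math.min(raw.length, 4096);
        String head = new String(raw, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        return m.find() ? m.group(1) : fallback;
    }

    public static void main(String[] args) {
        String html = "<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=utf-8\"/></head>"
                + "<body>\u5b57\u7b26\u7f16\u7801</body></html>";
        byte[] raw = html.getBytes(StandardCharsets.UTF_8);

        String charsetName = sniff(raw, "GBK");
        // Charset.forName throws if the declared name is not supported by the JRE.
        String text = new String(raw, Charset.forName(charsetName));
        System.out.println(charsetName + " -> " + text.indexOf("\u7f16\u7801"));
    }
}
```

A charset declared in the HTTP Content-Type header would normally take priority over the meta tag; the sniffing is a fallback for servers that do not send one.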
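With the OP's approach, I doubt you can ever convert it back. You should look for the problem in your tool: if it can't fetch UTF-8 data correctly, that's the tool's problem, and you shouldn't need to convert anything yourself.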
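The page is UTF-8 encoded, no question about that; the fetched page does of course contain <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>.
What I want to know now is how to handle this UTF-8 string.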
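Can't you just convert the characters? Whatever encoding the other side sends, always convert it to UTF-8 first, and then do the search.
But indexOf can't find Chinese substrings in the UTF-8 string.
Then just convert everything to GBK...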
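Do a string encoding conversion.
  For example, to convert from ISO8859-1 to UTF-8:
String old = "XXX"; // assume this text was decoded as ISO8859-1
String New = new String(old.getBytes("ISO-8859-1"), "UTF-8"); // New is the same bytes decoded as UTF-8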
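For what it's worth, the getBytes/new String trick above only repairs text whose bytes were originally UTF-8 and were decoded with the wrong charset. A minimal sketch follows; the class name RedecodeDemo and the helper redecode are made up for illustration. It first garbles a string on purpose by decoding its UTF-8 bytes as ISO-8859-1, then recovers the original bytes and decodes them properly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RedecodeDemo {
    // Re-encodes the garbled text with the charset that was wrongly used to decode it,
    // which gives back the original bytes, then decodes those bytes with the right charset.
    static String redecode(String garbled, Charset wrong, Charset actual) {
        return new String(garbled.getBytes(wrong), actual);
    }

    public static void main(String[] args) {
        String original = "\u7f16\u7801"; // "编码"

        // Simulate the mistake: UTF-8 bytes decoded as ISO-8859-1.
        String garbled = new String(
                original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);

        String repaired = redecode(
                garbled, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(repaired.equals(original));
    }
}
```

This round trip is safe with ISO-8859-1 because every byte maps to exactly one character. If the wrong charset was GBK, byte sequences that are invalid in GBK may already have been replaced during decoding, so the original bytes cannot always be recovered. It is more reliable to decode correctly in the first place.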
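What I want is to convert UTF-8 to GBK. I searched around, and it seems there are only examples for GBK to UTF-8.
Unless you have a mapping table between UTF-8 and GBK, how would you decode UTF-8-encoded content as GBK, or GBK as UTF-8? Check the apache httpclient 4.1.1 API to see whether there is a method for setting the charset, and set it before fetching the page;
that way apache httpclient 4.1.1 decodes the page with the specified charset, and then you can do whatever you want with it.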
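On the UTF-8 to GBK question: a Java String has no byte encoding of its own (it holds Unicode characters), and the JRE's charset implementations already contain the mapping tables. So converting bytes from one encoding to another is just decode, then encode. A minimal sketch, with the class name Transcoder and the method transcode made up for illustration, and assuming the JRE ships the GBK charset:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Transcoder {
    // Decodes the input bytes with the source charset, then encodes the text with the target charset.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        String text = new String(input, from);
        return text.getBytes(to);
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u7f16\u7801".getBytes(StandardCharsets.UTF_8); // "编码"
        byte[] gbk = transcode(utf8, StandardCharsets.UTF_8, Charset.forName("GBK"));

        // The byte counts differ because the two encodings represent the same characters differently.
        System.out.println("UTF-8 bytes: " + utf8.length);
        System.out.println("GBK bytes:   " + gbk.length);
    }
}
```

For searching with indexOf none of this is needed: once the bytes have been decoded with the right charset, the String can be searched directly.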
Seeing the OP so stuck, and as a learning exercise, I downloaded apache httpclient 4.1.3. Here is an example (my attempt to detect the encoding with a detection tool failed, so I left that part out):

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.HttpStatus;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.DefaultHttpClient;

public class EncodedPostTest {
    public static void main(String[] args) throws Exception {
        HttpClient httpclient = new DefaultHttpClient();
        BufferedReader bufReader = null;
        // Default charset, used when the response does not declare one
        String charset = "gbk";
        try {
            HttpPost httppost = new HttpPost(
                    "http://localhost:8080/TestJEEProject/EncodingServlet");
            HttpResponse response = httpclient.execute(httppost);
            if (response.getStatusLine().getStatusCode() == HttpStatus.SC_OK) {
                HttpEntity entity = response.getEntity();
                // The charset is declared in Content-Type, e.g. "text/html;charset=UTF-8".
                // Content-Encoding names a compression scheme such as gzip, not a charset.
                Header contentTypeHeader = entity.getContentType();
                if (contentTypeHeader != null) {
                    // Lower-case the value and strip whitespace and quotes before parsing
                    String contentType = contentTypeHeader.getValue().toLowerCase().replaceAll("[\\s\"]", "");
                    int pos = contentType.indexOf("charset=");
                    if (pos != -1) {
                        // Take the value after "charset=", dropping any trailing parameters
                        String value = contentType.substring(pos + "charset=".length());
                        int end = value.indexOf(';');
                        if (end != -1) {
                            value = value.substring(0, end);
                        }
                        if (value.length() > 0) {
                            charset = value;
                        }
                    }
                }

                System.out.println("Charset : " + charset);

                // Decode the response body with the detected charset
                bufReader = new BufferedReader(new InputStreamReader(entity.getContent(), charset));
                String strValue = bufReader.readLine();
                while (strValue != null) {
                    // Print every line that contains the keyword "编码" ("encoding")
                    if (strValue.indexOf("编码") != -1) {
                        System.out.println(strValue);
                    }
                    strValue = bufReader.readLine();
                }
            } else {
                System.out.println("Unexpected failure: "
                        + response.getStatusLine().toString());
            }
        } finally {
            // Close the reader before shutting down the connection manager
            if (bufReader != null) {
                bufReader.close();
            }
            httpclient.getConnectionManager().shutdown();
        }
    }
}
Servlet:

import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/**
 * Servlet implementation class EncodingServlet
 */
public class EncodingServlet extends HttpServlet {
    private static final long serialVersionUID = 1L;

    /**
     * @see HttpServlet#HttpServlet()
     */
    public EncodingServlet() {
        super();
    }

    /**
     * @see HttpServlet#doGet(HttpServletRequest request, HttpServletResponse response)
     */
    public void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        doPost(request, response);
    }

    /**
     * @see HttpServlet#doPost(HttpServletRequest request, HttpServletResponse response)
     */
    public void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        // Set charset = GBK
        // response.setContentType("text/html;charset=GBK");

        // Set charset = UTF-8
        response.setContentType("text/html;charset=UTF-8");
        // Content-Encoding is meant for compression such as gzip, so the charset is not set there.

        // Sample Chinese text (the original question) used as test data
        PrintWriter out = response.getWriter();
        out.print("如何查找UTF-8字符串中的字符串？(字符编码问题)\n");
        out.print("我用apache httpclient 4.1.1抓取网页，抓取下来的中文内容打印出来也是无法辨认。\n");
        out.print("用String的indexof方法搜索其中是否含有感兴趣的关键字，\n");
        out.print("搜索GBK、GB2312编码网页时正常，遇到UFT-8编码网页就无法搜索，\n");
        out.print("肯定是编码问题了，不知该怎么解决。搜索了好长时间，试了各种转换编码方法，\n");
        out.print("但都不能把抓取下来的中文内容正常打印出来，搜索也都是-1.");
    }
}
So the encoding is specified in the InputStreamReader constructor! Solved! Thanks for the help!
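For anyone who lands here later, the fix boils down to passing the charset to InputStreamReader. Here is a minimal sketch that can be tried without a server; the class name ReaderCharsetDemo and the method containsKeyword are made up for illustration, and an in-memory stream stands in for the HTTP response body.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReaderCharsetDemo {
    // Reads the stream line by line with the given charset and reports
    // whether any single line contains the keyword.
    static boolean containsKeyword(InputStream in, Charset charset, String keyword)
            throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.indexOf(keyword) != -1) {
                    return true;
                }
            }
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        String keyword = "\u7f16\u7801"; // "编码"
        // Two lines of text, encoded as UTF-8, standing in for a fetched page.
        byte[] body = ("\u5b57\u7b26\n" + keyword + "\u95ee\u9898")
                .getBytes(StandardCharsets.UTF_8);

        System.out.println(containsKeyword(
                new ByteArrayInputStream(body), StandardCharsets.UTF_8, keyword));
    }
}
```

Because the search is done per line, a keyword that is split across a line break would not be found; that limitation is the same as in the readLine loop earlier in the thread.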