import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Class name is a placeholder; the original post shows only main().
public class WordCountDemo {
  public static void main(String[] args) throws Exception {
    String[] urls = {
        "http://mil.news.sina.com.cn/2012-04-10/0428687123.html",
        "http://mil.news.sina.com.cn/2012-04-12/0731687387.html",
        "http://news.sina.com.cn/c/2012-04-13/044224264609.shtml"
    };
    // Captures the headline text inside the artibodyTitle <h1> element.
    final Pattern titlePattern = Pattern
        .compile("<h1 id=\"artibodyTitle\".*?>(.*?)</h1>");
    // Matches either the one-character word or the two-character word.
    final Pattern wordCountPattern = Pattern.compile("\u515a|\u56fd\u5bb6");
    for (final String url : urls) {
      // One thread per page, so the three downloads run concurrently.
      new Thread() {
        @Override
        public void run() {
          // try-with-resources closes the reader even if an exception is thrown.
          try (BufferedReader reader = new BufferedReader(new InputStreamReader(
              new URL(url).openStream(), "GB2312"))) {
            String line;
            String title = null;
            // count[0]: one-character matches; count[1]: two-character matches.
            int[] count = new int[2];
            while ((line = reader.readLine()) != null) {
              if (title == null) {
                Matcher titleMatcher = titlePattern.matcher(line);
                if (titleMatcher.find()) {
                  title = titleMatcher.group(1);
                }
              }
              Matcher wordCountMatcher = wordCountPattern.matcher(line);
              while (wordCountMatcher.find()) {
                String word = wordCountMatcher.group();
                // Match length 1 maps to index 0, length 2 maps to index 1.
                count[word.length() >> 1]++;
              }
            }
            if (count[0] > count[1]) {
              throw new RuntimeException(
                  String.format("%s[%s] \u515a:%d > \u56fd\u5bb6:%d",
                                title,
                                url,
                                count[0],
                                count[1]));
            }
            System.out.printf("%s[%s] is good!%n", title, url);
          } catch (IOException ex) {
            ex.printStackTrace();
          }
        }
      }.start();
    }
  }
}

Solutions »

  1.   

    In all 3 threads, count[0] is far greater than count[1].
      

  2.   

    Solution: add the following line before Matcher wordCountMatcher = wordCountPattern.matcher(line);
     line = line.replaceAll("\u515a", "\u4EBA\u6C11");
      

  3.   

    OP, which organization are you with?! Are you speaking for \u515a, or for \u56fd\u5bb6?!
      

  4.   

    There seems to be a saying that covers this: without \u515a there would be no New China...
    So \u515a is bound to be greater than \u56fd\u5bb6.
      

  5.   

    Take a look at the tip of the iceberg of the New China that \u515a forged:
    \u559D\u6C34\u6B7B
    \u8EB2\u732B\u732B
    \u674E\u521A
    \u6BD2\u5976\u7C89
    \u6740\u7AE5
    \u9ED1\u76D1\u72F1
      

  6.   

    \u515a=党
    \u56fd=人
    \u5bb6=民
      

  7.   

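    Is there any point in counting these? An article could contain 10,000 occurrences of \u515a and still end with a single sentence saying that all of it is for \u56fd\u5bb6.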
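
    For anyone who wants to check the counting logic from the original post, and the effect of the replaceAll line suggested in reply 2, without fetching any pages, here is a minimal sketch. The class name CountSketch, the count helper, and the sample string are all made up for illustration; only the pattern and the index trick come from the original code.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the original post.
final class CountSketch {
  // Same pattern as in the original post.
  private static final Pattern WORDS = Pattern.compile("\u515a|\u56fd\u5bb6");

  // Returns {one-character matches, two-character matches} for the given text.
  static int[] count(String text) {
    int[] count = new int[2];
    Matcher m = WORDS.matcher(text);
    while (m.find()) {
      // Match length 1 maps to index 0, length 2 maps to index 1.
      count[m.group().length() >> 1]++;
    }
    return count;
  }

  public static void main(String[] args) {
    // Made-up sample line: one one-character match, two two-character matches.
    String line = "\u56fd\u5bb6 a \u515a b \u56fd\u5bb6";
    System.out.println(Arrays.toString(count(line)));

    // Reply 2's replacement, applied before counting.
    String replaced = line.replaceAll("\u515a", "\u4EBA\u6C11");
    System.out.println(Arrays.toString(count(replaced)));
  }
}
```

    Running main prints [1, 2] and then [0, 2]. With the replacement applied first, count[0] is always 0, so the count[0] > count[1] check in the original code can never be true and the exception is never thrown.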