Is there a function that extracts the plain-text content from HTML text?

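You can read it with a Java file stream.
Then check for the HTML tag keywords: if you are in the body text, read it.
As for the HTML tag keywords, look them up yourself (e.g. <html></html><title>...)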

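Is there a ready-made function for this? I'd rather not write my own function to parse the HTML. It would probably get very complicated, because HTML has so many tags and I would have to handle them one by one, which is too much trouble.
I suggest you take a look at this: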
    import java.io.*;
    import java.nio.*;
    import java.nio.channels.*;

    public class worldheart {
        public static void main(String args[])
                                   throws IOException {

            // check command-line arguments

            if (args.length != 2) {
                System.err.println("missing filenames");
                System.exit(1);
            }

            // get channels

            FileInputStream fis =
                           new FileInputStream(args[0]);
            FileOutputStream fos =
                          new FileOutputStream(args[1]);
            FileChannel fcin = fis.getChannel();
            FileChannel fcout = fos.getChannel();

            // allocate buffer

            ByteBuffer buf =
                        ByteBuffer.allocateDirect(8192);

            // do copy

            long size = fcin.size();
            long n = 0;
            while (n < size) {
                buf.clear();
                if (fcin.read(buf) < 0) {
                    break;
                }
                buf.flip();
                n += fcout.write(buf);
            }

            // finish up

            fcin.close();
            fcout.close();
            fis.close();
            fos.close();
        }
    }
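It seems that by body text you mean just the words, with no formatting. In that case you only need to take everything outside the <> and don't have to analyze the tags one by one, since you don't need the formatting anyway.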
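To make the "take everything outside the <>" idea concrete, here is a minimal sketch using a regular expression. The class name NaiveTagStripper and the method strip are made up for illustration, and this is only the naive version: it treats anything between < and > as a tag.

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes everything that looks like a tag.
final class NaiveTagStripper {
    // '<', then any run of characters other than '>', then '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String strip(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // prints "Hello, world!"
        System.out.println(strip("<p>Hello, <b>world</b>!</p>"));
    }
}
```

Note that body text which itself contains angle brackets is thrown away too: strip("<font size=\"5\"><hello world></font>") returns an empty string.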
See if this is what you want!
    import java.io.*;
    import java.net.*;

    public class GetHTML {
        public static void main(String[] args) {
            if (args.length < 1) {
                System.out.println("USAGE: java GetHTML httpaddress");
                System.exit(1);
            }
            String sURLAddress = args[0];
            URL url = null;
            try {
                url = new URL(sURLAddress);
            } catch (MalformedURLException e) {
                System.err.println(e.toString());
                System.exit(1);
            }
            // Fetch the page and print its source line by line (tags included).
            // try-with-resources closes the reader and the underlying stream.
            try (BufferedReader breader = new BufferedReader(
                    new InputStreamReader(url.openStream()))) {
                String info = breader.readLine();
                while (info != null) {
                    System.out.println(info);
                    info = breader.readLine();
                }
            } catch (IOException e) {
                System.err.println(e.toString());
                System.exit(1);
            }
        }
    }
tomxutomxu(shprog): You understood correctly, but the method you describe won't work, because the body text itself may also contain <>.
For example:
<html>
<body>
<font size="5">
<hello world>
</font >
</body>
</html>
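I want to get the <hello world> out of it.
I want to get the body from an email and then analyze it. If the body is plain text that's easy, I can analyze it directly. But sometimes users send mail from Outlook or a web client, and it may well be in HTML format, so I have to extract the useful plain text from the HTML.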
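Use DataInputStream and DataOutputStream
to access the web page at the given URL as a data stream. The approach will definitely work.
But my machine isn't up to it, so I couldn't get it working.
Good luck!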
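I don't recommend handling this yourself. It's best to find a suitable HTML parser.
If you do need to handle it yourself, please use regular expressions for the matching.
Actually the simplest approach is to remove the strings between < and >.
But such simple handling will inevitably leave some junk you don't want. Extend it a bit and it should be fine.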
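One way the "extend it a bit" step might look, as a rough sketch rather than a complete solution: drop script/style blocks and comments before stripping tags, decode a handful of common entities, and collapse whitespace. The class name RegexHtmlCleaner, the method toText, and the short entity list are all assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper: a slightly more careful regex-based cleaner.
final class RegexHtmlCleaner {
    // <script>...</script> and <style>...</style> including their content.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String toText(String html) {
        String s = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        s = COMMENT.matcher(s).replaceAll(" ");
        s = TAG.matcher(s).replaceAll(" ");
        // Only a few entities are handled; "&amp;" goes last so that
        // "&amp;lt;" ends up as the literal text "&lt;".
        s = s.replace("&nbsp;", " ")
             .replace("&lt;", "<")
             .replace("&gt;", ">")
             .replace("&quot;", "\"")
             .replace("&amp;", "&");
        return WHITESPACE.matcher(s).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // prints "Tom & Jerry 1 < 2"
        System.out.println(toText(
                "<style>p{color:red}</style><p>Tom &amp; Jerry</p><!-- note --><p>1 &lt; 2</p>"));
    }
}
```

Replacing each tag with a space instead of nothing keeps words from adjacent elements from running together. The whitespace pass then tidies up the result.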
The approach in the post above is a good one,
but there are plenty of new cases to consider:
  for example:
  < 6>3>
  <I bought a copy of <<thinking in java>> today!>
                      ~~                ~~
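Simply removing what's inside <> definitely won't work, because the specified plain-text format itself also contains <>, and I also need the <> to identify the individual fields. An HTML parser might solve it. I'll look up some material. Thanks in advance, and if anyone knows about this, please share.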
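One possible compromise for the cases raised in this thread, sketched under the assumption that real markup only uses well-known tag names: strip a <...> span only when it starts with a recognized HTML tag name, and leave everything else untouched. KnownTagStripper, strip, and the short KNOWN_TAGS list are hypothetical and deliberately incomplete.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: removes only spans that start with a known tag name.
final class KnownTagStripper {
    // Illustrative subset of tag names; extend as needed.
    private static final Set<String> KNOWN_TAGS = new HashSet<>(Arrays.asList(
            "html", "head", "title", "body", "p", "br", "div", "span",
            "font", "b", "i", "u", "a", "table", "tr", "td"));

    // '<' or '</', an ASCII tag name, optional attributes, optional '/', '>'.
    private static final Pattern CANDIDATE =
            Pattern.compile("</?([A-Za-z][A-Za-z0-9]*)(?:\\s[^<>]*)?/?>");

    static String strip(String html) {
        Matcher m = CANDIDATE.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(html, last, m.start());
            // Keep the span verbatim unless its name is a known HTML tag.
            if (!KNOWN_TAGS.contains(m.group(1).toLowerCase(Locale.ROOT))) {
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(html, last, html.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "<hello world>"
        System.out.println(strip(
                "<html><body><font size=\"5\"><hello world></font ></body></html>"));
    }
}
```

This keeps < 6>3> and <<thinking in java>> intact, but it is still a heuristic: body text such as <b is less than c> would be mistaken for a tag because it starts with a known name. The unambiguous fix is for the sender to escape literal brackets as &lt; and &gt;.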
I found one for someone a few days ago. It's very simple, but it should meet your needs:
http://www.csdn.net/Expert/TopicView1.asp?id=698524
Take a look at this: http://sourceforge.net/projects/jtidy
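If pulling in an external library is not an option, the JDK itself ships an HTML parser in javax.swing.text.html that can be used without any GUI. The sketch below uses that parser, not JTidy. It collects the text chunks the parser reports and joins them with spaces. The class name SwingHtmlText and the method extract are made up for illustration, and the parser targets old HTML (3.2), so results on modern pages may vary.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper built on the JDK's Swing HTML parser.
final class SwingHtmlText {
    static String extract(String html) {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called once per run of text between tags.
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(data);
            }
        };
        try {
            // 'true' tells the parser to ignore charset declarations in the page.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(extract("<html><body><p>Hello <b>world</b></p></body></html>"));
    }
}
```

A real parser also resolves entities, so literal angle brackets written as &lt; and &gt; in the mail body should come back as ordinary text. Unescaped ones like <hello world> remain ambiguous for any parser.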
Do you want an HTML parser? I have one here. Email me and I'll send it to you.
But I haven't tested it.