How to escape HTML tags in Java - 调试易

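A user submits a block of formatted HTML. On the one hand, the HTML itself needs to be stored; on the other hand, when displaying a summary of the content, I want to extract only the text, leaving out the tags that control formatting. How can this be done? I need to implement it in Java. Is there an API for this? Thanks.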

Solutions

I searched around a bit; hopefully this helps.
from: http://forum.java.sun.com/thread.jspa?threadID=778434&messageID=4429791

One option is org.htmlparser.parserapplications.StringExtractor (from the htmlparser library).

Or, using classes from the JDK:

import java.io.*;
import java.net.*;
import javax.swing.text.*;
import javax.swing.text.html.*;

class GetHTMLText
{
    public static void main(String[] args)
        throws Exception
    {
        EditorKit kit = new HTMLEditorKit();
        Document doc = kit.createDefaultDocument();

        // The Document class does not yet handle charsets properly.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);

        // Create a reader on the HTML content and close it once parsing is done.
        try (Reader rd = getReader(args[0]))
        {
            // Parse the HTML.
            kit.read(rd, doc, 0);
        }

        // The HTML text is now stored in the document.
        System.out.println(doc.getText(0, doc.getLength()));
    }

    // Returns a reader on the HTML data. If 'uri' begins
    // with "http:" or "https:", it's treated as a URL; otherwise,
    // it's assumed to be a local filename.
    static Reader getReader(String uri)
        throws IOException
    {
        // Retrieve from Internet.
        if (uri.startsWith("http:") || uri.startsWith("https:"))
        {
            URLConnection conn = new URL(uri).openConnection();
            return new InputStreamReader(conn.getInputStream());
        }
        // Retrieve from file.
        else
        {
            return new FileReader(uri);
        }
    }
}
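
Since the question is about HTML that a user submits (a String already in memory, not a URL or file), the same JDK approach can probably be adapted as in the sketch below. The class name HtmlSummary and the methods toPlainText, summarize, and escapeHtml are made up for illustration and are not part of any library. The sketch assumes the Swing HTML parser is good enough for the markup involved, and that a summary is simply the first N characters of the extracted text. The escapeHtml helper is a hand-written stand-in for the escaping mentioned in the title, useful when the extracted text is written back into a page.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class; the names here are illustrative only.
class HtmlSummary
{
    // Parses an HTML string with the JDK's Swing parser and returns only its text.
    static String toPlainText(String html)
        throws IOException, BadLocationException
    {
        HTMLEditorKit kit = new HTMLEditorKit();
        Document doc = kit.createDefaultDocument();
        // Avoid an exception when the markup contains a charset <meta> tag.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        kit.read(new StringReader(html), doc, 0);
        // Collapse the line breaks and whitespace runs left by the parser.
        return doc.getText(0, doc.getLength()).replaceAll("\\s+", " ").trim();
    }

    // Assumption: a summary is the first maxChars characters of the plain text.
    static String summarize(String html, int maxChars)
        throws IOException, BadLocationException
    {
        String text = toPlainText(html);
        return text.length() <= maxChars ? text : text.substring(0, maxChars) + "...";
    }

    // Replaces the characters that are special in HTML with entities.
    static String escapeHtml(String text)
    {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray())
        {
            switch (c)
            {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args)
        throws Exception
    {
        String submitted = "<p>Hello <b>bold</b> world</p>";
        // Text-only summary, escaped before being placed in a page.
        System.out.println(escapeHtml(summarize(submitted, 200)));
        // The raw markup shown literally instead of being rendered.
        System.out.println(escapeHtml(submitted));
    }
}
```

The original HTML can still be stored unchanged; only the text shown in the summary goes through toPlainText, and escapeHtml is applied at output time. For untrusted input in a real application, a maintained HTML parser or sanitizer library may be a safer choice than this sketch.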