How to implement automatic dictionary learning in a word segmentation tool - 调试易

Word segmentation dictionary



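import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;

public class Tokenizer {
    private static Tokenizer tok = new Tokenizer();

    // Reads the whole file into one string. A newline is kept between lines so that
    // the last word of one line is not glued to the first word of the next.
    // Assumes the file is UTF-8 encoded.
    public String read(String path) {
        StringBuilder stb = new StringBuilder();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader buf = Files.newBufferedReader(Paths.get(path), StandardCharsets.UTF_8)) {
            String sline;
            while ((sline = buf.readLine()) != null) {
                stb.append(sline).append('\n');
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return stb.toString();
    }

    public static void main(String[] args) {
        String direcpath = "path to the dictionary file";
        String textpath = "path to the text file";
        // Split on runs of characters that are not letters, digits or '_'.
        // A plain \\W+ treats every non-ASCII character (Chinese included) as a separator.
        String[] text = tok.read(textpath).split("[^\\p{L}\\p{N}_]+");
        String[] dire = tok.read(direcpath).split("[^\\p{L}\\p{N}_]+");
        Set<String> set1 = new HashSet<String>();    // words in the text
        Set<String> set2 = new HashSet<String>();    // words in the dictionary
        Collections.addAll(set1, text);
        Collections.addAll(set2, dire);
        // split() yields an empty first element when the input starts with a separator
        set1.remove("");
        set2.remove("");
        // set2 now holds the dictionary plus every word that only occurred in the text
        set2.addAll(set1);
    }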
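}

I'm not sure whether this is good enough. The idea is to read the dictionary and the text separately, put the words of each into its own set, then compare the two sets and add to the dictionary every word that appears in the text but not in the dictionary. Note that the merged set only exists in memory here; it still has to be written back to the dictionary file for the tool to actually "learn" the new words.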
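The compare-and-merge step described above could be sketched roughly as follows, this time also writing the newly found words back to the dictionary file. This is only a minimal sketch under several assumptions that are not in the original post: the class name DictionaryLearner and its three methods are hypothetical, both files are assumed to be UTF-8, and the dictionary is assumed to store one word per line and to end with a newline. Also keep in mind that splitting on separators only finds word boundaries where the text contains spaces or punctuation, so unsegmented Chinese text would come out as whole phrases rather than words.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class, for illustration only.
public class DictionaryLearner {

    // Splits content on runs of characters that are not letters, digits or '_'.
    public static Set<String> tokenize(String content) {
        Set<String> words = new TreeSet<>(Arrays.asList(content.split("[^\\p{L}\\p{N}_]+")));
        words.remove("");  // split() can yield an empty leading element
        return words;
    }

    // Returns the words that occur in the text but not in the dictionary.
    public static Set<String> findNewWords(Set<String> textWords, Set<String> dictWords) {
        Set<String> newWords = new TreeSet<>(textWords);
        newWords.removeAll(dictWords);
        return newWords;
    }

    // Appends the new words to the dictionary file, one word per line.
    // Assumes the existing file already ends with a newline.
    public static void appendToDictionary(Path dictFile, Set<String> newWords) throws IOException {
        if (newWords.isEmpty()) {
            return;
        }
        Files.write(dictFile, newWords, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java DictionaryLearner <dictionary file> <text file>");
            return;
        }
        Path dictFile = Paths.get(args[0]);
        Path textFile = Paths.get(args[1]);
        Set<String> dictWords = tokenize(new String(Files.readAllBytes(dictFile), StandardCharsets.UTF_8));
        Set<String> textWords = tokenize(new String(Files.readAllBytes(textFile), StandardCharsets.UTF_8));
        Set<String> newWords = findNewWords(textWords, dictWords);
        appendToDictionary(dictFile, newWords);
        System.out.println("Added " + newWords.size() + " new word(s) to the dictionary.");
    }
}
```

Appending only the difference, instead of rewriting the whole merged set, leaves the existing dictionary entries and their order untouched. A TreeSet is used so that the appended words come out in a stable, sorted order.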