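Given a text, read it in and split it sentence by sentence into an array of two- or three-character strings, with the two-character strings placed first, duplicates removed, and single-character words dropped. An example follows.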
The text content is: 信息设置: 修改个人基本信息. "CSDN邮箱" 新,设密码; 新,设问题答案------ 修改email、 注册条款!啊呀
社区·社区、我的,问题/我得分的问题、我得分的日期? 我参与的问题@我的信誉分^ 我的收,藏夹、 我的短消%
After reading the text above, the resulting string array is {"社区","我的","问题","藏夹","啊呀","信息设","息设置","修改个","改个人","个人基","基本信","本信息","CSD","SDN","DN邮","N邮箱","设密码","设问题","问题答",------------"我的短","的短消"}
First, read the text and break it, line by line, into independent strings.
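A minimal sketch of this first step, assuming that any run of characters that are neither letters nor digits (punctuation, spaces, symbols) separates two strings. The class name `FragmentSplitter` and the method `fragments` are hypothetical names chosen for illustration, not part of the original post.

```java
import java.util.ArrayList;
import java.util.List;

class FragmentSplitter {
    // Assumption: every run of non-letter, non-digit characters is a separator.
    // \p{L} and \p{N} are Unicode categories, so Chinese characters count as letters.
    static List<String> fragments(String line) {
        List<String> out = new ArrayList<>();
        for (String piece : line.split("[^\\p{L}\\p{N}]+")) {
            // split() yields an empty first element when the line starts with a separator
            if (!piece.isEmpty()) {
                out.add(piece);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [社区, 社区, 我的, 问题, 我得分的问题]
        System.out.println(fragments("社区·社区、我的,问题/我得分的问题"));
    }
}
```

A regular expression is used here only so that separators do not have to be listed one by one; an explicit delimiter list would work equally well if the set of separators is known in advance.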
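Then proceed as follows:
1. If a string has length 1, that is, a one-character sentence, ignore it.
2. If a string has length 2 or 3, add it to the string array directly.
3. If a string is longer than 3, start at the first character and take successive substrings of length 3, advancing one character at a time, adding each to the string array, until fewer than 3 characters remain and no further split is possible. In the example, 信息设置 yields 信息设 and 息设置.
Finally, tidy up the string array: remove duplicates, then sort by string length, with 2-character strings first and 3-character strings after.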
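The three rules and the final clean-up can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the class `NGramCollector` and its methods are hypothetical names, rule 3 is read as an overlapping window (as the example output suggests), and length is measured with `String.length()`, which counts UTF-16 code units.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class NGramCollector {
    // Applies rules 1-3 to one string and adds the resulting pieces to 'out'.
    static void addPieces(String s, Set<String> out) {
        int len = s.length();
        if (len < 2) {
            return;                      // rule 1: ignore single characters
        }
        if (len <= 3) {
            out.add(s);                  // rule 2: keep 2- and 3-character strings
            return;
        }
        for (int i = 0; i + 3 <= len; i++) {
            out.add(s.substring(i, i + 3)); // rule 3: every 3-character window
        }
    }

    // Removes duplicates (LinkedHashSet keeps first-seen order), then sorts by length.
    static List<String> collect(List<String> strings) {
        Set<String> unique = new LinkedHashSet<>();
        for (String s : strings) {
            addPieces(s, unique);
        }
        List<String> result = new ArrayList<>(unique);
        // List.sort is stable, so strings of equal length keep their original order
        result.sort(Comparator.comparingInt(String::length));
        return result;
    }
}
```

Collecting into a set first and sorting afterwards keeps the two clean-up steps separate; because the sort is stable, the 2-character strings end up in front without disturbing the order in which they were found.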
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.StringTokenizer;

public class Test {
    // Adds every 3-character substring of strFlag to result, moving the
    // start index forward one character at a time (strFlag is longer than 3).
    static void executeSplit(Set<String> result, String strFlag) {
        for (int nBeginIndex = 0; nBeginIndex + 3 <= strFlag.length(); ++nBeginIndex) {
            result.add(strFlag.substring(nBeginIndex, nBeginIndex + 3));
        }
    }

    public static void main(String[] args) {
        // Characters that end a sentence; extended to cover the separators
        // that appear in the sample text (space, '、', '·', '/', '^', '-').
        String strDotFlag = ",.?:!@;%\" 、·/^-";
        // A LinkedHashSet drops duplicates and keeps first-seen order.
        Set<String> result = new LinkedHashSet<String>();
        File file = new File("c:\\language.txt");
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader in = new BufferedReader(new FileReader(file))) {
            String strLine;
            while ((strLine = in.readLine()) != null) {
                StringTokenizer token = new StringTokenizer(strLine, strDotFlag);
                while (token.hasMoreTokens()) {
                    String strFlag = token.nextToken();
                    if (strFlag.length() > 3) {
                        executeSplit(result, strFlag);   // rule 3
                    } else if (strFlag.length() >= 2) {
                        result.add(strFlag);             // rule 2
                    }
                    // rule 1: single characters are ignored
                }
            }
            // Stable sort by length: 2-character strings first, then 3-character ones.
            List<String> sorted = new ArrayList<String>(result);
            sorted.sort(Comparator.comparingInt(String::length));
            for (String s : sorted) {
                System.out.println(s);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}