Question: reading a TXT file, extracting words, counting occurrences, and writing the result to a TXT file

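I want to read the file test1.txt, split it into English words, count how many times each word occurs, and write the result to test.txt.
For example, for the input: I in apple
the output should be:
I      1
in     1
apple  1
But my program is wrong. After running it I get:
I  1237
n  2154
a  6565
I have attached the source code and the text file.
Could you tell me which line is wrong and how I should fix it?
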
import java.io.*;
import java.util.*;

public class java2
{
  public static void main (String[] args) throws IOException
  {
    String               fileName   = "test1.txt";
    BufferedReader       bufReader  = new BufferedReader (new FileReader (fileName));
    StreamTokenizer      stToken    = new StreamTokenizer (bufReader);
    Map<String, Integer> mapWords   = new HashMap<String, Integer> ();
    Set                  mapSet     = null;
    Map.Entry[]          mapEntries = null;
    Integer              numWords   = null;
    int                  tokenType  = 0;
    int                  i;
    // lower-case mode
    stToken.lowerCaseMode (true);
    stToken.ordinaryChars (0, 'A' - 1);
    stToken.ordinaryChars ('Z' + 1, 'a' - 1);
    stToken.ordinaryChars ('z' + 1, 255);
    stToken.eolIsSignificant (true);

    while (bufReader.ready ())
    {
       tokenType = stToken.nextToken ();

      switch (tokenType)
      {
      case StreamTokenizer.TT_WORD:
        {

          numWords = mapWords.get (stToken.sval);

          if (numWords == null)
          {

            numWords = new Integer (1);
          }
          else
          {

            numWords++;
          }


          mapWords.put (stToken.sval, numWords);
        }
        break;

      default:
        break;
      }
    }


    mapSet     = mapWords.entrySet ();

    mapEntries = (Map.Entry[]) mapSet.toArray (new Map.Entry[mapSet.size ()]);

    Arrays.sort (mapEntries, new Comparator ()
                              {
                                public int compare (Object o1, Object o2)
                                {
                                 Object v1 = ((Map.Entry)o1).getValue ();
                                 Object v2 = ((Map.Entry)o2).getValue ();
                                 return ((Comparable)v2).compareTo (v1);
                                }
                              }
                );

    // Write the results to test.txt
    BufferedWriter bw = new BufferedWriter (new FileWriter ("test.txt"));

    for (i = 0; i < mapEntries.length; i++)
    {
         bw.write(  mapEntries[i].getKey () + " " + mapEntries[i].getValue ()+"\r\n" );
    }
    bw.close ();
  }
}

Contents of test1.txt:
<papers>
<paper>
<title>A corporatecreditratingmodelusingmulti-classsupportvectormachines
with anordinalpairwisepartitioningapproach</title>
<authors>Kyoung-jae Kim a,HyunchulAhn</authors>
     <journal>Computers & OperationsResearch</journal>
<year>2012</year>
<vol>39</vol>
<pages>1800-1811</pages>
<abstract>
Predicting corporate credit-rating using statistical and artificial intelligence (AI) techniques has received considerable research attention in the literature. In recent years, multi-class support vector machines (MSVMs) have become a very appealing machine-learning approach due to their good performance. Until now, researchers have proposed a variety of techniques for adapting support vector machines (SVMs) to multi-class classification, since SVMs were originally devised for binary classifica- tion. However, most of them have only focused on classifying samples into nominal categories; thus, the unique characteristic of credit-rating – ordinality – seldom has been considered in the proposed approaches. This study proposes a new type of MSVM classifier (named OMSVM) that is designed to extend the binary SVMs by applying an ordinal pairwise partitioning (OPP) strategy. Our model can efficiently and effectively handle multiple ordinal classes. To validate OMSVM, we applied it to a real-world case of bond rating. We compared the results of our model with those of conventional MSVM approaches and other AI techniques including MDA, MLOGIT, CBR, and ANNs. The results showed that our proposed model improves the performance of classification in comparison to other typical multi-class classification techniques and uses fewer computational resources.
</abstract>
<keywords>
Corporate credit rating Support vector machines Multi-class classification Ordinal pairwise partitioning
</keywords>
<content>

Solutions »

of 8
the 7
a 6
to 6
and 5
multi 4
techniques 4
classification 4
class 4
in 4
rating 4
support 3
ordinal 3
machines 3
credit 3
model 3
have 3
vector 3
svms 3
proposed 3
our 3
for 2
authors 2
year 2
approaches 2
we 2
binary 2
pairwise 2
– 2
title 2
pages 2
has 2
partitioning 2
results 2
omsvm 2
keywords 2
that 2
corporate 2
other 2
journal 2
vol 2
performance 2
abstract 2
with 2
msvm 2
ai 2
predicting 1
using 1
handle 1
opp 1
improves 1
kim 1
received 1
validate 1
until 1
showed 1
compared 1
appealing 1
only 1
on 1
intelligence 1
ordinality 1
papers 1
content 1
classifica 1
now 1
them 1
years 1
artificial 1
classifying 1
unique 1
most 1
anordinalpairwisepartitioningapproach 1
statistical 1
computers 1
typical 1
designed 1
mlogit 1
classes 1
applying 1
learning 1
research 1
new 1
including 1
world 1
resources 1
jae 1
computational 1
considerable 1
become 1
by 1
study 1
good 1
since 1
type 1
nominal 1
hyunchulahn 1
been 1
were 1
focused 1
seldom 1
mda 1
effectively 1
corporatecreditratingmodelusingmulti 1
however 1
researchers 1
literature 1
efficiently 1
machine 1
comparison 1
samples 1
conventional 1
fewer 1
due 1
case 1
this 1
named 1
can 1
paper 1
recent 1
considered 1
bond 1
real 1
tion 1
classsupportvectormachines 1
operationsresearch 1
kyoung 1
multiple 1
attention 1
those 1
is 1
it 1
cbr 1
thus 1
extend 1
into 1
strategy 1
anns 1
proposes 1
adapting 1
variety 1
msvms 1
originally 1
devised 1
applied 1
an 1
classifier 1
approach 1
uses 1
categories 1
their 1
characteristic 1
very 1

But when I run it, I get:
e 150
a 134
i 128
t 116
s 114
o 112
r 108
n 95
c 78
l 66
p 61
d 49
h 48
m 45
u 41
g 27
f 26
v 26
y 23
w 16
b 10
q 5
k 4
j 3
? 1
x 1
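
What is going on here? My guesses: 1. the file is being read incorrectly; 2. a different JDK? I'm on 1.6; 3. bad luck.

I'm using JDK 1.7. I'll switch to 1.6 and try.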
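
Regarding guess 1: one possible cause, which is an assumption and not confirmed anywhere in this thread, is that test1.txt was saved as UTF-16 ("Unicode" in some editors) while FileReader decodes it with the platform default charset. Every letter is then followed by a NUL character, and because the program marks characters 0 to 'A' - 1 as ordinary, each letter becomes its own one-letter word. The sketch below reproduces that effect in memory with the same StreamTokenizer settings. The class name EncodingCheck and the method words are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original program.
class EncodingCheck {
    // Decode the bytes with the named charset and collect the word tokens,
    // using the same StreamTokenizer settings as the program in the question.
    static List<String> words(byte[] data, String charsetName) throws IOException {
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), Charset.forName(charsetName));
        StreamTokenizer st = new StreamTokenizer(reader);
        st.lowerCaseMode(true);
        st.ordinaryChars(0, 'A' - 1);
        st.ordinaryChars('Z' + 1, 'a' - 1);
        st.ordinaryChars('z' + 1, 255);

        List<String> result = new ArrayList<String>();
        // Loop until the tokenizer itself reports end of input.
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == StreamTokenizer.TT_WORD) {
                result.add(st.sval);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Simulate a file that was saved as UTF-16 (little endian).
        byte[] utf16 = "I in apple".getBytes(Charset.forName("UTF-16LE"));

        // Decoded with a single-byte charset: every letter is split off.
        System.out.println(words(utf16, "ISO-8859-1")); // [i, i, n, a, p, p, l, e]

        // Decoded with the matching charset: whole words come back.
        System.out.println(words(utf16, "UTF-16LE"));   // [i, in, apple]
    }
}
```

If this turns out to be the cause, either re-save test1.txt in a single-byte encoding or open it with an explicit charset, for example new InputStreamReader(new FileInputStream("test1.txt"), "UTF-16"), instead of FileReader. A stray "? 1" entry and output such as "A m o n g 1" would both be consistent with this explanation, but that is only a guess.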
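
I'm using NetBeans. I have changed the program and still have a few questions:
1. How do I strip special characters such as <>\[] from the document?
Or should everything other than English letters be set as an exception in the character classification?
2. Why does everything written to the TXT file contain extra spaces?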
A m o n g 1
A N N 1

package readfile;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class WordsCount {
    HashMap<String, Integer> hashMap;
    BufferedReader infile;
    String filename = "src/readfile/test1.txt";
    String string;
    String outpath = "src/readfile/test.txt";

    public WordsCount() throws IOException {
        infile = new BufferedReader(new FileReader(filename));
        hashMap = new HashMap<String, Integer>();
        while ((string = infile.readLine()) != null) {
            String[] words = string.split(" ");
            for (int i = 0; i < words.length; i++) {
                String astr = words[i].trim();
                // Strip one trailing period or comma.
                // (The original used astr.length(), which removed nothing.)
                if (astr.endsWith(".") || astr.endsWith(",")) {
                    astr = astr.substring(0, astr.length() - 1);
                }
                if (astr.equals("")) {
                    continue;
                }
                // Look up and update with the cleaned word (astr), not the
                // raw token words[i], so "rating" and "rating," share a count.
                Integer count = hashMap.get(astr);
                hashMap.put(astr, count == null ? 1 : count + 1);
            }
        }
        infile.close();

        // Insertion sort of the keys by their first character.
        List<String> arrayList = new ArrayList<String>();
        Iterator<Map.Entry<String, Integer>> iter = hashMap.entrySet().iterator();
        outer:
        while (iter.hasNext()) {
            String key = iter.next().getKey();
            char aChar = key.charAt(0);
            for (int i = 0; i < arrayList.size(); i++) {
                if (aChar < arrayList.get(i).charAt(0)) {
                    arrayList.add(i, key);
                    continue outer;
                }
            }
            arrayList.add(key);
        }

        // Build the report: one "word count times" line per word.
        StringBuffer outContent = new StringBuffer();
        for (int i = 0; i < arrayList.size(); i++) {
            String key = arrayList.get(i);
            outContent.append(key + " " + hashMap.get(key) + " times" + "\r\n");
            System.out.println(key + " " + hashMap.get(key));
        }

        FileOutputStream outs = new FileOutputStream(new File(outpath));
        outs.write(outContent.toString().getBytes());
        outs.flush();
        outs.close();
    }

    public static void main(String[] args) {
        try {
            new WordsCount();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
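
For question 1 above (stripping characters such as <>\[]), one option is to split each line on every run of characters that are not English letters, so those symbols never reach the map at all. This is only a minimal sketch of that idea, not the approach anyone in the thread settled on. The class name LetterSplitSketch and the method count are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the programs posted in the thread.
class LetterSplitSketch {
    // Count the words in one line. Any run of non-letters (spaces, digits,
    // punctuation, < > \ [ ] and so on) is treated as a separator.
    static Map<String, Integer> count(String line) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String word : line.split("[^A-Za-z]+")) {
            if (word.isEmpty()) {
                continue; // a leading separator produces one empty string
            }
            String key = word.toLowerCase();
            Integer old = counts.get(key);
            counts.put(key, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        String line = "<title>A [new] model, a \\ test.</title>";
        System.out.println(count(line)); // {a=2, model=1, new=1, test=1, title=2}
    }
}
```

Note that tag names such as title are still counted as words, just as in the word list posted earlier in the thread. A TreeMap is used here only so the output comes out in alphabetical order.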
