I'm using NetBeans. My program runs, but I still have a few questions.

1. My document contains some special characters. How do I remove the special characters <>\[] from the document? Or should I just treat everything that is not English as an exception?
2. When I write the result to the TXT file, why is every word padded with extra whitespace? For example:

A m o n g 1
A N N 1

What I actually want written to the TXT file is:

Among 1
ANN 1

Thanks in advance for any help.
package readfile;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class WordsCount {
    HashMap<String,Integer> hashMap;
    BufferedReader infile;
    String filename = "src/readfile/test1.txt";
    String string;
    String outpath = "src/readfile/test.txt";

    @SuppressWarnings("unchecked")
    public WordsCount() throws IOException{
        infile = new BufferedReader(new FileReader(filename));
        hashMap=new HashMap<String,Integer>();
        while((string = infile.readLine()) !=null) {
            String[] words=string.split(" ");
            for(int i=0;i<words.length;i++){
                if(words[i].trim().equals("")){
                    continue;
                }
                String astr=words[i].trim();
                if(astr.endsWith(".")||astr.endsWith(",")){
                    astr=astr.substring(0, astr.length()-1); // length()-1, otherwise nothing is stripped
                }
                if(hashMap.containsKey(astr)){ // look up the trimmed key, not words[i]
                    hashMap.put(astr, hashMap.get(astr)+1);
                }else{
                    hashMap.put(astr, 1);
                }
            }
        }
        infile.close();
        
        List<String> arrayList=new ArrayList<String>();
        Iterator<?> iter = hashMap.entrySet().iterator();
        outer:while (iter.hasNext()) {
            Map.Entry entry = (Map.Entry) iter.next();
            String key = (String)entry.getKey();
            
            char aChar=key.charAt(0);
            for(int i=0;i<arrayList.size();i++){
                if(aChar<arrayList.get(i).charAt(0)){
                    arrayList.add(i,key);
                    continue outer;
                }
            }
            arrayList.add(key);
        }
        
        StringBuffer outContent=new StringBuffer();
        for(int i=0;i<arrayList.size();i++){
            String key=arrayList.get(i);
            outContent.append(key+" "+hashMap.get(key)+"\r\n"); // space between word and count
            System.out.println(key+" "+hashMap.get(key));
        }
        
        FileOutputStream outs=new FileOutputStream(new File(outpath));
        outs.write(outContent.toString().getBytes());
        outs.flush();
        outs.close();
    }
    
    public static void main(String[] args){
        try {
            new WordsCount();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}

Solutions »

  1.   

    This is my document:

    <papers>
    <paper>
    <title>A corporatecreditratingmodelusingmulti-classsupportvectormachines
    with anordinalpairwisepartitioningapproach</title>
    <authors>Kyoung-jae Kim a,HyunchulAhn</authors>
         <journal>Computers & OperationsResearch</journal>
    <year>2012</year>
    <vol>39</vol>
    <pages>1800-1811</pages>
    <abstract>
    Predicting corporate credit-rating using statistical and artificial intelligence (AI) techniques has received considerable research attention in the literature. In recent years, multi-class support vector machines (MSVMs) have become a very appealing machine-learning approach due to their good performance. Until now, researchers have proposed a variety of techniques for adapting support vector machines (SVMs) to multi-class classification, since SVMs were originally devised for binary classifica- tion. However, most of them have only focused on classifying samples into nominal categories; thus, the unique characteristic of credit-rating – ordinality – seldom has been considered in the proposed approaches. This study proposes a new type of MSVM classifier (named OMSVM) that is designed to extend the binary SVMs by applying an ordinal pairwise partitioning (OPP) strategy. Our model can efficiently and effectively handle multiple ordinal classes. To validate OMSVM, we applied it to a real-world case of bond rating. We compared the results of our model with those of conventional MSVM approaches and other AI techniques including MDA, MLOGIT, CBR, and ANNs. The results showed that our proposed model improves the performance of classification in comparison to other typical multi-class classification techniques and uses fewer computational resources.
    </abstract>
    <keywords>
    Corporate credit rating Support vector machines Multi-class classification Ordinal pairwise partitioning
    </keywords>
    <content>
    1. Introduction Corporate creditratingisaveryimportantfactorintheet of corporatedebt.Informationconcerningcorporateoperationsis often disseminatedtoetparticipantsthroughthechangesin credit ratingsthatarepublishedbyprofessionalratingagencies, such asStandard&Poor’s(S&P)andMoody’sInvestorService. Since theseagenciesgenerallyrequirelargefeesfortheirservices and theperiodicallyprovidedratingssometimesdonotreflectthe default riskofthecompanyatthetime,itmaybeadvantageous for bond-etparticipantstobeabletoclassifycreditratings before theagenciespublishtheratings.Asaresult,itisvery important forthecompanies,especiallyfinancialcompanies,to develop apropermodelofcreditrating [1,68]. Fromatechnicalperspective,thecreditratingconstitutesa typical,multi-class,classificationproblembecausetherating agenciesgenerallyhavetenormorecategoriesofratings.For example,S&P’sratingsrangefromAAAforthehighest-quality bondstoDforthelowest-qualitybonds.Professionalrating agenciesemphasizetheimportanceofanalysts’subjectivejudg- mentsindeterminingcreditratings.However,inpractice,a mathematicalmodelthatusesthefinancialvariablesofcompanies playsanimportantroleindeterminingcreditratings,sinceitis convenienttoapplyandentailslesstimeandcost.Thesefinancial variablesincludetheratiosthatrepresentacompany’sleverage status,liquiditystatus,andprofitabilitystatus [1–4,68,69]. Several statisticalandartificialintelligence(AI)techniques have beenappliedastoolsforfinancialdecisionmakingsuchas stock etforecastingorcreditratingsprediction [1,3,5]. Among them,theartificialneuralnetworkshavebeenwidely used intheareaoffinancebecauseoftheirbroadapplicabilityto many businessproblemsandtheirpreeminentabilitytolearn [6,7]. 
However,besidestheriskofover-fitting,artificialneural networks alsohavemanydefects,includingdifficultyindeter- mining thevaluesofcontrolparametersandthenumberof processing elementsinthelayer.Supportvectormachines(SVMs) have recentlybecomepopularasasolutiontoproblemsthatare associated withpredictionbecauseoftheirrobustnessandhigh accuracy [8–12,70]. AnSVM’ssolutionmaybegloballyoptimal because SVMsseektominimizestructuralrisk.Conversely,the solutions foundbyartificialneuralnetworkmodelstendtofall into localoptimumbecausetheyseektominimizeempiricalrisk. In addition,noparametersneedtobetunedinSVMs,barringthe upper boundfornon-separablecasesinlinearSVMs.However, SVMs wereoriginallydevisedforbinaryclassification;therefore, they arenotnaturallygearedformulti-classclassifications,which apply tocreditratings [13]. Thus,researchershavetriedtoextend the originalSVMtomulti-classclassification. Hitherto, avarietyoftechniquestoextendstandardSVMsto multi-class SVMs(MSVMs)havebeenproposedintheliterature. These techniquesincludeapproachesthatconstructandcombine several binaryclassifiersaswellasapproachesthatdirectly consider allthedatainasingleoptimizationformulation. However, mostpublishedtechniqueshavefocusedonclassifying samples intonominalcategories [8,14–21]. Eventhoseprior studies thatappliedMSVMstocreditratingsalsousedstandard MSVM modelsthatwerenotdesignedtoreflecttheordinal nature ofthisdomain [1,3,22,23]. Furthermore,mostofthese studies testedatmostafewtypesofMSVM. In thisstudy,weproposeanovelcomputationalapproachfor MSVMs, whichtakesintoaccounttheordinalcharacteristicsfor efficiently andeffectivelyhandlingmultipleordinalclasses;we term theapproach,ordinalmulti-classsupportvectormachine (or OMSVM,inshort).SimilartotraditionalMSVMs,ourmodel basically combinesseveralbinarySVMclassifiers.However,itis different fromthetraditionalapproachessinceitextendsthe binary SVMsusingtheordinalpairwisepartitioning(OPP) approach [24]. 
Usingthelatterapproach,ourmodelusesfewer classifiers, butneverthelessmaymoreaccuratelypredictclasses because itexploitsadditionalhiddeninformation,namely,the order ofclasses.Tovalidatetheeffectivenessofourmodel,we applied themodeltoareal-worldcaseofbondratinginKorea. We comparedtheresultsofthemodeltothoseoftraditional MSVM approaches.Wealsocomparedtheresultsofthemodelto those oftraditionaltechniquesforcreditratings,suchasmultiple discriminant analysis(MDA),multinomiallogisticregression (MLOGIT), case-basedreasoning(CBR),andartificialneural networks (ANNs) [25–33]. Inaddition,toexaminetheeffectof OPP indepth,weappliedOPPtobothMSVMsandANNs,andwe compared thepredictionresultsthatweregeneratedbythesetwo techniques. The restofthispaperisorganizedasfollows.Thenextsection reviews theliteratureonSVMsandMSVMs,inadditiontostudies on creditratingsthatemployeddatamining.InSection3,our approach forordinalmulti-classclassificationisproposed.Section 4 describesthedataandexperimentsforvalidatingourmodel.In Section 5,theempiricalresultsaresummarizedanddiscussed. The finalsectionpresentstheconclusionsandfutureresearch direction ofthisstudy. 2. Literaturereview In thissection,weintroducethebasicconceptofconventional SVM, andwesummarizethestudiesthathaveattemptedto extend theconventionalSVMtomulti-classclassification.Then, we brieflyreviewthestudiesoncreditratingsthathaveusedthe techniques ofdatamining.Wewillalsodiscussthemajorstudies in theliteraturethathaveadoptedMSVMstoclassifycredit ratings. 2.1. Conventional(binary)SVM The conventionalSVMachievesclassificationbymappingthe input vectorsontoahigh-dimensionalfeaturespaceandbythen constructing alinearmodelthatimplementsnonlinearclass boundaries intheoriginalspace.SVMemploysanalgorithmthat finds aspecialkindoflinearmodel,namely,theoptimalhyper- plane. 
Theoptimalhyperplanereferstothemaximum-margin hyperplane, whichyieldsthemaximumseparationbetween decision classes.Thus,theoptimalhyperplaneseparatesthe training exampleswiththemaximumdistancefromtheseparat- ing hyperplanetotheclosesttrainingdatasamples.Thosetrain- ing examplesthatareclosesttothemaximum-margin hyperplane arecalledsupportvectors.Allothertrainingexam- ples, otherthanthesupportvectors,areuselessforconstructing the optimalhyperplane.Asaresult,itispossibleforSVMsto effectively performbinaryclassificationwithasmallsizeof training samples [1,13,34,35,70]. For thelinearlyseparablecase,ahyperplane,whichseparates the binarydecisionclassesinthecaseof n attributes, canbe represented asthefollowingequation: where y is theoutcome, xi are theattributevalues(i¼1,y, n), and {wi:i¼0,y, n} arethe nþ1 weightstobelearnedbythe learning algorithm.InEq.(1),theweights{wi:i¼0,y, n} arethe parameters thatdeterminethehyperplane.AsshowninEq.(2), SVMs approximatetheoptimalhyperplane(i.e.,themaximum- margin hyperplane)usingthesupportvectors:
      

  2.   

    1. How do I remove the special characters <>\[] from the document?
    Or should I treat everything that is not English as an exception?
    You can use replaceAll to strip those so-called special characters.

    2. When writing to the TXT file, why is every word padded with whitespace?
    A m o n g 1
    A N N 1
    FileReader is a character stream tied to an encoding, so at the very end, when you call
    outContent.toString().getBytes() // if this byte encoding differs from the one you read in, you get exactly this effect
    You can test it yourself: save a text file in Unicode (UTF-16) encoding, read it with FileReader, write it out the way you do now, and look at the result.
    Also, I don't know how your original text file was saved. If a word wraps to the next line before it ends, can you still guarantee that you read in complete words?
    And when saving to the List, there is no need to insert by first letter yourself; just sort once at the end:
    Collections.sort(arrayList);
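    The encoding effect described above can be reproduced in isolation. The sketch below (the class name EncodingDemo and method misread are illustrative, not from the thread) shows why "Among" stored as UTF-16 but decoded one byte per character comes out as "A m o n g": every letter is followed by a NUL byte that renders as blank space.

    ```java
    import java.nio.charset.StandardCharsets;

    public class EncodingDemo {
        // "Among" as UTF-16LE is the byte sequence 'A',0,'m',0,'o',0,'n',0,'g',0.
        // Decoding those bytes with a single-byte charset turns each byte into a
        // char, so a NUL lands between every letter.
        static String misread(String s) {
            byte[] utf16 = s.getBytes(StandardCharsets.UTF_16LE);
            return new String(utf16, StandardCharsets.ISO_8859_1);
        }

        public static void main(String[] args) {
            // prints: A m o n g (with a trailing blank from the final NUL)
            System.out.println(misread("Among").replace('\u0000', ' '));
        }
    }
    ```

    The fix is to pin the charset on both sides: read with new InputStreamReader(new FileInputStream(filename), StandardCharsets.UTF_8) instead of FileReader, and write with an OutputStreamWriter using the same charset instead of getBytes().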
      

  3.   

    Thank you.
    I changed three lines of the source (they were marked in red on the forum).
    But in the output file, every entry that involved < now shows "null 次".
    Maybe I misunderstood how you meant it to be used. Could you help me analyze it again?
    package readfile;

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;

    public class WordsCount {
        HashMap<String,Integer> hashMap;
        BufferedReader infile;
        String filename = "src/readfile/test3.txt";
        String string;
        String outpath = "src/readfile/test.txt";

        @SuppressWarnings("unchecked")
        public WordsCount() throws IOException{
            infile = new BufferedReader(new FileReader(filename));
            hashMap=new HashMap<String,Integer>();
            while((string = infile.readLine()) !=null) {
                String[] words=string.split("\\s+");
                for(int i=0;i<words.length;i++){
                    if(words[i].trim().equals("")){
                        continue;
                    }
                    String astr=words[i].trim();
                    if(astr.endsWith(".")||astr.endsWith(",")){
                        astr=astr.substring( 0,astr.length());
                    }
                    if(hashMap.containsKey(words[i])){
                        Integer count=(Integer) hashMap.get(words[i]);
                        count++;
                        hashMap.remove(astr);
                        hashMap.put(astr, count);
                    }else{
                        hashMap.put(astr, 1);
                    }
                }
            }
            infile.close();

            List<String> arrayList=new ArrayList<String>();
            Iterator<?> iter = hashMap.entrySet().iterator();
            outer:while (iter.hasNext()) {
                Map.Entry entry = (Map.Entry) iter.next();
                String key = (String)entry.getKey();
                // the three changed lines:
                key=key.replaceAll("(?i)","");
                key=key.replaceAll("<","");
                key=key.replaceAll(",","");

                char aChar=key.charAt(0);
                for(int i=0;i<arrayList.size();i++){
                    if(aChar<arrayList.get(i).charAt(0)){
                        arrayList.add(i,key);
                        continue outer;
                    }
                }
                arrayList.add(key);
            }

            StringBuffer outContent=new StringBuffer();
            for(int i=0;i<arrayList.size();i++){
                String key=arrayList.get(i);
                outContent.append(key+" "+hashMap.get(key)+"次"+"\r\n");
                System.out.println(key+hashMap.get(key));
            }

            FileOutputStream outs=new FileOutputStream(new File(outpath));
            outs.write(outContent.toString().getBytes());
            outs.flush();
            outs.close();
        }

        public static void main(String[] args){
            try {
                new WordsCount();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    The output file after running:
    #5’.Whenthemethodoffusionwasfixedtheprediction null次
    #1to#5. 1次
    & 1次
    &MathematicswithApplications2009;57:1908–14. 1次
    (six) null次
    (67.21%)approaches. 1次
    (named 1次
    (6)themethodofCrammerandSinger.Inthecaseof 1次
    (OPP) 1次
    (64.34%)performedtheworst.Theorderingof 1次
    (_24_1_1 1次
    (MSVMs) 1次
    (MLOGIT);(3)case-basedreasoning(CBR);(4)artificial 1次
    (Pleasereferto 1次
    (2) 1次
    (OPP)whichconsiderstheorderofclasseswhilecombiningseveral null次
    (p 1次
    (OAA)andECOCatthe1%levelofsignificance null次
    (65.66%) 1次
    (_1þ1) null次
    (67.21%) 1次
    (67.29%) 1次
    (also 3次
    (MDA) 1次
    (OPP)approachasatoolforupgradingconventionalmulti- 1次
    (67.36%467.13% 1次
    (eitherOne-Against-The-NextorOne-Against-Followers). 1次
    (64.89%)butunderperformedtheDAGSVM(67.29%)and 1次
    (_1_1). null次
    (MLOGIT) null次
    (SVMs) 1次
    (negative); 1次
    (1)multiplediscriminantanalysis(MDA);(2)multinomiallogistic 1次
    (2)interpretation.InthepreparationphaseOMSVM null次
    (where 1次
    (23);Model3forthepairofclasses(34). null次
    (AI) 1次
    (þ1_1) null次
    (or 1次
    (ANNs) 1次
    (7).BysolvingEq.(7)wecanfindthehyperplanethat null次
    (3)DAGSVM;(4)ECOC;(5)themethodofWestonand 1次
    . 6次
    . 6次
    .m null次
    /references> null次
    /keywords> null次
    /content> null次
    /paper> null次
    /papers> null次
    /abstract> null次

    /http://www.csie.ntu.edu.tw/_cjlin/libsvm/S. 1次
    /http://www.csie.ntu. 1次
    0 1次
    1 1次
    1985;20:237–62. 1次
    12 null次
    1999;16:240–7. 1次
    1995. 1次
    1. 2次
    10 1次
    1 1次
    180:1–28. 1次
    1295companiesfromthemanufacturingindustryinKorea.In 1次
    1995;13:10–3. 1次
    1995;8:32–8. 1次
    1980;9:44–51. 1次
    2thenitisdeemed‘class2’.OtherwiseModel3applies.In null次
    2 4次
    2008.p.319–28. 1次
    2k_1_1. 1次
    2004;2972:272–81. 1次
    2.2. 1次
    2.1. 1次
    2.3. 1次
    2. 1次
    2 4次
    2) 1次
    263–86. 1次
    231–40. 1次
    2007;71:400–5. 1次
    2005;32:2513–22. 1次
    2006;4234:420–9. 1次
    2009;3:393–407. 1次
    2004;10:91–109. 1次
    2009;195:924–41. 1次
    2003;55:307–19. 1次
    2007;1:255–68. 1次
    3 2次
    3 2次
    3. 1次
    3thetestdatumisfinallyclassifiedaseither‘class3’or null次
    3.) 1次
    355–89. 1次
    34–41. 1次
    4 3次
    4.1. 1次
    4.2. 1次
    4. 2次
    427–35. 1次
    4’.Usingthesamereasoningthebackwardmethodstarts null次
    5 1次
    5–7. 1次
    5 1次
    5. 1次
    547–53. 1次
    5theempiricalresultsaresummarizedanddiscussed. null次
    64.89%and67.05%.Inourexperimentthemethodof null次
    6 null次
    6. 1次
    7 1次
    7) 1次
    838–46. 1次
    8 null次
    89–112. 1次
    9 1次
    9 1次
    : 2次
    Among 2次
    Analysis1999;2:117–31. 1次
      

  4.   

    For example:
    String key = "<abcd> [1234] \\test";
    key=key.replaceAll("<|>|\\\\|\\[|\\]","");
    System.out.println(key);
      

  5.   

    I changed this line so that all of the other characters are treated as delimiters:
    String[] words=string.split("[0-9\\.\\@_\\-~#\\,\\:\\;\\(\\)\\[\\]\\%\\&\\’\\‘\\—\\{\\}\\</\\>\\=\\s]+");

    I have another question: in the new txt file, different capitalizations of the same English word are counted separately.
    How can I normalize the case? For example:
    In 1次
    in 1次
    I would like this to become:
    In 2次
      

  6.   

    Then change the place where you do the counting so that it ignores case:

    class UWord {  // equals/hashCode are enough for a HashMap key; "implements Comparator<? extends UWord>" would not compile
        private String word;
        public UWord(String word) {this.word = word;}
        public String toString() {return word;}
        public boolean equals(Object o) {
            if (o==null || !(o instanceof UWord)) return false;
            return word.equalsIgnoreCase(((UWord)o).word); // ignore case
        }
        public int hashCode() {
            return word.toLowerCase().hashCode(); // normalize to a single case
        }
    }
    // then:
    HashMap<UWord,Integer> hashMap;
    ...
                    if(astr.endsWith(".")||astr.endsWith(",")){
                        astr=astr.substring( 0,astr.length());
                    }
                    UWord w = new UWord(astr);
                    //if(hashMap.containsKey(words[i])){
                    if(hashMap.containsKey(w)){
                        //Integer count=(Integer) hashMap.get(words[i]);
                        //count++;
                        //hashMap.remove(astr);
                        //hashMap.put(astr, count);
                        hashMap.put(w, hashMap.get(w)+1);
                    }else{
                        //hashMap.put(astr, 1);
                        hashMap.put(w, 1);
                    }
                    ...
      

  7.   

    I merged it in, but a few variables are missing.
    Did I put it in the wrong place? Could I ask you for the complete source?
    Thank you.
    public class NewClass {
        HashMap<String,Integer> hashMap;
        BufferedReader infile;
        String filename = "src/readfile/test3.txt";
        String string;
        String outpath = "src/readfile/test.txt";

        class UWord implements Comparator<? extends UWord> {
            private String word;
            public UWord(String word) {this.word = word;}
            public String toString() {return word;}
            public boolean equals(Object o) {
                if (o==null || !(o instanceof UWord)) return false;
                return word.equalsIgnoreCase(((UWord)o).word);
            }
            public int hashCode() {
                return word.toLowerCase().hashCode();
            }
        }

        @SuppressWarnings("unchecked")
        public NewClass() throws IOException{
            infile = new BufferedReader(new FileReader(filename));
            hashMap=new HashMap<String,Integer>();
            while((string = infile.readLine()) !=null) {
                String[] words=string.split("[0-9\\.\\@_\\-~#\\,\\:\\;\\(\\)\\[\\]\\%\\&\\’\\‘\\—\\{\\}\\</\\>\\=\\s]+");
                for(int i=0;i<words.length;i++){
                    HashMap<UWord,Integer> hashMap;
                    if(astr.endsWith(".")||astr.endsWith(",")){
                        astr=astr.substring( 0,astr.length());
                    }
                    UWord w = new UWord(astr);
                    //if(hashMap.containsKey(words[i])){
                    if(hashMap.containsKey(w)){
                        //Integer count=(Integer) hashMap.get(words[i]);
                        //count++;
                        //hashMap.remove(astr);
                        //hashMap.put(astr, count);
                        hashMap.put(w, hashMap.get(w)+1);
                    }else{
                        //hashMap.put(astr, 1);
                        hashMap.put(w, 1);
                    }
                }
            }
            infile.close();

            List<String> arrayList=new ArrayList<String>();
            Iterator<?> iter = hashMap.entrySet().iterator();
            outer:while (iter.hasNext()) {
                Map.Entry entry = (Map.Entry) iter.next();
                String key = (String)entry.getKey();
                //key=key.replaceAll("(?i)","");
                char aChar=key.charAt(0);
                for(int i=0;i<arrayList.size();i++){
                    if(aChar<arrayList.get(i).charAt(0)){
                        arrayList.add(i,key);
                        continue outer;
                    }
                }
                arrayList.add(key);
            }

            StringBuffer outContent=new StringBuffer();
            for(int i=0;i<arrayList.size();i++){
                String key=arrayList.get(i);
                outContent.append(key+" "+hashMap.get(key)+"次"+"\r\n");
                System.out.println(key+hashMap.get(key));
            }

            FileOutputStream outs=new FileOutputStream(new File(outpath));
            outs.write(outContent.toString().getBytes());
            outs.flush();
            outs.close();
        }

        public static void main(String[] args){
            try {
                new WordsCount();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
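
    The thread ends without the requested complete source, so here is a minimal sketch of how the pieces could fit together, under the thread's stated goals (non-letters as delimiters, case-insensitive counting, alphabetical output) and using the file paths from the thread. It is my own consolidation, not a reply from the forum: instead of the UWord class, it uses a TreeMap with String.CASE_INSENSITIVE_ORDER, which both merges "In"/"in" under one key and keeps the keys sorted, and it reads and writes with one explicit charset so the getBytes() mismatch from reply #2 cannot occur.

    ```java
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    public class WordsCount {

        /** Count words ignoring case; return "word count" lines in alphabetical order. */
        static List<String> countWords(Iterable<String> lines) {
            // A case-insensitive TreeMap merges "In"/"in" under one key (the first
            // spelling seen) and keeps the keys sorted, so no UWord class and no
            // manual insertion by first letter are needed.
            TreeMap<String, Integer> counts = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
            for (String line : lines) {
                // Any run of non-letters (digits, <>[], punctuation, whitespace) is a delimiter.
                for (String w : line.split("[^A-Za-z]+")) {
                    if (!w.isEmpty()) counts.merge(w, 1, Integer::sum);
                }
            }
            List<String> out = new ArrayList<>();
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                out.add(e.getKey() + " " + e.getValue());
            }
            return out;
        }

        public static void main(String[] args) throws IOException {
            Path in = Paths.get("src/readfile/test3.txt"); // paths from the thread
            if (!Files.exists(in)) {
                System.out.println("input file not found: " + in);
                return;
            }
            // One explicit charset on both sides, so no getBytes() mismatch can occur.
            List<String> lines = Files.readAllLines(in, StandardCharsets.UTF_8);
            try (Writer out = new OutputStreamWriter(
                    new FileOutputStream("src/readfile/test.txt"), StandardCharsets.UTF_8)) {
                for (String s : countWords(lines)) out.write(s + "\r\n");
            }
        }
    }
    ```

    For example, countWords(Arrays.asList("In the <in> 1234", "Among, among")) yields the lines "Among 2", "In 2", "the 1". If the input file is saved in another encoding, change UTF_8 in both places to match it.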