100-point bounty for code: counting the words in a text file

My own attempt has some bugs; please help me debug it. Here is my code:
//Demo3.java
import java.io.*;
import java.util.*;

// Parallel arrays: sg[i] holds a word and it[i] holds its count
class StringInt
{
    public String[] sg;
    public int[] it;

    StringInt()
    {
    }

    StringInt(int n)
    {
        sg = new String[n];
        it = new int[n];
    }
};

class Demo3 {
    public static void main(String args[]) {

        try {

            File file = new File("b.txt");
            BufferedReader in = new BufferedReader(new FileReader(file));
            BufferedWriter out = new BufferedWriter(new FileWriter("out.txt"));
            String s = ""; // starts empty; holds each line as it is read
            StringBuffer str = new StringBuffer();

            while ((s = in.readLine()) != null) {
                str.append(s); // append the line just read to the buffer
            }
            String str1 = str.toString(); // StringTokenizer only takes a String, not a StringBuffer
            StringTokenizer st = new StringTokenizer(str1, "   ");
            int n = st.countTokens(); // number of words in the text

            StringInt SI = new StringInt(n); // create a new StringInt object SI

            SI.sg = str1.split("   "); // put the words into the String array in SI

            for (int i = 0; i < n; i++)
            {
                SI.it[i] = 1; // initialize each word's occurrence count to 1
            }

            for (int i = 0; i < n - 1; i++)
                for (int j = i + 1; j < n; j++)
                {
                    if ((SI.sg[i].equals(SI.sg[j])) && (SI.it[j] != 0))
                    {
                        SI.it[i]++; // merge identical words into one entry; the other entries drop to 0
                        SI.it[j]--;
                    }
                }

            // sort with bubble sort, highest count first
            for (int i = 0; i < n; i++)
                for (int j = n - 1; j > i; j--)
                {
                    if (SI.it[j] > SI.it[j - 1])
                    {
                        int temp = SI.it[j];
                        SI.it[j] = SI.it[j - 1];
                        SI.it[j - 1] = temp;

                        String Stemp = SI.sg[j];
                        SI.sg[j] = SI.sg[j - 1];
                        SI.sg[j - 1] = Stemp;
                    }
                }

            for (int i = 0; i < n; i++)
            {
                if (SI.it[i] != 0) {
                    System.out.println(SI.sg[i] + ":" + SI.it[i]);
                }
            }
            in.close();
            out.close();

        } catch (Exception e) {
            System.out.println(e.toString());
        }

    }
}

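I tested it and got an array index out-of-bounds error.
PS: your try/catch block is far too long, which also makes the errors a hassle to debug.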
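
One way the out-of-bounds error can arise in the code as posted: `StringTokenizer` treats every character of its delimiter string as a separate delimiter, while `String.split` treats its argument as a regular expression that must match as a whole. With the same delimiter string the two can disagree, so `n` (from `countTokens()`) can be larger than the array returned by `split`. The posted loop also appends lines without a separator, which glues the last word of one line to the first word of the next. The sketch below only illustrates the mismatch and one possible fix; the class and method names are made up for this example.

```java
import java.util.StringTokenizer;

class SplitMismatchDemo {
    // Possible fix: split on runs of whitespace and use the array length as the word count.
    static String[] words(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        String text = "to be or not to be";
        // Every character in the delimiter string is a delimiter, so any single space splits.
        int tokens = new StringTokenizer(text, "   ").countTokens();
        // The regex matches only three consecutive spaces, which never occur in this text.
        int pieces = text.split("   ").length;
        System.out.println(tokens + " tokens, " + pieces + " piece(s)");
        System.out.println(words(text).length + " words");
    }
}
```

Taking the count from the same array that is later indexed means the loop bound and the array length cannot drift apart.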

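Here is an idea; I'm not sure whether it works. Since the words are separated by spaces, use a regular expression to replace every run of spaces with a single space (this guards against multiple spaces between two words and gives a uniform format), then split on the space to get an array of all the words and take them out one by one. To count a word you can walk the array (I don't know whether there is a function that locates a word in an array directly, something like indexOf). You could also put the array into a list and match against it, counting each match and removing it from the list. Or use a regular expression again to split the text on the word itself and work out the count from the length of the resulting array, in which case you have to handle the word appearing at the very start or end of the file.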
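
A minimal sketch of the first variant of this idea, assuming whitespace-separated words. It normalizes the whitespace with a regex and splits on the single space as described, but tallies with a map instead of removing matches from a list; the class name `NormalizeAndCount` is made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NormalizeAndCount {
    static Map<String, Integer> count(String text) {
        // Collapse every run of whitespace into one space and drop leading/trailing whitespace.
        String normalized = text.trim().replaceAll("\\s+", " ");
        // LinkedHashMap keeps the words in first-seen order.
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        if (normalized.isEmpty()) {
            return counts;
        }
        for (String word : normalized.split(" ")) {
            Integer old = counts.get(word);
            counts.put(word, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("  the cat   and the\that  "));
    }
}
```

A map lookup avoids rescanning the array for every word, which is what the indexOf and list-removal variants would end up doing.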

public class Word implements Comparable<Word>{
    private String word;
    private int occurrence;
    public Word(String w) {
        assert w!=null;
        word=w;
    }
    public String getWord() {
        return this.word;
    }
    public int getOccurrence() {
        return this.occurrence;
    }
    public void increase(){
        occurrence++;
    }
    public boolean equals(Object another){
        if(another==null)
            return false;
        if(another instanceof Word){
            Word anotherWord=(Word)another;
            return anotherWord.getWord().equals(word);
        }else
            return false;
    }
    public int hashCode(){
        return word.hashCode();
    }
    public int compareTo(Word w) {
        if(occurrence < w.getOccurrence())
            return -1;
        else if(occurrence == w.getOccurrence())
            return 0;
        else
            return 1;
    }
}
///////////////////////////////////////////////////////////////////////////////////
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.StringTokenizer;

public class WordParser {
    private File toBeParsed;
    public WordParser(File f) {
        assert f.exists()&&f.isFile();
        toBeParsed=f;
    }
    public ArrayList<Word> parse(){
        BufferedReader input=null;
        try {
            ArrayList<Word> words=new ArrayList<Word>();
            input = new BufferedReader(new FileReader(toBeParsed));
            String line;
            while((line=input.readLine())!=null){
                StringTokenizer tokenizer=new StringTokenizer(line);
                while(tokenizer.hasMoreTokens()){
                    String token=tokenizer.nextToken();
                    Word word=new Word(token);
                    int index=words.indexOf(word);
                    if(index==-1){
                        word.increase();
                        words.add(word);
                    }else
                        words.get(index).increase();
                }
            }
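            // input is closed in the finally block below
            Collections.sort(words, Collections.reverseOrder()); // highest count first, as the task asks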
            return words;
        } catch (Exception ex) {
            ex.printStackTrace();
            return null;
        }finally{
            if(input!=null){
                try{
                    input.close();
                }catch(IOException e){
                }
            }
        }
    }

    public static void main(String[]args){
        WordParser parser=new WordParser(new File(args[0]));
        ArrayList<Word> words=parser.parse();
        for(Word word:words)
            System.out.println(word.getWord()+":"+word.getOccurrence());
    }
}

rehte()'s version is well written; that should pretty much do it.

sed awk grep sort uniq

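OP, your problem is actually a pretty common one.
The hardest part is deciding what counts as a word.
This is a good forum. contains 5 words
This,is,a,good,forum. also contains 5 words
This!@#$@#@@#!!@#is,a,good?forum. also contains 5 words
And having to tell Chinese apart makes it even more of a hassle. I looked at the method posted above; it cannot split the words cleanly.
I wrote a rough version that splits out words and tells Chinese apart, although the Chinese text itself is not segmented.
For example, from “朋友，老友” I get two words. Note: Chinese word segmentation is very complex; you can tell them it is very hard and not something that can be knocked out in one go.
For example, how many words should 今天是个好天气 yield? (If you're interested, look up Chinese word segmentation; Baidu makes its living on it.)
ok
My method is as follows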
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;

// The original post omitted the imports and the class header; the class name is a placeholder.
public class WordSplit {
  public static void main(String[] args) {
    BufferedReader br = null;
    StringBuffer sb = new StringBuffer();
    try {
      br = new BufferedReader(new FileReader("e:\\a.txt"));
      String line;
      while ( (line = br.readLine()) != null) {
        sb.append(line);
        sb.append(" "); // keep words on adjacent lines apart
      }
    }
    catch (IOException ex) {
      return;
    }
    finally {
      if (br != null) {
        try {
          br.close();
        }
        catch (IOException ex1) {
        }
      }
    }
    String str = sb.toString();
    System.out.println(str);
    HashMap<String, Integer> map = new HashMap<String, Integer>();
    String s[] = str.split("\\b"); // split at word boundaries
    for (int i = 0; i < s.length; i++) {
      String words = s[i].trim();
      if (words.length() == 0) {
        continue;
      }
      // keep only tokens made up entirely of letters (Chinese characters count as letters)
      boolean isCorrect = true;
      for (int index = 0; index < words.length(); index++) {
        if (!Character.isLetter(words.charAt(index))) {
          isCorrect = false;
          break;
        }
      }
      if (!isCorrect) {
        continue;
      }
      Integer num = map.get(words);
      map.put(words, num == null ? 1 : num + 1);
    }
    // print in the map's iteration order (unsorted)
    Iterator<String> keys = map.keySet().iterator();
    while (keys.hasNext()) {
      String words = keys.next();
      System.out.println(words + ":" + map.get(words));
    }
  }
}
I didn't do the sorting; do that part yourself.
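
For the sorting step left open here, one possible sketch: copy the map's entries into a list and sort them with `Collections.sort` and a `Comparator`, highest count first. It assumes the counts are held in a `Map<String, Integer>`; the class name `SortByCount` and the alphabetical tie-break are choices made for this example, not part of the post above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SortByCount {
    static List<Map.Entry<String, Integer>> sortedByCount(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries =
            new ArrayList<Map.Entry<String, Integer>>(counts.entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                // Higher counts first; equal counts fall back to alphabetical order.
                int byCount = b.getValue().compareTo(a.getValue());
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            }
        });
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        counts.put("good", 2);
        counts.put("a", 2);
        counts.put("boy", 1);
        for (Map.Entry<String, Integer> e : sortedByCount(counts)) {
            System.out.println(e.getKey() + ":" + e.getValue());
        }
    }
}
```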

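Don't forget that this is a recruitment exam question, not a real engineering project, so avoid overcomplicating it. What is being tested is not a complex, complete, completely bug-free implementation but some basic concepts and Java programming idioms:
Read a text file and count every word in it; the words are separated by spaces. (A recruitment question from some company)
=====================================================================================
The question states that words are separated by spaces. That differs somewhat from real-world words, which may be separated not only by spaces but also by newlines, tabs, and all kinds of punctuation. The point is to simplify things for you so that you don't overthink it.
Requirement: read in a text file (including Chinese)
======================================================================================
This tests your grasp of Java streams, that is, whether you can use the common stream operations correctly. Don't make it complicated by thinking about how to segment Chinese words or even about the algorithms of popular search engines; go down that road and it is no longer an exam but academic research.
Result: display all the words in the text file and the number of times each occurs (from highest to lowest)
======================================================================================
This tests how you use the common Java container classes and how well you understand OO thinking. There is no need to dig up algorithms and rewrite them yourself; that work is done by computer scientists and professors. What you need is to be fluent with the tools the Java platform provides. Remember, you are a programmer, not a student sitting a computer science exam.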
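
On the stream side, one detail matters when the file contains Chinese: `FileReader` decodes with the platform default encoding, so wrapping the byte stream in an `InputStreamReader` with an explicit charset is the safer habit. The sketch below assumes the file is UTF-8 (the thread does not say what the exam file uses) and reads from an in-memory stream so it runs without a file; `ReadWithCharset` and `readWords` are made-up names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class ReadWithCharset {
    // Reads whitespace-separated words, decoding the bytes with the given charset.
    static List<String> readWords(InputStream in, String charsetName) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, charsetName));
        List<String> words = new ArrayList<String>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // Default delimiters: space, tab, newline, carriage return, form feed.
                StringTokenizer st = new StringTokenizer(line);
                while (st.hasMoreTokens()) {
                    words.add(st.nextToken());
                }
            }
        } finally {
            reader.close();
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // "\u670b\u53cb" is the word 朋友, escaped so the source compiles under any encoding.
        byte[] data = "good \u670b\u53cb\n\u670b\u53cb boy".getBytes("UTF-8");
        System.out.println(readWords(new ByteArrayInputStream(data), "UTF-8"));
    }
}
```

For a real file, a `FileInputStream` would take the place of the `ByteArrayInputStream`.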

I wrote one; the sorting isn't good. For reference.
import java.io.*;
import java.util.*;

public class T {
    // word -> number of occurrences
    HashMap<String, Integer> nameAndNum = new HashMap<String, Integer>();

    public void readFile() throws IOException {
        FileInputStream fis = new FileInputStream("D:\\work\\test.txt"); // Your file
        BufferedReader br = new BufferedReader(new InputStreamReader(fis));
        String line = br.readLine();
        while (line != null) {
            putToMap(line);
            line = br.readLine();
        }
        // split(" ") yields empty strings for blank lines and repeated spaces; drop that key
        nameAndNum.remove("");
        br.close(); // also closes fis
    }

    public void putToMap(String line) {
        int num;
        String[] words = line.split(" ");
        for (int i = 0; i < words.length; i++) {
            if (nameAndNum.containsKey(words[i])) {
                num = nameAndNum.get(words[i]) + 1;
                nameAndNum.put(words[i], num);
            } else {
                nameAndNum.put(words[i], 1);
            }
        }
    }

    // Builds "count|word" strings and sorts them in descending order.
    // The order is lexical, so "9|x" sorts ahead of "10|y"; this is the weak spot in the sorting.
    public ArrayList<String> sort() {
        String numAndName, key;
        ArrayList<String> result = new ArrayList<String>();
        Iterator<String> it = nameAndNum.keySet().iterator();
        while (it.hasNext()) {
            key = it.next();
            numAndName = nameAndNum.get(key) + "|" + key;
            result.add(numAndName);
        }
        // sort once, after all entries have been added
        Collections.sort(result);
        Collections.reverse(result);
        return result;
    }

    public void printResult(List<String> list) {
        Iterator<String> it = list.iterator();
        String[] numToName;
        while (it.hasNext()) {
            // limit 2: split only at the first '|', in case the word itself contains one
            numToName = it.next().split("\\|", 2);
            System.out.println(numToName[1] + " " + numToName[0]);
        }
    }

    public void execute() {
        try {
            readFile();
        } catch (IOException e) {
            e.printStackTrace();
        }
        printResult(sort());
    }

    public static void main(String[] args) {
        T t = new T();
        t.execute();
    }
}

Implemented in C++:
#include <cstdlib>
#include <fstream>
#include <iterator>
#include <map>
#include <vector>
#include <string>
#include <iostream>
#include <algorithm>
using namespace std;

// Counts whitespace-separated words; output follows the map's key order, not the counts.
void analyse_text(const char *filename)
{
    ifstream f(filename);
    istream_iterator<string> fiter(f);
    istream_iterator<string> fend;
    map<string, unsigned int> str_map;
    while (fiter != fend)
    {
        str_map[*fiter]++;
        ++fiter;
    }
    map<string, unsigned int>::iterator iter = str_map.begin();
    while (iter != str_map.end())
    {
        cout << "[" << iter->first << "," << iter->second << "]" << endl;
        ++iter;
    }
}

int main()
{
    analyse_text("test.txt");
    system("pause"); // Windows only: keeps the console window open
    return 0;
}

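Ha
So people are actually still bumping this thread. Reading rehte()'s post, I found myself nodding along: what matters in an interview question is logic and structure.
I posted a method for splitting words earlier.
Don't sort the way zjy0009() does. For an interview question in particular, the code has to be well organized; when I interview new hires, clear structure is the first thing I look for. You can put word splitting and sorting into two separate methods, and make a class to store each word and its occurrence count.
And the sorting should definitely use Arrays.sort or Collections.sort, with a Comparator written for it.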
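
A rough sketch of the structure described here: a small class for a word and its count, separate methods for splitting, counting, and sorting, and a `Comparator` passed to `Collections.sort`. All names (`WordCount`, `WordStats`, and the method names) are invented for the example, and whitespace-separated words are assumed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value class: one word and how often it occurred.
class WordCount {
    final String word;
    int count;
    WordCount(String word) { this.word = word; }
}

class WordStats {
    // Step 1: split the text into whitespace-separated words.
    static List<String> splitWords(String text) {
        List<String> words = new ArrayList<String>();
        for (String w : text.trim().split("\\s+")) {
            if (w.length() > 0) {
                words.add(w);
            }
        }
        return words;
    }

    // Step 2: tally the words.
    static List<WordCount> countWords(List<String> words) {
        Map<String, WordCount> byWord = new HashMap<String, WordCount>();
        for (String w : words) {
            WordCount wc = byWord.get(w);
            if (wc == null) {
                wc = new WordCount(w);
                byWord.put(w, wc);
            }
            wc.count++;
        }
        return new ArrayList<WordCount>(byWord.values());
    }

    // Step 3: sort by count, highest first; ties are broken alphabetically.
    static void sortByCount(List<WordCount> counts) {
        Collections.sort(counts, new Comparator<WordCount>() {
            public int compare(WordCount a, WordCount b) {
                if (a.count != b.count) {
                    return a.count > b.count ? -1 : 1;
                }
                return a.word.compareTo(b.word);
            }
        });
    }

    public static void main(String[] args) {
        List<WordCount> counts = countWords(splitWords("b a b c b a"));
        sortByCount(counts);
        for (WordCount wc : counts) {
            System.out.println(wc.word + ":" + wc.count);
        }
    }
}
```

Keeping the ordering in a `Comparator` rather than in the class itself leaves `equals` free to compare words only.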

fool_leave(请及时结贴)'s program has a problem: it supports Chinese, but if the text contains punctuation it throws an error.

baobao28(阿呆) ( ) Reputation: 100    Blog  2006-12-30 10:25:36  Score: 0

   fool_leave(请及时结贴)'s program has a problem: it supports Chinese, but if the text contains punctuation it throws an error.


-----------------------------------------------------
Really? What error?
My test is as follows.
The file contents are:
I am a good boy
朋友+情人,朋友 firend
She is a good girl.You know that?
China^%^&%English@#%$#$%^Good$%$Father,.;
哈哈
Result after running:
boy:1
哈哈:1
girl:1
know:1
that:1
I:1
English:1
a:2
China:1
Father:1
朋友:2
Good:1
good:2
You:1
She:1
am:1
is:1
firend:1
情人:1
No error at all. Post the error you got so I can take a look.

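Development language: any one of C, C++, JAVA, VB, .NET
Instructions:
1. Create a directory on the local hard disk named after yourself (in Chinese);
2. Put the code you write in that directory;
3. Finally, build an executable program (JAVA programs must produce a .CLASS or .JAR file);
4. When you are done, leave the source code and the executable in the directory and tell the invigilator to keep them;
5. Your result will be judged on your source code and executable.
Task name: Article processor
Task features:
1. Count the identical words in the supplied English file and generate a statistics list file (file name: TEST.TXT), in a format such as:
Word                                  Frequency
This                                     20
You                                      10
……
2. Count each distinct word once; no double counting;
3. Words are case-sensitive; words that differ in case count as different words;
4. Singular and plural count as different words, e.g. file and files can be treated as different words;
5. Translate 10 of the words into Chinese, replacing them with the corresponding Chinese, and generate a new file (file name: TRANS.TXT) that keeps the words and text that were not replaced.
How to run: simply run it from the command line.
Assessment focus:
1. Fundamentals of the programming language;
2. Use of functions;
3. User-defined functions: use at least 3 user-defined functions; putting all the code in a single function or code block is not allowed;
4. Those using C++, JAVA, or .NET tools may use classes;
5. Efficiency of the algorithm and conciseness of the code.
Name of the supplementary exam file: 考题附加文件.TXT
That is the original question. I hope the experts can help me out; I have only just started in this field and know very little, so for the sake of my future I would be grateful for the seniors' guidance.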
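
For task 5 (replacing selected words with Chinese while keeping everything else), one possible approach is to scan for words with a regular expression and substitute only those found in a dictionary. The sketch assumes a word is a run of ASCII letters and matches case-sensitively, in line with task 3; the dictionary entry is a placeholder, since the exam's actual word list is not given in the thread, and the names `Translator` and `translate` are invented here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Translator {
    // Replaces each dictionary word with its translation; all other text is copied unchanged.
    static String translate(String text, Map<String, String> dict) {
        Matcher m = Pattern.compile("[A-Za-z]+").matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = dict.get(m.group()); // case-sensitive lookup
            if (replacement == null) {
                replacement = m.group(); // not in the dictionary: keep the word as is
            }
            // quoteReplacement stops '$' and '\' in the replacement from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dict = new HashMap<String, String>();
        dict.put("file", "\u6587\u4ef6"); // placeholder entry: file -> 文件
        System.out.println(translate("This file, and files.", dict));
    }
}
```

Because whole words are matched, `files` stays untouched when only `file` is in the dictionary, which fits task 4's rule that singular and plural are different words.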

I don't understand what each step of rehte()'s program does.
