How do I remove the Chinese characters from a txt file?

Huh?
Read it in, modify it, then write the result out to another TXT file.

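Read a line, process a line: use a regular expression to replace the Chinese characters with the empty string "". The regex for Chinese is the character class [\u4e00-\u9fa5], and the method to call is replaceAll. Replace one line, write one line. Work out the concrete implementation yourself; there are many ways to handle it, and the suggestions from the earlier posters above are fine too.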
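A minimal sketch of that suggestion, in case it helps. The class and method names (RegexStripDemo, stripChinese, stripAll) are made up for illustration, and the input comes from a string rather than a file so the example is self-contained. The sample text uses Unicode escapes for the Chinese characters so it compiles regardless of source encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class RegexStripDemo {
    // Character class covering the range quoted in the post, U+4E00 to U+9FA5.
    private static final String CHINESE = "[\\u4e00-\\u9fa5]";

    // Removes every character in that range from a single line.
    static String stripChinese(String line) {
        return line.replaceAll(CHINESE, "");
    }

    // Reads line by line and collects each processed line.
    static String stripAll(BufferedReader in) throws IOException {
        StringBuilder out = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            out.append(stripChinese(line)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Two lines: "abc" + two Chinese characters + "def", then one Chinese character + "123".
        String text = "abc\u4e2d\u6587def\n\u4e2d123\n";
        System.out.print(stripAll(new BufferedReader(new StringReader(text))));
    }
}
```

Note that this range covers the common Chinese characters only. Chinese punctuation such as 。 lies outside it, as do rarer characters. Java's Pattern also accepts the script class \p{IsHan}, which may be a better fit if those rarer characters matter.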

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStream;

public class Test {
    public static void main(String[] args) throws Exception {
        File file = new File("E:/data.txt");
        File outFile = new File("E:/out.txt");
        if (!outFile.exists()) {
            outFile.createNewFile();
        }
        BufferedReader read = null;
        OutputStream out = new FileOutputStream(outFile);

        StringBuffer buffer = new StringBuffer();
        try {
            read = new BufferedReader(new FileReader(file));
            String s = "";
            while ((s = read.readLine()) != null) {
                String[] result = parseChar(s);
                buffer.append(result[1]).append("\n");
            }
            out.write(buffer.toString().getBytes());
            out.flush();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            System.out.println("Error reading file!");
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (read != null) {
                try {
                    read.close();
                    out.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    public static String[] parseChar(String s) {
        if (s != null && s.trim().length() > 0) {
            String[] result = new String[2];
            int index = -1; // position of the Chinese character; -1 assumes there is none
            for (int i = s.length() - 1; i >= 0; i--) {
                String temp = Character.toString(s.charAt(i)); // take each character, scanning from the end
                byte[] array = temp.getBytes(); // bytes of this character in the platform default encoding
                if (array.length > 1) {
                    index = i; // position where a Chinese character occurs
                    break;
                }
            }
            result[0] = s.substring(0, index + 1).trim();
            result[1] = s.substring(index + 1).trim();
            // Prints "Chinese is <result[0]> ... English is <result[1]>"
            System.out.println("中文是" + "\t" + result[0] + "*******************英文是" + "\t" + result[1]);
            return result;
        }
        return null;
    }
}

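This shouldn't be hard.
First, the idea: read the file's contents in, filter out the Chinese characters, and finally write the result back to the file.
Reading and writing the file shouldn't be difficult, so I won't go into it here. Let's talk about the key part, which is how to filter out the Chinese characters.
As we all know, the ASCII table assigns each of its characters a code, and those codes run from 0 to 127. Chinese characters are not in that table, so their codes are always 128 or above.
We can therefore check every character we read: if its code is 128 or greater, delete it. What is left is the non-Chinese text. Keep in mind that this test removes every non-ASCII character, not only Chinese ones.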
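Roughly what that check could look like, as a sketch rather than a finished tool. The names AsciiFilterDemo and keepAscii are invented here. It keeps only characters whose code is below 128, which means accented letters and other non-ASCII symbols are dropped along with the Chinese.

```java
class AsciiFilterDemo {
    // Keeps only characters whose code is below 128 (plain ASCII).
    static String keepAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 128) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "a" + one Chinese character + "b" + e-acute + "!", written with escapes.
        System.out.println(keepAscii("a\u4e2db\u00e9!"));
    }
}
```

If only Chinese should go and other non-ASCII text should stay, a narrower test (for example a regex on the Chinese range) is probably the safer choice.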

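    public static String[] parseChar(String s) {
        if (s != null && s.trim().length() > 0) {
            String[] result = new String[2];
            int index = -1; // position of the Chinese character; -1 assumes there is none
            for (int i = s.length() - 1; i >= 0; i--) {
                String temp = Character.toString(s.charAt(i)); // take each character, scanning from the end
                byte[] array = temp.getBytes(); // bytes of this character in the platform default encoding
                if (array.length > 1) {
                    index = i; // position where a Chinese character occurs
                    break;
                }
            }
            result[0] = s.substring(0, index + 1).trim();
            result[1] = s.substring(index + 1).trim();
            // Prints "Chinese is <result[0]> ... English is <result[1]>"
            System.out.println("中文是" + "\t" + result[0] + "*******************英文是" + "\t" + result[1]);
            return result;
        }
        return null;
    }
This method returns an array of length 2: the first element is the Chinese part, the second element is the letters.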

Exception in thread "main" java.lang.NullPointerException
at com.hx.removechinese.RemoveChinese.main(RemoveChinese.java:25)
中文是 Ӣ��ʸ��׺��ȫ*******************英文是
Ouch. Line 25 is: buffer.append(result[1]).append("\n");
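A guess at the cause, based only on the code posted in this thread: parseChar returns null when a line is empty or contains only whitespace, so result[1] throws on the first blank line in data.txt. Below is a minimal sketch of a guard at the call site. parseCharLike is a hypothetical stand-in with the same null-on-blank behaviour, not the original method, and NullGuardDemo and appendLine are likewise invented names.

```java
class NullGuardDemo {
    // Hypothetical stand-in with the same contract as parseChar in this thread:
    // it returns null when the line is null or blank.
    static String[] parseCharLike(String s) {
        if (s != null && s.trim().length() > 0) {
            return new String[] {"", s.trim()};
        }
        return null;
    }

    // Appends the second element, or an empty line when the parser returned null.
    static void appendLine(StringBuilder buffer, String line) {
        String[] result = parseCharLike(line);
        buffer.append(result == null ? "" : result[1]).append('\n');
    }

    public static void main(String[] args) {
        StringBuilder buffer = new StringBuilder();
        appendLine(buffer, "   ");   // blank line: no exception, just an empty line
        appendLine(buffer, " abc ");
        System.out.print(buffer);
    }
}
```

The garbled characters in the printed line may also point to an encoding mismatch, since FileReader decodes with the platform default charset, but that is an assumption the trace alone cannot confirm.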

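There is one small problem: the English that comes before the Chinese in data.txt is not picked up.
For example, the input ssssdfffffffffasdfwan哇啊山东士大夫得到aaaaaaaaaaaaaaaajjjjjjjjjjjjjj;jjjjj
gives the output aaaaaaaaaaaaaaaajjjjjjjjjjjjjj;jjjjj
Still, it's pretty good.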

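Oh, right. An earlier thread was about separating Chinese characters from letters, but that one had a special format, Chinese first and letters after, heh. I just reused that code. So it needs changing: replace each Chinese character as it is found, drop the break, and return the string with the Chinese removed directly. No need to return an array any more.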

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStream;

public class Test {
    public static void main(String[] args) throws Exception {
        File file = new File("E:/data.txt");
        File outFile = new File("E:/out.txt");
        if (!outFile.exists()) {
            outFile.createNewFile();
        }
        BufferedReader read = null;
        OutputStream out = new FileOutputStream(outFile);
        StringBuffer buffer = new StringBuffer();
        try {
            read = new BufferedReader(new FileReader(file));
            String s = "";
            while ((s = read.readLine()) != null) {
                String result = parseChar(s);
                buffer.append(result).append("\n");
            }
            out.write(buffer.toString().getBytes());
            out.flush();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            System.out.println("Error reading file!");
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Close both streams; out must be closed even if the reader was never opened.
            try {
                if (read != null) {
                    read.close();
                }
                out.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    public static String parseChar(String s) {
        String flag = s;
        if (s != null && s.trim().length() > 0) {
            for (int i = s.length() - 1; i >= 0; i--) {
                String temp = Character.toString(s.charAt(i)); // take each character, scanning from the end
                byte[] array = temp.getBytes(); // bytes of this character in the platform default encoding
                if (array.length > 1) {
                    // replace, not replaceAll: temp is a literal character, not a regex
                    flag = flag.replace(temp, "");
                }
            }
            System.out.println(flag);
        }
        // Blank lines come back unchanged instead of null, which append() would write as "null".
        return flag;
    }
}

Use a regular expression. Search Google for it.

    // Despite the name, this is true for every ASCII character (code below 0x80), not only letters.
    public static boolean isLetter(char c) {
        int k = 0x80;
        return c / k == 0;
    }
Now you get it, right?
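One way that check might be wired into the whole read, filter, write loop, as a sketch only. IsLetterFileDemo, filterLine and filterFile are invented names, and the charset is an assumption: UTF-8 is used as a fallback here, but the real file may well be GBK, in which case something like Charset.forName("GBK") would be needed instead.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class IsLetterFileDemo {
    // Same test as the isLetter method in this thread: true for codes below 0x80.
    static boolean isLetter(char c) {
        return c / 0x80 == 0;
    }

    // Keeps only the characters of one line that pass isLetter.
    static String filterLine(String line) {
        StringBuilder sb = new StringBuilder(line.length());
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (isLetter(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Filters a whole file line by line. The charset is passed explicitly
    // so the result does not depend on the platform default.
    static void filterFile(Path in, Path out, Charset cs) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, cs);
             BufferedWriter writer = Files.newBufferedWriter(out, cs)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(filterLine(line));
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: IsLetterFileDemo <input> <output> [charset]");
            return;
        }
        // The charset argument is optional; UTF-8 is only an assumed default.
        Charset cs = args.length > 2 ? Charset.forName(args[2]) : StandardCharsets.UTF_8;
        filterFile(Paths.get(args[0]), Paths.get(args[1]), cs);
    }
}
```

The try-with-resources block closes both the reader and the writer even when an exception is thrown, which sidesteps the close-in-finally bookkeeping of the earlier versions.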

