Help with Java text filtering

Use Pattern and Matcher to find what you want and write it to a new file. Isn't that enough?

Combine regular expressions with some of the String class methods.

This is the program I wrote. I have never written a regular expression before, and what I wrote seems to be completely wrong. Could someone take a look and help me fix it? Many thanks. The program, its output, the output I was hoping for, and the contents of the txt file are listed below.

I am not sure whether every Id in the log file you posted has a corresponding group. If it does, assume the source data file looks like the shortened sample listed below (Id 1 to Id 3), which is followed by my program.

Thank you. But after I ran that program, the output file data2.txt was still empty. The regular expression still does not seem to match.

Use the regular expression listed below, the one that starts with (Id|group). The test code comes right after it.

Complete test code, for reference, is listed below as the second Test class.

Thank you. My final output still has no ID numbers. Its format is shown after the code.

Is your text really in a standard format? How many spaces follow "Id:" and "group:"? There is one space after group, right? In your original post there is also one space after "Id:". If the number of spaces is different, the values certainly cannot be read, and the regular expression has to be changed to the one on the last line.
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Main {
    public static void main(String[] args) throws IOException {
        String file = "/Users/csdn/Desktop/test.rtf";
        BufferedReader br;
        try {
            br = new BufferedReader(new FileReader(file));
            String line;
            String re1 = ".*?";                // Non-greedy match on filler
            String re2 = "((?:[I-z][d-z]+))";  // ID
            String re3 = "((?:[c-z][a-z]+))";  // Category
            Pattern p = Pattern.compile(re1 + re2 + re3, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
            Matcher m = p.matcher(file);
            while ((line = br.readLine()) != null) {
                m = p.matcher(line);
                if (m.find()) {
                    String day1 = m.group(1);
                    String word1 = m.group(2);
                    System.out.print(" " + day1.toString() + " " + " " + word1.toString() + " " + "\n");
                }
            }
        } catch (FileNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
            System.out.println("fail");
        }
    }
}
The output is:
\font tbl
color tbl
ar gl
ardir natural
ardir natural
AS IN
dis continued
AS IN
tit le
gro up
ales rank
simi lar
ategori es
Boo ks
Boo ks
revie ws
cutom er
cutom er
AS IN
tit le
gro up
ales rank
simi lar
ategori es
Boo ks
Boo ks
revie ws
cutom er
cutom er
......
cutom er
cutom er
AS IN
tit le
......
instead of the expected:
1 Book
2 Book
3 Book
Contents of the txt file:
Id: 0
ASIN: 0771044445
discontinued product
Id: 1
ASIN: 0827229534
title: Patterns of Preaching: A Sermon Sampler
group: Book
salesrank: 396585
similar: 5 0804215715 156101074X 0687023955 0687074231 082721619X
categories: 2
|Books[283155]|Subjects[1000]|Religion & Spirituality[22]|Christianity[12290]|Clergy[12360]|Preaching[12368]
|Books[283155]|Subjects[1000]|Religion & Spirituality[22]|Christianity[12290]|Clergy[12360]|Sermons[12370]
reviews: total: 2 downloaded: 2 avg rating: 5
2000-7-28 cutomer: A2JW67OY8U6HHK rating: 5 votes: 10 helpful: 9
2003-12-14 cutomer: A2VE83MZF98ITY rating: 5 votes: 6 helpful: 5
Id: 2
ASIN: 0738700797
title: Candlemas: Feast of Flames
group: Book
salesrank: 168596
similar: 5 0738700827 1567184960 1567182836 0738700525 0738700940
categories: 2
|Books[283155]|Subjects[1000]|Religion & Spirituality[22]|Earth-Based Religions[12472]|Wicca[12484]
|Books[283155]|Subjects[1000]|Religion & Spirituality[22]|Earth-Based Religions[12472]|Witchcraft[12486]
reviews: total: 12 downloaded: 12 avg rating: 4.5
2001-12-16 cutomer: A11NCO6YTE4BTJ rating: 5 votes: 5 helpful: 4
2002-1-7 cutomer: A9CQ3PLRNIR83 rating: 4 votes: 5 helpful: 5
2002-1-24 cutomer: A13SG9ACZ9O5IM rating: 5 votes: 8 helpful: 8
2002-1-28 cutomer: A1BDAI6VEYMAZA rating: 5 votes: 4 helpful: 4
2002-2-6 cutomer: A2P6KAWXJ16234 rating: 4 votes: 16 helpful: 16
2002-2-14 cutomer: AMACWC3M7PQFR rating: 4 votes: 5 helpful: 5
2002-3-23 cutomer: A3GO7UV9XX14D8 rating: 4 votes: 6 helpful: 6
2002-5-23 cutomer: A1GIL64QK68WKL rating: 5 votes: 8 helpful: 8
2003-2-25 cutomer: AEOBOF2ONQJWV rating: 5 votes: 8 helpful: 5
2003-11-25 cutomer: A3IGHTES8ME05L rating: 5 votes: 5 helpful: 5
2004-2-11 cutomer: A1CP26N8RHYVVO rating: 1 votes: 13 helpful: 9
2005-2-7 cutomer: ANEIANH0WAT9D rating: 5 votes: 1 helpful: 1
Id: 3
ASIN: 0486287785
title: World War II Allied Fighter Planes Trading Cards
group: Book
salesrank: 1270652
similar: 0
categories: 1
|Books[283155]|Subjects[1000]|Home & Garden[48]|Crafts & Hobbies[5126]|General[5144]
reviews: total: 1 downloaded: 1 avg rating: 5
2003-7-10 cutomer: A3IDGASRQAW8B2 rating: 5 votes: 2 helpful: 2
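The fragments in the output above ("tit le", "gro up") can be reproduced in isolation. In the posted pattern, `[I-z]`, `[d-z]` and `[c-z]` are character ranges, not the literal words "Id" or "group", so the pattern matches almost any run of letters and backtracking then splits that run into two groups. The sketch below only illustrates this: the class name `OriginalPatternDemo` and the helper `split` are made up for the example, and the input lines are taken from the posted data.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginalPatternDemo {
    // The pattern exactly as built in the posted program (re1 + re2 + re3).
    static final Pattern ORIGINAL = Pattern.compile(
            ".*?((?:[I-z][d-z]+))((?:[c-z][a-z]+))",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Hypothetical helper: returns "group1 group2", or "(no match)".
    static String split(String line) {
        Matcher m = ORIGINAL.matcher(line);
        return m.find() ? m.group(1) + " " + m.group(2) : "(no match)";
    }

    public static void main(String[] args) {
        // A run of letters is cut in two so that both groups get something.
        System.out.println(split("title: Candlemas: Feast of Flames")); // tit le
        System.out.println(split("group: Book"));                       // gro up
        // "Id" is only two letters, so the second group has nothing to take.
        System.out.println(split("Id: 1"));                             // (no match)
    }
}
```

This is consistent with the posted output, which contains pieces of "title" and "group" but no line for Id.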
Id: 1
ASIN: 0827229534
title: Patterns of Preaching: A Sermon Sampler
group: Book
salesrank: 396585
Id: 2
ASIN: 0738700797
title: Candlemas: Feast of Flames
group: Book
salesrank: 168596
similar: 5 0738700827 1567184960 1567182836 0738700525 0738700940
Id: 3
ASIN: 0486287785
title: World War II Allied Fighter Planes Trading Cards
group: Book
salesrank: 1270652
similar: 0
The rest of the content is omitted for brevity. The data is stored in DATA.TXT on drive D. The program is as follows:
public static void main(String[] args) throws IOException {
    File inFile = new File("D:" + File.separator + "data.txt");
    File outFile = new File("D:" + File.separator + "data2.txt");
    BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outFile)));
    BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(inFile)));
    Pattern pattern = Pattern.compile("(Id:){1}\\s*\\w+|(group:)\\s*\\w+");
    String str = "";
    Matcher matcher;
    while ((str = reader.readLine()) != null) {
        matcher = pattern.matcher(str.trim());
        // matches() succeeds only if the whole trimmed line fits the pattern
        if (matcher.matches()) {
            if (str.contains("Id")) {
                String[] idStrings = str.trim().split(":\\s*");
                writer.write(idStrings[idStrings.length - 1] + "\t");
            } else if (str.contains("group")) {
                String[] groupStrings = str.split(":\\s*");
                writer.write(groupStrings[groupStrings.length - 1] + "\n");
            }
        }
    }
    reader.close();
    writer.flush();
    writer.close();
    System.out.println("Text filtering finished");
}
The result you want will be written to DATA2.TXT.
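The thread never establishes why data2.txt stayed empty. One thing worth checking is the difference between `matches()` and `find()`: `matches()` succeeds only when the entire input fits the pattern, so any extra character on the line makes it fail, while `find()` looks for the pattern anywhere in the line. The sketch below uses the same pattern as the program above. The class and method names are made up, and the second input, a line ending in a backslash as an .rtf file might contain, is an assumption, because the raw file is never shown.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    // Same pattern as in the program above.
    static final Pattern P = Pattern.compile("(Id:){1}\\s*\\w+|(group:)\\s*\\w+");

    // True only if the whole trimmed line fits the pattern.
    static boolean wholeLine(String line) {
        return P.matcher(line.trim()).matches();
    }

    // True if the pattern occurs anywhere in the line.
    static boolean anywhere(String line) {
        return P.matcher(line).find();
    }

    public static void main(String[] args) {
        String clean = "Id: 1";
        String withExtra = "Id: 1\\"; // hypothetical line with a trailing backslash

        System.out.println(wholeLine(clean));     // true
        System.out.println(wholeLine(withExtra)); // false
        System.out.println(anywhere(withExtra));  // true
    }
}
```

If the real lines carry extra characters of this kind, switching from `matches()` to `find()` would be one way to make the filter tolerant of them.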
String re="(Id|group): [\\d\\w]*";
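The test code follows. Test.txt contains the text from your original post. Writing the result to a file is left for you to work out. Keep going, you can do it.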
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    /**
     * @param args
     */
    public static void main(String[] args) {
        File file = new File("c:\\Test.txt");
        if (file.isFile() && file.exists()) {
            try {
                InputStreamReader read = new InputStreamReader(new FileInputStream(file));
                BufferedReader bufferedReader = new BufferedReader(read);
                String lineTXT = null;
                while ((lineTXT = bufferedReader.readLine()) != null) {
                    String re = "(Id|group): [\\d\\w]*";
                    Pattern p = Pattern.compile(re);
                    Matcher m = p.matcher(lineTXT);
                    // print every "Id: ..." or "group: ..." found in the line
                    while (m.find()) {
                        String tmp = m.group();
                        if (!"".equals(tmp)) {
                            System.out.println(tmp);
                        }
                    }
                }
                read.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        } else {
            System.out.println("Cannot find the specified file!");
        }
    }
}
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    /**
     * @param args
     */
    public static void main(String[] args) {
        File file = new File("c:\\Test.txt");
        File file2 = new File("c:\\demo.txt");
        if (file.isFile() && file.exists()) {
            try {
                InputStreamReader read = new InputStreamReader(new FileInputStream(file));
                OutputStreamWriter writer = new OutputStreamWriter(new FileOutputStream(file2));
                BufferedReader bufferedReader = new BufferedReader(read);
                String lineTXT = null;
                while ((lineTXT = bufferedReader.readLine()) != null) {
                    String re = "(Id|group): [\\d\\w]*";
                    Pattern p = Pattern.compile(re);
                    Matcher m = p.matcher(lineTXT);
                    // write every "Id: ..." or "group: ..." found in the line
                    while (m.find()) {
                        String tmp = m.group();
                        if (!"".equals(tmp)) {
                            writer.write(tmp + "\r\n");
                        }
                    }
                    writer.flush();
                }
                read.close();
                writer.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        } else {
            System.out.println("Cannot find the specified file!");
        }
    }
}
Id:
group: Book
Id:
group: Music
Id:
group: Book
There is no number after "Id". I do not know why.
String re="(Id|group): [\\s\\d\\w]*";
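The effect of the space count can be checked without any file. The sketch below assumes, as the last reply suspects, that the real file has several spaces after "Id:"; the sample lines with three spaces are made up for the illustration. It also shows one possible variant that produces the "1 Book" style of output asked for in the original post by using `\s*` and a capturing group for the value. The names `IdGroupFilter`, `firstMatch` and `filter` are hypothetical and do not come from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdGroupFilter {
    // Variant pattern: any amount of whitespace after the colon, value in group 2.
    private static final Pattern FIELD = Pattern.compile("^\\s*(Id|group):\\s*(\\w+)");

    // Returns the first match of regex in line, or "" if there is none.
    static String firstMatch(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? m.group() : "";
    }

    // Pairs each Id with the next group line and returns "id group" strings.
    static List<String> filter(List<String> lines) {
        List<String> out = new ArrayList<>();
        String currentId = null;
        for (String line : lines) {
            Matcher m = FIELD.matcher(line);
            if (!m.find()) {
                continue;
            }
            if (m.group(1).equals("Id")) {
                currentId = m.group(2);          // remember the most recent Id
            } else if (currentId != null) {
                out.add(currentId + " " + m.group(2));
                currentId = null;                // an Id without a group is skipped
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "Id:   1"; // assumed: three spaces after the colon

        // One literal space, then zero word characters: only "Id: " is matched.
        System.out.println("[" + firstMatch("(Id|group): [\\d\\w]*", line) + "]");
        // Allowing whitespace in the class picks up the number as well.
        System.out.println("[" + firstMatch("(Id|group): [\\s\\d\\w]*", line) + "]");

        List<String> sample = Arrays.asList(
                "Id:   0", "ASIN: 0771044445", "  discontinued product",
                "Id:   1", "ASIN: 0827229534",
                "  title: Patterns of Preaching: A Sermon Sampler", "  group: Book");
        for (String row : filter(sample)) {
            System.out.println(row);             // 1 Book
        }
    }
}
```

Capturing the value separately avoids splitting the matched text afterwards, and Id 0, which has no group line in the posted data, simply produces no row.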