The source code is:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class regex {
    public static void main (String args[]) throws IOException{
        byte bt[]=new byte[10000];
        File file = new File("C:\\test.txt");
        if(!file.exists()){
            System.out.println("File not found");
        }
        RandomAccessFile raf = new RandomAccessFile(file, "rw");
        System.out.println(file.length());
        raf.seek(0);
        for(int i=0;i<=raf.length();i++){
            bt[i] = raf.readByte();
        }
        String s = new String(bt);
        System.out.print("File output: "+s);
        Pattern p = Pattern.compile("Referer:(.*?)Accept",Pattern.DOTALL);
        Matcher m = p.matcher(s);
        if (m.find()) {
            System.out.println("Regex output: "+m.group(1));
        }
        else System.out.println("no result");
    }
}
The contents of C:\\test.txt are:
op=msgcount&charset=gbk&callback=IMOld&refer=hi.baidu.com&un=damoguyan258&.stamp=h4l6m487 HTTP/1.1
Accept: */*
Referer: http://zhidao.baidu.com/question/120610720.html
Accept-Language: zh-CN
Accept-Encoding: gzip, deflate
ThreadID: 5556
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; SE 2.X MetaSr 1.0)
Host: fetch.im.baidu.com
Connection: Keep-Alive
Cookie: BDUSS=Es3RC1QWmQtVi1QUEQxdmVsWS0xan51djBOUzM4ZkJCTVUzdW5qWEZ-endPdXBRQVFBQUFBJCQAAAAAAAAAAAokNw6t8CcOZGFtb2d1eWFuMjU4AAAAAAAAAAAAAAAAAAAAAAAAAACAYIArMAAAAOD6z5YqAAAALWdCAAAAAAAxMC4zNi4xMfDs~E~w7PxPa; BDUT=gggm542C2A08A4ADD4D59B38D1C778B79F7D1386ef17c2a0; BAIDUID=C338BC4650011A75CA5A05D7D2760BB8:FG=1; IM_old=0|h4l6m47x
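Two questions:
1. How do I fix the EOFException thrown while reading the file?
2. I want to get the URL that follows Referer:, but the regular expression seems not to run at all and prints no result.
I'm a beginner and have been stuck on this for a long time. I'd be grateful if someone could take the time to help!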
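for(int i=0; i<=raf.length(); i++){
This runs past the end: the loop executes raf.length()+1 times, so the last readByte() hits end of file and throws the EOFException. Change it to:
for(int i=0; i<raf.length(); i++){
2. "I want to get the URL after Referer:"
Pattern p = Pattern.compile("Referer:(.*?)Accept",Pattern.DOTALL);
Try changing it to:
Pattern p = Pattern.compile("Referer:([^\\n]*)");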
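As a rough illustration of the line-based pattern suggested above, here is a minimal sketch. The class name RefererDemo and the helper extractReferer are made up for this example, and the sample string only imitates a few lines of the posted file. It assumes the file may use Windows line endings (\r\n), so the character class also excludes \r to avoid capturing a stray carriage return.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original post.
class RefererDemo {
    // Matches a header line starting with "Referer:", skips spaces/tabs,
    // then captures everything up to the end of that line.
    // [^\r\n]* stops at either line-ending character.
    private static final Pattern REFERER =
            Pattern.compile("^Referer:[ \\t]*([^\\r\\n]*)", Pattern.MULTILINE);

    // Returns the Referer value, or null when there is no such header line.
    static String extractReferer(String headers) {
        Matcher m = REFERER.matcher(headers);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // Sample text imitating part of the posted file (CRLF line endings assumed).
        String sample = "Accept: */*\r\n"
                + "Referer: http://zhidao.baidu.com/question/120610720.html\r\n"
                + "Accept-Language: zh-CN\r\n";
        System.out.println(extractReferer(sample));
    }
}
```

Anchoring on the line start means the pattern does not depend on which header happens to follow Referer, unlike the original "Referer:(.*?)Accept".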
In my reply (post #3) I suggested changing <= to <. Have you tried that?
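Oops, so that was the problem! Thanks for the help. But why does my regular expression still produce no output? When I run the regex on a plain String, it gives me the result I want, but when the text is read from the file there is no result. Where is the problem this time?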
Isn't RandomAccessFile, which can read at any position at any time, the more powerful choice? My file is updated continuously by packet capture!
File file = new File("test.txt");
if(!file.exists()){
    System.out.println("File not found");
}
// bt is the byte[] buffer from the original code; this snippet also needs import java.io.ByteArrayOutputStream;
RandomAccessFile raf = new RandomAccessFile(file, "rw");
ByteArrayOutputStream bos = new ByteArrayOutputStream();
int len = 0;
while ((len=raf.read(bt)) != -1) { // looping until read() reports EOF (-1) is better
    bos.write(bt, 0, len);
}
raf.close();
String s = new String(bos.toByteArray());
System.out.printf("File output: %s\n", s);
Pattern p = Pattern.compile("Referer:(.*?)Accept",Pattern.DOTALL);
Matcher m = p.matcher(s);
if (m.find())
    System.out.println("Regex output: "+m.group(1));
else
    System.out.println("no result");
The OP's regex should work. If it still fails to match, the problem is the file's encoding: if the file was saved in a particular encoding, it should be read using that encoding.
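To make the encoding point concrete, here is a small sketch under a loud assumption: it pretends the file was saved as UTF-16LE, which is only one possibility, since the thread never says what encoding the real file uses. The class name CharsetDecodeDemo and the helper findReferer are invented for the example. Decoding the same bytes with the wrong charset leaves a NUL character between every letter, so "Referer:" never appears as a contiguous substring and the match fails even though the printed text can look normal.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original post.
class CharsetDecodeDemo {
    // The OP's pattern, unchanged.
    private static final Pattern REFERER =
            Pattern.compile("Referer:(.*?)Accept", Pattern.DOTALL);

    // Decodes the raw bytes with the given charset, then applies the pattern.
    // Returns the trimmed capture, or null when nothing matches.
    static String findReferer(byte[] raw, Charset charset) {
        String text = new String(raw, charset);
        Matcher m = REFERER.matcher(text);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String headers = "Referer: http://zhidao.baidu.com/question/120610720.html\r\n"
                + "Accept-Language: zh-CN\r\n";
        // Simulate file bytes saved as UTF-16LE (an assumption for illustration only).
        byte[] raw = headers.getBytes(StandardCharsets.UTF_16LE);

        // Wrong charset: every other char is NUL, so the pattern cannot match.
        System.out.println(findReferer(raw, StandardCharsets.ISO_8859_1)); // prints null
        // Matching charset: the header text is restored and the URL is captured.
        System.out.println(findReferer(raw, StandardCharsets.UTF_16LE));
    }
}
```

For a real file, the bytes collected by the loop above (bos.toByteArray()) could be passed to new String(bytes, charset) with whatever charset the file was actually saved in.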
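I see. So you mean that even though I can print the text, the string read from the txt file still can't be matched by the regex? Does that mean I also need to convert the encoding first? I have a lot of questions, so any advice would be much appreciated!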