Randomly generate 1 billion integers and write them to a file. Here is the code:
FileWriter f=new FileWriter("D:/data.txt");
BufferedWriter buf=new BufferedWriter(f,1024*512);
int o=0;
while(o<10000){
int []array=new int[100000];
for(int i=0;i<array.length;i++)
array[i]=(int) (Math.random()*1000000);
int j=0;
while(j<array.length){
buf.write(String.valueOf(array[j]));
System.out.println(array[j]);
j++;
}
o++;
}
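buf.close();
It turns out to be really slow. How can I optimize this so that writing the file goes faster? Any pointers from the experts would be appreciated.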
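One likely cause of the slowness in the loop above is the System.out.println call on every number, since console output is usually far slower than buffered file output. Below is a minimal sketch without it. The class name FastTextWriter, the one-number-per-line format and the temporary file are illustrative assumptions, not part of the original post.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Random;

// Hypothetical helper class; the name and the line-based format are assumptions.
final class FastTextWriter {

    // Writes `count` random integers in [0, max) as text, one per line, with no console output.
    static void writeRandomInts(File file, int count, int max) throws IOException {
        Random rand = new Random();
        try (BufferedWriter buf = new BufferedWriter(new FileWriter(file), 1024 * 512)) {
            for (int i = 0; i < count; i++) {
                buf.write(Integer.toString(rand.nextInt(max)));
                buf.write('\n'); // separator so the numbers can be read back
            }
        }
    }

    // Writes a small sample to a temporary file and returns how many lines it contains.
    static int demo(int count) {
        try {
            File file = File.createTempFile("data", ".txt");
            file.deleteOnExit();
            writeRandomInts(file, count, 1000000);
            int lines = 0;
            try (BufferedReader in = new BufferedReader(new FileReader(file))) {
                while (in.readLine() != null) {
                    lines++;
                }
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        int lines = demo(1000000);
        System.out.println(lines + " numbers in " + (System.currentTimeMillis() - start) + " ms");
    }
}
```

Raising the count in main toward 1 billion should show whether the disk or the number formatting dominates on a given machine.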
One integer is 4 bytes, so 1 billion of them is 4 GB; this file would have to be 4 GB.
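The size estimate can be checked with a few lines of arithmetic. This is only a sketch with a hypothetical class name. It assumes the raw binary form (4 bytes per int); a text file would instead take one byte per digit plus any separators.

```java
// Hypothetical helper for the size arithmetic; not from the original thread.
final class FileSizeEstimate {

    // Size in bytes of `count` ints stored in raw binary form (4 bytes each).
    static long binaryBytes(long count) {
        return count * Integer.BYTES;
    }

    // Converts bytes to binary gigabytes (2^30 bytes), the unit file managers usually report.
    static double toBinaryGigabytes(long bytes) {
        return bytes / (double) (1L << 30);
    }

    public static void main(String[] args) {
        long bytes = binaryBytes(1_000_000_000L);
        System.out.println(bytes + " bytes, about " + toBinaryGigabytes(bytes) + " GB");
    }
}
```

So "4 GB" here means 4,000,000,000 bytes, which a file manager counting in units of 2^30 bytes shows as a little over 3.7 GB.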
public static void main(String[] args) throws Exception {
    long startTime = System.currentTimeMillis();
    BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream("c:/test.txt"));
    Random rad = new Random();
    long i = 0;
    while (i < 1000000000) {
        bos.write(int2bytes(rad.nextInt()));
        i++;
    }
    bos.flush();
    bos.close();
    System.out.println(System.currentTimeMillis() - startTime);
}

static byte[] int2bytes(int i) throws IOException {
    // System.out.println(i);
    int len = 0; // number of characters produced
    int index = 10; // cursor, filled from the right
    boolean isNegative = false; // whether the value is negative
    byte[] buff = new byte[11]; // an int has at most 10 digits, 11 including the sign
    long v = i; // widened so that negating Integer.MIN_VALUE does not overflow
    if (v < 0) {
        v = -v;
        len++;
        isNegative = true;
    }
    do { // do-while so that 0 still produces the digit "0"
        buff[index--] = (byte) (v % 10 + 48);
        v /= 10;
        len++;
    } while (v != 0);
    if (isNegative) {
        buff[index] = 45; // '-'
    }
    byte[] rs = new byte[len];
    System.arraycopy(buff, 11 - len, rs, 0, len);
    return rs;
}
This took 103548 ms and the generated file is 9.29 GB; the disk write speed has actually become the bottleneck.
System.arraycopy(), the low-level memory copy, also costs time.
/**
* Converts an int to a byte array
*
* @param a
* @return
*/
public static byte[] int2Byte(int a) {
byte[] b = new byte[4];
b[0] = (byte) (a >> 24);
b[1] = (byte) (a >> 16);
b[2] = (byte) (a >> 8);
b[3] = (byte) (a);
return b;
}
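If the per-number array allocation and copy are a concern, one option is to push the four bytes straight into the stream. The sketch below uses a hypothetical class name and is not the poster's code. It relies on OutputStream.write(int) keeping only the low 8 bits of its argument, and it produces the same big-endian byte order as int2Byte above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper; shows writing an int without a temporary byte[].
final class NoAllocWrite {

    // Writes the four bytes of `v`, most significant first, without allocating an array.
    static void writeInt(OutputStream out, int v) throws IOException {
        out.write(v >>> 24); // write(int) keeps only the low 8 bits of its argument
        out.write(v >>> 16);
        out.write(v >>> 8);
        out.write(v);
    }

    // Encodes one int into memory so the byte layout can be inspected.
    static byte[] encode(int v) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(4);
        try {
            writeInt(bytes, v);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(encode(0x01020304)));
    }
}
```

With a BufferedOutputStream as the target, each write call lands in the stream's internal buffer, so no intermediate array is needed.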
long startTime = System.currentTimeMillis();
FileWriter f=new FileWriter("D:/data.txt");
BufferedWriter buf=new BufferedWriter(f,1024*512);
int o=0;
while(o<1000000000){
buf.write((int) (Math.random()*1000000));
o++;
}
buf.close();
System.out.println(System.currentTimeMillis() - startTime);
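The file is only 1.26 GB and it took 305016 ms. What is the reason? Was the data not written completely?
Friend, a string is not the same thing as the 4 bytes you get by converting an int directly. I started out with the same idea as you, but when I tried it the generated text file could not be opened at all; it was just a pile of garbage characters.
Disk cluster size is typically 4 KB these days, so a 4 KB buffer should probably work a bit better.
Also, does the OP want to write the numbers into a text file as strings, or store them as raw bytes?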
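To make the difference between the two storage forms concrete, here is a small sketch with a hypothetical class name that encodes the same value both ways. The text form takes one byte per digit, while the binary form always takes 4 bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper comparing the two representations discussed in the thread.
final class TextVsBinary {

    // Decimal text form: one ASCII byte per digit (plus '-' for negatives).
    static byte[] asText(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.US_ASCII);
    }

    // Raw binary form: always 4 bytes, big-endian by default for ByteBuffer.
    static byte[] asBinary(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    public static void main(String[] args) {
        int value = 123456;
        System.out.println("text bytes:   " + asText(value).length);
        System.out.println("binary bytes: " + asBinary(value).length);
    }
}
```

A binary file opened in a text editor shows up as garbage characters, which matches the experience described above. It is not corrupt; it is simply not text.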
static void t8(){
byte[] buff = new byte[512];// 4*128, i.e. room for 128 ints
Random rand = new Random(System.currentTimeMillis());
rand.nextBytes(buff);
OutputStream out = null;
try{
out = new FileOutputStream("c:\\test1.data");
long l = System.currentTimeMillis();
for(int i=0,end=10000*10000*10; i<end; i+=128){
out.write(buff, 0, 512);
out.flush();
}
System.out.println(System.currentTimeMillis()-l);
out.close();
}catch(Exception ex){
ex.printStackTrace();
}
}
Result of this test:
Written: 3.72 GB (4,000,002,048 bytes)
Elapsed: 125734 ms
I guess this is about as fast as Java can write to disk on my machine.
final int max = 1000000000;
File file = new File("D:/data.dat");
DataOutputStream dos = null;
try {
dos = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(
file)));
for (int i = 0; i < n; i++) {
int x = (int) (Math.random() * max);
System.out.println(x);
dos.writeInt(x);
}
} finally {
if (dos != null) {
try {
dos.close();
dos = null;
} catch (Exception ex) {
}
}
}
System.out.println("==========Reading starts below===========");
DataInputStream dis = null;
try {
dis = new DataInputStream(new BufferedInputStream(new FileInputStream(
file)));
for (int i = 0; i < n; i++) {
int x = dis.readInt();
System.out.println(x);
}
} finally {
if (dis != null) {
try {
dis.close();
dis = null;
} catch (Exception ex) {
}
}
}
while (i < 1000000000) {
    bos.write(int2bytes(rad.nextInt()));
    i++;
}
bos.flush();
bos.close();
System.out.println(System.currentTimeMillis() - startTime);
}

static int bytes2int(byte[] b) {
    // Caution: '+' binds tighter than '<<' and the bytes are sign-extended,
    // so this expression does not rebuild the original int.
    return b[0] << 24 + b[1] << 16 + b[2] << 8 + b[3];
}

static byte[] int2bytes(int i) {
    byte[] rs = new byte[4];
    rs[0] = (byte) (i >> 24);
    rs[1] = (byte) (i >> 16);
    rs[2] = (byte) (i >> 8);
    rs[3] = (byte) (i);
    return rs;
}
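The expression `b[0] << 24 + b[1] << 16 + b[2] << 8 + b[3]` in bytes2int does not do what it appears to do. In Java, `+` has higher precedence than `<<`, and byte operands are sign-extended when promoted to int. The following sketch, with a hypothetical class name, puts the posted expression next to a masked and parenthesized variant so the two can be compared on a round trip.

```java
// Hypothetical check class; `buggy` reproduces the expression as posted.
final class Bytes2IntCheck {

    // As posted: parsed as b[0] << (24 + b[1]) << (16 + b[2]) << (8 + b[3]).
    static int buggy(byte[] b) {
        return b[0] << 24 + b[1] << 16 + b[2] << 8 + b[3];
    }

    // Each byte is masked to 0..255 and shifted inside its own parentheses.
    static int fixed(byte[] b) {
        return ((b[0] & 0xff) << 24) | ((b[1] & 0xff) << 16) | ((b[2] & 0xff) << 8) | (b[3] & 0xff);
    }

    // Same big-endian split as int2bytes in the thread.
    static byte[] int2bytes(int i) {
        return new byte[] {(byte) (i >> 24), (byte) (i >> 16), (byte) (i >> 8), (byte) i};
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 200, 123456, -1, Integer.MIN_VALUE};
        for (int v : samples) {
            byte[] b = int2bytes(v);
            System.out.println(v + " -> buggy " + buggy(b) + ", fixed " + fixed(b));
        }
    }
}
```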
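This took 69217 ms. File size: 3.72 GB (4,000,000,000 bytes); size on disk: 3.72 GB (4,000,002,048 bytes). Without writing to disk it takes 11419 ms.
With the OP's n=100000 it takes less than 50 ms and the file is only 390 KB.
小绵羊, there is no need to reinvent the wheel; just use DataInputStream/DataOutputStream directly.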
int x = bytes2int(bytes);
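System.out.println(x);
By "the CPU computation is slow", do you mean the problem is my computer? What this generates is byte data. My original goal was to sort the integers written to the file, so if they are stored as bytes, can I reassemble those bytes into integers and then sort them?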
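Reassembling the bytes into integers and sorting them is possible. Because the generators in this thread draw values from [0, 1000000), one way to avoid holding 1 billion ints in memory is a counting sort that streams the file once. The sketch below is only an illustration under two assumptions: the file was written with DataOutputStream.writeInt (big-endian), and every value is non-negative and below max. The class and method names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical counting-sort sketch; assumes all values lie in [0, max).
final class CountingSortFile {

    // Streams a file of big-endian 4-byte ints and counts each value.
    static long[] countValues(File file, int max) throws IOException {
        long[] counts = new long[max];
        long n = file.length() / 4;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            for (long i = 0; i < n; i++) {
                counts[in.readInt()]++;
            }
        }
        return counts;
    }

    // Writes every value back in ascending order, as often as it was counted.
    static void writeSorted(long[] counts, File out) throws IOException {
        try (DataOutputStream dos = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(out)))) {
            for (int v = 0; v < counts.length; v++) {
                for (long c = counts[v]; c > 0; c--) {
                    dos.writeInt(v);
                }
            }
        }
    }

    // Round trip on a small sample using temporary files; returns the sorted values.
    static int[] demo(int[] input, int max) {
        try {
            File in = File.createTempFile("unsorted", ".dat");
            File out = File.createTempFile("sorted", ".dat");
            in.deleteOnExit();
            out.deleteOnExit();
            try (DataOutputStream dos = new DataOutputStream(new FileOutputStream(in))) {
                for (int v : input) {
                    dos.writeInt(v);
                }
            }
            writeSorted(countValues(in, max), out);
            int[] result = new int[input.length];
            try (DataInputStream dis = new DataInputStream(new FileInputStream(out))) {
                for (int i = 0; i < result.length; i++) {
                    result[i] = dis.readInt();
                }
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(demo(new int[] {5, 3, 9, 3, 0}, 10)));
    }
}
```

For max = 1000000 the counts array takes about 8 MB. If the values covered the full int range, this approach would no longer fit and an external merge sort would be the more usual choice.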
Once the file goes past 2 GB, RandomAccessFile stops working for me. That probably has something to do with the underlying implementation.
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.util.Random;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CyclicBarrier;
public class Test_3 {
static Block POISON = new Block();
static class Block{
static enum State{init,writing,finished};
State stat=State.init;
int index;
int[] rand;
}
static class RandomThread extends Thread{
CyclicBarrier barrier;
BlockingQueue<Block> randomQueue;
BlockingQueue<Block> writerQueue;
int max;
public RandomThread(CyclicBarrier barrier,
BlockingQueue<Block> randomQueue,
BlockingQueue<Block> writerQueue,
int max) {
super();
this.barrier = barrier;
this.randomQueue = randomQueue;
this.writerQueue = writerQueue;
this.max = max;
}
public void run(){
Block block = null;
Random rand = new Random(System.currentTimeMillis());
try {
while((block=randomQueue.take())!=POISON){
for(int i=0;i<block.rand.length;i++){
block.rand[i] = rand.nextInt(max);
}
writerQueue.put(block);
}
} catch (InterruptedException e) {
e.printStackTrace();
} finally{
try {
barrier.await();
} catch (Exception e) {
e.printStackTrace();
}
}
}
}
static class WriterThread extends Thread{
BlockingQueue<Block> writerQueue;
FileChannel channel;
CyclicBarrier barrier;
public WriterThread(BlockingQueue<Block> writerQueue,
FileChannel channel, CyclicBarrier barrier ) {
super();
this.writerQueue = writerQueue;
this.channel = channel;
this.barrier = barrier;
}
public void run(){
Block block = null;
try {
while((block=writerQueue.take())!=POISON){
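// cast to long first so the byte offset does not overflow int once it passes 2 GB
MappedByteBuffer buffer = channel.map(MapMode.READ_WRITE, (long) block.index << 2, (long) block.rand.length << 2);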
for(int i=0;i<block.rand.length;i++){
buffer.putInt(block.rand[i]);
}
buffer.force();
}
} catch (InterruptedException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally{
try {
barrier.await();
} catch (Exception e) {
e.printStackTrace();
}
}
}
}
public static void main(String[] args) throws Exception {
final String fileName="D:\\tmp\\Test_3.data";
final int count = 500000000;// number of random values to generate; 2,147,483,648
final int max = 1000000;// upper bound of the random values (exclusive)
final int blockSize = 512*1024;// size of each work unit (bytes)
final int randomThread = 1;// number of threads generating random values; alternatively Runtime.getRuntime().availableProcessors()
final int writerThread = 1;// number of threads writing data; alternatively Runtime.getRuntime().availableProcessors()
long start = System.currentTimeMillis();
int maxNumberOfBlock = blockSize/4;// maximum number of random values per work unit
BlockingQueue<Block> randomQueue = new ArrayBlockingQueue<Block>(randomThread);
BlockingQueue<Block> writerQueue = new ArrayBlockingQueue<Block>(writerThread);
RandomThread[] randomThreads = new RandomThread[randomThread];
WriterThread[] writerThreads = new WriterThread[writerThread];
CyclicBarrier randomBarrier = new CyclicBarrier(randomThread+1);
for(int i=0;i<randomThread;i++){
randomThreads[i] = new RandomThread(randomBarrier, randomQueue, writerQueue, max);
randomThreads[i].start();
}
RandomAccessFile out = new RandomAccessFile(fileName,"rw");
FileChannel channel = out.getChannel();
CyclicBarrier writerBarrier = new CyclicBarrier(writerThread+1);
for(int i=0;i<writerThread;i++){
writerThreads[i] = new WriterThread(writerQueue, channel, writerBarrier);
writerThreads[i].start();
}
int index = 0;
for(int i=count;i>0;i-=maxNumberOfBlock){
Block block = new Block();
if(i>=maxNumberOfBlock){
block.rand = new int[maxNumberOfBlock];
block.index = index;
index += maxNumberOfBlock;
}else{
block.rand = new int[i];
block.index = index;
index += i;
}
randomQueue.put(block);
}
for(int i=0;i<randomThread;i++){
randomQueue.put(POISON);
}
randomBarrier.await();
for(int i=0;i<writerThread;i++){
writerQueue.put(POISON);
}
writerBarrier.await();
channel.close();
long end = System.currentTimeMillis();
System.out.println(end-start);
}}
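A possible explanation for trouble beyond 2 GB in Test_3 is integer arithmetic rather than RandomAccessFile itself. RandomAccessFile.seek and the position argument of FileChannel.map both take a long, and only the size of a single mapping is capped at Integer.MAX_VALUE bytes. An expression such as `block.index<<2`, however, is evaluated in 32-bit int arithmetic and wraps around once the offset passes 2^31. The sketch below uses hypothetical names and shows the offset computed in 64 bits.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;

// Hypothetical sketch of mapping blocks with 64-bit byte offsets.
final class LongOffsetMap {

    // Byte offset of the int at position `index`; the cast keeps the shift in 64 bits.
    static long byteOffset(int index) {
        return (long) index << 2;
    }

    // Maps a window starting at the given int index and fills it with `values`.
    static void writeBlock(FileChannel channel, int index, int[] values) throws IOException {
        MappedByteBuffer buffer =
                channel.map(MapMode.READ_WRITE, byteOffset(index), (long) values.length << 2);
        for (int v : values) {
            buffer.putInt(v);
        }
        buffer.force();
    }

    // Writes two small blocks to a temporary file and reads back the int at index 3.
    static int demo() {
        try {
            File file = File.createTempFile("mapped", ".data");
            file.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
                 FileChannel channel = raf.getChannel()) {
                writeBlock(channel, 0, new int[] {1, 2});
                writeBlock(channel, 2, new int[] {3, 4});
                raf.seek(byteOffset(3));
                return raf.readInt();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("int shift:  " + (600000000 << 2));
        System.out.println("long shift: " + byteOffset(600000000));
        System.out.println("value at index 3: " + demo());
    }
}
```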
int x = (int) (Math.random() * max);
System.out.println(x);
dos.writeInt(x);
Only 100 numbers are written here.
import java.io.FileOutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Random;
public class Test_4 {
public static void main(String[] args) throws Exception {
final String fileName="D:\\tmp\\Test_4.data";
final int count = 1000000000;// number of random values to generate
final int max = 1000000;// upper bound of the random values (exclusive)
final int bufferSize = 512*1024;// size of the buffer (bytes)
long start = System.currentTimeMillis();
Random rand = new Random(System.currentTimeMillis());
ByteBuffer buffer = ByteBuffer.allocateDirect(bufferSize);
FileOutputStream out = new FileOutputStream(fileName);
FileChannel channel = out.getChannel();
for(int i=0;i<count;){
buffer.clear();
for(;buffer.remaining()>=4 && i<count;i++){
buffer.putInt(rand.nextInt(max));
}
buffer.flip();
while(buffer.hasRemaining()) {
channel.write(buffer);
}
buffer.rewind();
}
channel.close();
out.close();
long end = System.currentTimeMillis();
System.out.println(end-start);
//58287
}}
BufferedWriter buf=new BufferedWriter(f,1024*512);
int o=0;
while(o<1000000000){
buf.write((int) (Math.random()*1000000));
o++;
}
buf.close();
buf.write((int) (Math.random()*1000000));
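write(int) here writes a single character, not a 4-byte integer: a Writer keeps only the low 16 bits of the int as one char (an OutputStream would keep only the low 8 bits as one byte).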
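The three write variants that come up in this thread behave differently. OutputStream.write(int) emits one byte (the low 8 bits), Writer.write(int) emits one char (the low 16 bits), and DataOutputStream.writeInt emits all 4 bytes. The sketch below uses a hypothetical class name and in-memory streams to make each case visible.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

// Hypothetical demo class; all streams are in-memory, nothing touches the disk.
final class WriteIntSemantics {

    // OutputStream.write(int): only the low 8 bits are written.
    static byte viaOutputStream(int v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(v);
        return out.toByteArray()[0];
    }

    // Writer.write(int): only the low 16 bits are written, as a single char.
    static char viaWriter(int v) {
        StringWriter out = new StringWriter();
        out.write(v);
        return out.toString().charAt(0);
    }

    // DataOutputStream.writeInt: all four bytes, most significant first.
    static byte[] viaDataOutput(int v) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(v);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        int v = 0x12345;
        System.out.println("OutputStream byte: 0x" + Integer.toHexString(viaOutputStream(v) & 0xff));
        System.out.println("Writer char:       0x" + Integer.toHexString(viaWriter(v)));
        System.out.println("writeInt bytes:    " + viaDataOutput(v).length);
    }
}
```

When a Writer is backed by a file, each char is then encoded with the writer's charset, so the number of bytes per character on disk can vary. That is consistent with the 1.26 GB file reported for 1 billion write calls.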
public static void main(String[] args) throws Exception {
    write();
    //read();
}

static void write() throws Exception {
    long startTime = System.currentTimeMillis();
    BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream("c:/test.dat"));
    Random rad = new Random();
    int max = 1000000;
    int count = 1000000000;
    //int count = 100;
    while (count-- > 0) {
        // int i = rad.nextInt(max);
        // bos.write(int2bytes(i));
        // System.out.print(i + "\t");
        bos.write(int2bytes(rad.nextInt(max)));
    }
    bos.flush();
    bos.close();
    System.out.println(System.currentTimeMillis() - startTime);
}

static void read() throws Exception {
    BufferedInputStream bis = new BufferedInputStream(new FileInputStream("c:/test.dat"));
    byte[] buff = new byte[4];
    while (bis.read(buff) != -1) {
        int i = bytes2int(buff);
        // System.out.print(i + "\t");
    }
    bis.close();
}

static int bytes2int(byte[] b) {
    return ((b[0] & 0xff) << 24) + ((b[1] & 0xff) << 16) + ((b[2] & 0xff) << 8) + (b[3] & 0xff);
}

static byte[] int2bytes(int i) {
    byte[] rs = new byte[4];
    rs[0] = (byte) ((i >>> 24) & 0xff);
    rs[1] = (byte) ((i >>> 16) & 0xff);
    rs[2] = (byte) ((i >>> 8) & 0xff);
    rs[3] = (byte) (i & 0xff);
    return rs;
}
Here is a corrected version; it runs in 39967 ms.
buf.write((int) (Math.random()*1000000));
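This statement writes a single unit of data into the buffer. Although the parameter is an int, only its low-order bits are actually written: the low 8 bits for OutputStream.write(int), and the low 16 bits (one char) for a Writer such as the BufferedWriter used here.
Java's byte type is signed, whereas low-level data handling usually works with unsigned values,
so to get around the sign issue the parameter was declared as an int:
whether the value is positive or negative, the low eight bits of the int are identical to the byte value.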
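With the buffer set to 8 MB the disk stays busy writing the whole time, and the run takes 36109 ms.
With an even larger buffer the disk instead sits idle much of the time and the total time goes up.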
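The best buffer size depends on the machine, so it is usually worth measuring. The sketch below is a rough timing harness under stated assumptions: the class name, the temporary file and the sample sizes are all illustrative, and it reports no results of its own. It writes the same number of ints through BufferedOutputStream buffers of different sizes.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Random;

// Hypothetical timing harness; numbers will differ from machine to machine.
final class BufferSizeTrial {

    // Writes `count` random ints through a buffer of `bufferSize` bytes; returns elapsed ms.
    static long timeWrite(File file, int count, int bufferSize) throws IOException {
        Random rand = new Random();
        long start = System.currentTimeMillis();
        try (DataOutputStream dos = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file), bufferSize))) {
            for (int i = 0; i < count; i++) {
                dos.writeInt(rand.nextInt(1000000));
            }
        }
        return System.currentTimeMillis() - start;
    }

    // Runs one trial against a temporary file, prints the timing, returns the file size.
    static long demo(int count, int bufferSize) {
        try {
            File file = File.createTempFile("trial", ".dat");
            file.deleteOnExit();
            long elapsed = timeWrite(file, count, bufferSize);
            System.out.println(bufferSize + " byte buffer: " + elapsed + " ms");
            return file.length();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = {4 * 1024, 512 * 1024, 8 * 1024 * 1024};
        for (int size : sizes) {
            demo(10000000, size);
        }
    }
}
```

Operating-system caching can mask differences on small runs, so a count closer to the real workload gives a more honest comparison.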
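So when you use it to write data into the buffer, an int takes 4 bytes, a byte takes 1 byte, and the other types likewise keep their native data format.
Just pay attention to byte order when doing network communication. Most of the time, though, you don't need to worry much about it.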
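For reference, byte order only matters when the reader and the writer disagree about it. DataOutputStream and a default ByteBuffer both use big-endian (network order), and ByteBuffer can be switched explicitly. Below is a minimal sketch with a hypothetical class name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical demo of the two byte orders for the same int.
final class ByteOrderDemo {

    // Encodes `v` into 4 bytes using the requested byte order.
    static byte[] encode(int v, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(v).array();
    }

    // Decodes 4 bytes using the requested byte order.
    static int decode(byte[] b, ByteOrder order) {
        return ByteBuffer.wrap(b).order(order).getInt();
    }

    public static void main(String[] args) {
        int v = 0x01020304;
        System.out.println("big-endian:    " + Arrays.toString(encode(v, ByteOrder.BIG_ENDIAN)));
        System.out.println("little-endian: " + Arrays.toString(encode(v, ByteOrder.LITTLE_ENDIAN)));
    }
}
```

As long as the file is written and read back by Java code using the same convention, as in this thread, the order never needs to be changed.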