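I've been working on this for a few days and couldn't get it written, so I'm asking the experts here. I'm happy to award points or send a small thank-you via Alipay.

I have a sorted file, exported from a database in CSV format, and I want to compute some pairwise statistics on it. In SQL there are too many combinations (a combinatorial explosion), so I plan to do it in code:

Transaction ID, product code, amount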
TransID1, A, 500
TransID1, D, 100
TransID2, B, 10
TransID2, B, 300
TransID3, A, 50
TransID3, B, 10
TransID4, Z, 50
TransID5, A, 10
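TransID5, E, 200

The file is already sorted by transaction ID and product code, and a product code never repeats within a transaction. There are a few hundred transaction codes in total. The total number of transactions can be supplied, and so can the number of transaction codes if needed.

What I need to compute (assuming 300 products in total) is a table of length 300x300/2: for each pair of products, the accumulated amounts over the transactions in which both appear:

Pair, amount of product 1, amount of product 2, number of transactions containing both
A_A 560 560 3 (3 transactions contain product A; its amount subtotal is 560)
A_B 550 310 2
A_C 0 0 0
A_D 500 100 1
B_B
B_C
...

If the codes are not supplied in advance but discovered by scanning the data (so there are no rows with a zero subtotal), that is fine too.

My idea is to read each TransID (containing 1 to 300 products) separately and, for each transaction, enumerate its pairs and add each one into the matching cell in one triangular half of a [300][300] matrix. The practical problem is that with a BufferedReader it is hard to stop exactly at the end of a transaction. (I want to buffer the transaction in an array: if it has 3 products, 3 elements are filled, then I form the pairs from those 3 and place the results in the matrix by hash or by direct index.) Once I have read past the end into the next transaction, going back is difficult, and the method becomes hard to control.

Reading one transaction at a time from the database would also work (and seems much easier), but I would prefer to solve this on the CSV file. Thanks in advance!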
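The half-matrix idea can also be stored as a flat array. Here is a minimal sketch, assuming each product code has already been mapped to an index 0..n-1 (that mapping, and the names `PairIndex`, `size`, and `index`, are hypothetical). Note that with the diagonal pairs such as A_A included, 300 products need 300*301/2 = 45150 slots rather than 300*300/2.

```java
import java.util.Arrays;

// Hypothetical helper: maps an unordered pair of product indices to a slot
// in a flat array that stores only the upper triangle, diagonal included.
class PairIndex {
    // Number of slots needed for n products: n * (n + 1) / 2.
    static int size(int n) {
        return n * (n + 1) / 2;
    }

    // Slot for the pair (i, j) among n products; the order of i and j does not matter.
    static int index(int i, int j, int n) {
        int lo = Math.min(i, j);
        int hi = Math.max(i, j);
        // Rows 0..lo-1 hold n, n-1, ..., n-lo+1 slots; then add the offset within row lo.
        return lo * n - lo * (lo - 1) / 2 + (hi - lo);
    }

    public static void main(String[] args) {
        int n = 4; // products A, B, C, D as indices 0..3
        int[] count = new int[size(n)];
        count[index(0, 3, n)]++; // A with D
        count[index(3, 0, n)]++; // D with A lands in the same slot
        System.out.println(Arrays.toString(count));
    }
}
```

The two amount totals could live in parallel arrays of the same length, indexed the same way.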
TransID2, B, 10
TransID2, B, 300
These two lines contradict your assumption that "a product code never repeats within a transaction"!
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class Item{
private String pid;
private int money;
public Item(String pid,int money){
this.pid=pid;
this.money=money;
}
public String getPid() {
return pid;
}
public void setPid(String pid) {
this.pid = pid;
}
public int getMoney() {
return money;
}
public void setMoney(int money) {
this.money = money;
}
public String toString(){
return "{pid:"+pid+",money:"+money+"}";
}
}
class Trans{
private String tid;
private List<Item> items;
public Trans(String tid){
this.tid=tid;
}
public boolean equals(Object o){
if (! (o instanceof Trans))
return false;
Trans i=(Trans)o;
return tid.equals(i.getTid());
}
public String getTid() {
return tid;
}
public void setTid(String tid) {
this.tid = tid;
}
public List<Item> getItems() {
return items;
}
public void setItems(List<Item> items) {
this.items = items;
}
public String toString(){
return "{tid:"+tid+",items:"+items+"}\n";
}
}
class Result{
private int sum1;
private int sum2;
private List<Trans> trans;
public String toString(){
return sum1+"\t"+sum2+"\t"+trans.size();
}
public int getSum1() {
return sum1;
}
public void setSum1(int sum1) {
this.sum1 = sum1;
}
public int getSum2() {
return sum2;
}
public void setSum2(int sum2) {
this.sum2 = sum2;
}
public List<Trans> getTrans() {
return trans;
}
public void setTrans(List<Trans> trans) {
this.trans = trans;
}
}

public class TestTrans {
public List<Trans> getTransFromCvs(String filename) throws Exception{
List<Trans> trans=new ArrayList<Trans>();
BufferedReader reader=new BufferedReader(new FileReader(filename));
String line=null;
line=reader.readLine();// skip the first (header) line
while (((line=reader.readLine())!=null)){
String temp[]=line.split(",");
Trans tt=new Trans(temp[0].trim());
if (trans.indexOf(tt)>=0){
tt=trans.get(trans.indexOf(tt));
} else {
trans.add(tt);
tt.setItems(new ArrayList<Item>());
}
int m=Integer.parseInt(temp[2].trim());
tt.getItems().add(new Item(temp[1].trim(),m));
}
reader.close();
return trans;
}
public Map<String,Result> processTrans(List<Trans> trans){
Map<String,Result> result=new TreeMap<String,Result>();
for (Trans t:trans){
List<Item> items=t.getItems();
for (int i=0;i<items.size();i++){
Item item=items.get(i);
for (int j=i;j<items.size();j++){
Item item2=items.get(j);
String key=item.getPid()+"-"+item2.getPid();
Result r=result.get(key);
if (r==null){
r=new Result();
r.setTrans(new ArrayList<Trans>());
result.put(key,r);
}
r.setSum1(r.getSum1()+item.getMoney());
r.setSum2(r.getSum2()+item2.getMoney());
r.getTrans().add(t);
}
}
}
return result;
}
/**
* @param args
* @throws Exception
*/
public static void main(String[] args) throws Exception {
TestTrans test=new TestTrans();
List<Trans> trans=test.getTransFromCvs("d:/test.cvs");
System.out.println(trans);
Map<String,Result> result=test.processTrans(trans);
System.out.println("Pair\tAmount1\tAmount2\tCount");
for (String key:result.keySet()){
Result r=result.get(key);
System.out.println(key+"\t"+r.getSum1()+"\t"+r.getSum2()+"\t"+r.getTrans().size());
}
}}
at java.lang.String.substring(Unknown Source)
at java.lang.String.subSequence(Unknown Source)
at java.util.regex.Pattern.split(Unknown Source)
at java.lang.String.split(Unknown Source)
at java.lang.String.split(Unknown Source)
at TestTrans.getTransFromCvs(TestTrans.java:98)
at TestTrans.main(TestTrans.java:150)
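String temp[]=line.split(","); (line 98)

Reading a 200 MB CSV file fails with an out-of-memory error? The machine has 2 GB of RAM and a 4 GB swap file.

I'll look into how to handle this. Reading in segments, probably? Read 1 million transactions, compute, clear the transaction list, then read more and keep accumulating...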
public class TestTrans {
public Map<String,Result> processTransComb(String filename)throws Exception {
//moved in //List<Trans> trans=new ArrayList<Trans>();
BufferedReader reader=new BufferedReader(new FileReader(filename));
String line=null;
line=reader.readLine();
int counter;
boolean noteofile;
boolean inloop;
Map<String,Result> result=new TreeMap<String,Result>();
do{
List<Trans> trans=new ArrayList<Trans>();
inloop=false;
counter=0;
while ((noteofile=((line=reader.readLine())!=null))){
inloop=true;
String temp[]=line.split(",");
Trans tt=new Trans(temp[0].trim());
if (trans.indexOf(tt)>=0){
tt=trans.get(trans.indexOf(tt));
} else {
trans.add(tt);
tt.setItems(new ArrayList<Item>());
}
float m=Float.valueOf(temp[2].trim()).floatValue();
tt.getItems().add(new Item(temp[1].trim(),m));
counter++;
if (counter>1000000)
{//reader.(2);
break;}
}
if (inloop==true){
// {break;}
for (Trans t:trans){
List<Item> items=t.getItems();
for (int i=0;i<items.size();i++){
Item item=items.get(i);
for (int j=i;j<items.size();j++){
Item item2=items.get(j);
String key=item.getPid()+"-"+item2.getPid();
Result r=result.get(key);
if (r==null){
r=new Result();
r.setTrans(new ArrayList<Trans>());
result.put(key,r);
}
r.setSum1(r.getSum1()+item.getMoney());
r.setSum2(r.getSum2()+item2.getMoney());
r.getTrans().add(t);
}
}
}
}//inloop
} while (noteofile==true);
reader.close();
return result;
}
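It runs now, but transactions get cut in two at the batch boundaries too often, so the numbers are no longer accurate. Also, if I cut every 10,000 lines it runs much slower, and at 1 million lines it probably can't run at all. The original data has 10 million lines.
It still fails. A 40 MB file works, but a 200 MB file runs out of memory.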
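Because the file is sorted by transaction ID, one way to avoid cutting a transaction in half is to flush when the ID changes rather than after a fixed number of lines. Only one transaction then needs to be in memory at a time. A minimal sketch under that assumption follows; `SortedGroupReader` and `forEachTransaction` are hypothetical names, not code from this thread, and header-line handling is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not code from this thread.
class SortedGroupReader {
    // Calls handler once per complete transaction. Relies on the input
    // being sorted so that all rows of one transaction are adjacent.
    static void forEachTransaction(Reader in, Consumer<List<String[]>> handler) {
        try (BufferedReader reader = new BufferedReader(in)) {
            List<String[]> rows = new ArrayList<>();
            String currentId = null;
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // ignore blank lines
                }
                String[] fields = line.split(",");
                for (int i = 0; i < fields.length; i++) {
                    fields[i] = fields[i].trim();
                }
                if (currentId != null && !fields[0].equals(currentId)) {
                    handler.accept(rows);     // the previous transaction is complete
                    rows = new ArrayList<>(); // start a fresh buffer
                }
                currentId = fields[0];
                rows.add(fields);
            }
            if (!rows.isEmpty()) {
                handler.accept(rows); // the last transaction has no following ID change
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String csv = "TransID1, A, 500\nTransID1, D, 100\n"
                + "TransID3, A, 50\nTransID3, B, 10\nTransID4, Z, 50\n";
        forEachTransaction(new StringReader(csv),
                rows -> System.out.println(rows.get(0)[0] + " has " + rows.size() + " rows"));
    }
}
```

The handler would be the place to enumerate the pairs of one transaction and add them to the running totals.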
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.io.*;
class Item{
private String pid;
private float money;
public Item(String pid,float money){
this.pid=pid;
this.money=money;
}
public String getPid() {
return pid;
}
public void setPid(String pid) {
this.pid = pid;
}
public float getMoney() {
return money;
}
public void setMoney(float money) {
this.money = money;
}
public String toString(){
return "{pid:"+pid+",money:"+money+"}";
}
}
class Trans{
private String tid;
private List<Item> items;
public Trans(String tid){
this.tid=tid;
}
public boolean equals(Object o){
if (! (o instanceof Trans))
return false;
Trans i=(Trans)o;
return tid.equals(i.getTid());
}
public String getTid() {
return tid;
}
public void setTid(String tid) {
this.tid = tid;
}
public List<Item> getItems() {
return items;
}
public void setItems(List<Item> items) {
this.items = items;
}
public String toString(){
return "{tid:"+tid+",items:"+items+"}\n";
}
}
class Result{
private float sum1;
private float sum2;
private List<Trans> trans;
public String toString(){
return sum1+"\t"+sum2+"\t"+trans.size();
}
public float getSum1() {
return sum1;
}
public void setSum1(float sum1) {
this.sum1 = sum1;
}
public float getSum2() {
return sum2;
}
public void setSum2(float sum2) {
this.sum2 = sum2;
}
public List<Trans> getTrans() {
return trans;
}
public void setTrans(List<Trans> trans) {
this.trans = trans;
}
}

public class TestTrans {
public Map<String,Result> processTransComb(String filename)throws Exception {
//moved in //List<Trans> trans=new ArrayList<Trans>();
BufferedReader reader=new BufferedReader(new FileReader(filename));
String line=null;
line=reader.readLine();
int counter;
boolean noteofile;
boolean inloop;
Map<String,Result> result=new TreeMap<String,Result>();
List<Trans> trans=new ArrayList<Trans>();
do{
inloop=false;
counter=0;
while ((noteofile=((line=reader.readLine())!=null))){
inloop=true;
String temp[]=line.split(",");
Trans tt=new Trans(temp[0].trim());
if (trans.indexOf(tt)>=0){
tt=trans.get(trans.indexOf(tt));
} else {
trans.add(tt);
tt.setItems(new ArrayList<Item>());
}
float m=Float.valueOf(temp[2].trim()).floatValue();
tt.getItems().add(new Item(temp[1].trim(),m));
counter++;
if (counter>5000)
{//reader.(2);
break;}
}
if (inloop==true){
// {break;}
for (Trans t:trans){
List<Item> items=t.getItems();
for (int i=0;i<items.size();i++){
Item item=items.get(i);
for (int j=i;j<items.size();j++){
Item item2=items.get(j);
String key=item.getPid()+"-"+item2.getPid();
Result r=result.get(key);
if (r==null){
r=new Result();
r.setTrans(new ArrayList<Trans>());
result.put(key,r);
}
r.setSum1(r.getSum1()+item.getMoney());
r.setSum2(r.getSum2()+item2.getMoney());
r.getTrans().add(t);
}
}
}
trans.clear();
}//inloop
} while (noteofile==true);
reader.close();
return result;
}
public static void main(String[] args) throws Exception {
try {
BufferedWriter out = new BufferedWriter(new FileWriter("Result_MI_43.csv"));
out.write("Pair,Value1,Value2,Tx_Count");
out.newLine();
TestTrans test=new TestTrans();
// List<Trans> trans=test.getTransFromCvs();
//System.out.println(trans);
Map<String,Result> result=test.processTransComb("G:\\MI_Attachment\\Sample.csv");
//System.out.println("p\t1\t2\t");
for (String key:result.keySet()){
Result r=result.get(key);
out.write(key+","+r.getSum1()+","+r.getSum2()+","+r.getTrans().size());
out.newLine();
}
out.close();
} catch (IOException e) {
e.printStackTrace(); // report I/O failures instead of silently ignoring them
}
}
}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.io.*;

class Item {
    private String pid;
    private float money;

    public Item(String pid, float money) {
        this.pid = pid;
        this.money = money;
    }

    public String getPid() {
        return pid;
    }

    public void setPid(String pid) {
        this.pid = pid;
    }

    public float getMoney() {
        return money;
    }

    public void setMoney(float money) {
        this.money = money;
    }

    public String toString() {
        return "{pid:" + pid + ",money:" + money + "}";
    }
}

class Trans {
    private String tid;
    private List<Item> items;

    public Trans(String tid) {
        this.tid = tid;
    }

    public boolean equals(Object o) {
        if (!(o instanceof Trans))
            return false;
        Trans i = (Trans) o;
        return tid.equals(i.getTid());
    }

    public String getTid() {
        return tid;
    }

    public void setTid(String tid) {
        this.tid = tid;
    }

    public List<Item> getItems() {
        return items;
    }

    public void setItems(List<Item> items) {
        this.items = items;
    }

    public String toString() {
        return "{tid:" + tid + ",items:" + items + "}\n";
    }
}

class Result {
    private float sum1;
    private float sum2;
    private int count;

    public int getCount() {
        return count;
    }

    public void setCount(int count) {
        this.count = count;
    }

    public String toString() {
        return sum1 + "\t" + sum2 + "\t" + count;
    }

    public float getSum1() {
        return sum1;
    }

    public void setSum1(float sum1) {
        this.sum1 = sum1;
    }

    public float getSum2() {
        return sum2;
    }

    public void setSum2(float sum2) {
        this.sum2 = sum2;
    }
}

public class TestTrans {
    private void processOneTrans(Map<String, Result> result, Trans trans) {
        List<Item> items = trans.getItems();
        for (int i = 0; i < items.size(); i++) {
            Item item = items.get(i);
            for (int j = i; j < items.size(); j++) {
                Item item2 = items.get(j);
                String key = item.getPid() + "-" + item2.getPid();
                Result r = result.get(key);
                if (r == null) {
                    r = new Result();
                    r.setCount(0);
                    result.put(key, r);
                }
                r.setSum1(r.getSum1() + item.getMoney());
                r.setSum2(r.getSum2() + item2.getMoney());
                r.setCount(r.getCount() + 1);
            }
        }
    }
    public Map<String, Result> processTransComb(String filename)
            throws Exception {
        BufferedReader reader = new BufferedReader(new FileReader(filename));
        String line = null;
        line = reader.readLine(); // skip the first (header) line
        String oldTransId = "";
        Trans tt = null;
        Map<String, Result> result = new TreeMap<String, Result>();
        int cnt = 0;
        while ((line = reader.readLine()) != null) {
            cnt++;
            if (cnt % 10000 == 0) {
                System.out.println(cnt + " lines processed...");
            }
            String temp[] = line.split(",");
            if (temp[0].equals(oldTransId)) {
                float m = Float.valueOf(temp[2].trim()).floatValue();
                tt.getItems().add(new Item(temp[1].trim(), m));
            } else {
                // The transaction ID changed, so the previous transaction is complete.
                oldTransId = temp[0];
                if (tt != null) {
                    processOneTrans(result, tt);
                }
                tt = new Trans(temp[0]);
                tt.setItems(new ArrayList<Item>());
                float m = Float.valueOf(temp[2].trim()).floatValue();
                tt.getItems().add(new Item(temp[1].trim(), m));
            }
        }
        // Process the final transaction; no later ID change would flush it.
        if (tt != null) {
            processOneTrans(result, tt);
        }
        reader.close();
        return result;
    }

    public static void main(String[] args) throws Exception {
        try {
            BufferedWriter out = new BufferedWriter(new FileWriter(
                    "I:\\temp\\Sample3\\Result_MI_43.csv"));
            out.write("Pair,Value1,Value2,Tx_Count");
            out.newLine();
            TestTrans test = new TestTrans();
            // List <Trans> trans=test.getTransFromCvs();
            // System.out.println(trans);
            Map<String, Result> result = test
                    .processTransComb("I:\\temp\\Sample3\\Sample3.csv");
            // System.out.println("p\t1\t2\t");
            for (String key : result.keySet()) {
                Result r = result.get(key);
                out.write(key + "," + r.getSum1() + "," + r.getSum2() + ","
                        + r.getCount());
                out.newLine();
            }
            out.close();
        } catch (IOException e) {
            e.printStackTrace(); // report I/O failures instead of silently ignoring them
        }
    }
}
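One caveat about the amounts: the listing above accumulates `sum1` and `sum2` as `float`, which carries only about 7 significant decimal digits, so totals over millions of rows can drift. Here is a small illustration that assumes nothing about the real data. The class name `FloatSumDemo` and the value 0.1 are made up for the demo.

```java
// Hypothetical demo, not code from this thread: compares float and double
// accumulation of the same small value many times.
class FloatSumDemo {
    // Adds 0.1f to a float accumulator n times.
    static float sumAsFloat(int n) {
        float sum = 0f;
        for (int i = 0; i < n; i++) {
            sum += 0.1f;
        }
        return sum;
    }

    // Adds 0.1 to a double accumulator n times.
    static double sumAsDouble(int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += 0.1;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 10_000_000; // same order of magnitude as the 10 million rows mentioned above
        System.out.println("float  total: " + sumAsFloat(n));
        System.out.println("double total: " + sumAsDouble(n));
    }
}
```

If exact totals matter, accumulating in `double`, in `long` cents, or in `java.math.BigDecimal` are common alternatives. Which one fits depends on how the amounts are stored in the export.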