listbox1中有300000条数据长度是15,另一个listbox2中有1000条记录,长度为15
要求在5~25分钟内把listbox1中的每条数据与listbox2中的每条数据对比,看看对应位置的字符是否相同,有几个相同!,算法我已经作出来了,不过速度慢,请大家帮忙看看。
我循环里的数据量可能是300000×15×1000=4500000000(单个字符进行比较)
要求在5~25分钟内把listbox1中的每条数据与listbox2中的每条数据对比,看看对应位置的字符是否相同,有几个相同!,算法我已经作出来了,不过速度慢,请大家帮忙看看。
我循环里的数据量可能是300000×15×1000=4500000000(单个字符进行比较)
listbox1.indexof(listbox2.items[i])=-1 ;//no exist
...
begin
for Result := 0 to GetCount - 1 do
if CompareStrings(Get(Result), S) = 0 then Exit;
Result := -1;
end;
比如
j:integer;
for i:=1 to 15 do
if aString[i]=s[i] then
j:=j+1;
只对比里面的数据
比如:
listbox1: listbox2
123456852345895 121456654123565
要比较listbox1中的记录的第一个字符和listbox2中的第一个字符
listbox1中的记录的第二个字符和listbox2中的第二个字符
listbox1中的记录的第三个字符和listbox2中的第三个字符
...
假如第一个相同: j=1
第二个相同 j:=2
3~5没有相同的
第六个相同 j:=3
...
要listbox1中的第一条记录分别和listbox2中的所有记录进行比较
for j:= 0 to listbox1.Count-1 do
for i:=0 to 1000 do
begin
for k:=1 to 13 do
begin
if aString[k]=Posstring[k] then
begin
j:=j+1;
end;
end
end;
aString:当前listbox1中的数据 Posstring:当前listbox2中的数据
把内存中的字串当作数值处理,内嵌汇编进行循环和比较
>>注意,我要比较每条记录中相同字符的个数:
>>比如:但我比較排序很重要, 先記錄, 去掉重復的記錄,
然後 , 如果
ListBox1 第一條 match listbox2 第三條記錄,
那
listbox1 第二條 應該直接與 listbox2 的第四條 match 了!象字符串查找的算法的原理也差不多!
第一反應是這樣做, 有什麼好想法, 再討論!
感覺是太大了!然後, 考慮另一種, 放到數據庫, 加索引,
用SQL查詢優化出來的的效果, 又不知如何??
但我比較排序很重要, 先記錄, 去掉重復的記錄,
然後 , 如果
ListBox1 第一條 match listbox2 第三條記錄,
那
listbox1 第二條 應該直接與 listbox2 的第四條 match 了!象字符串查找的算法的原理也差不多!
///////////////////////////////////////////////
ListBox1 第一條 match “listbox2 第三條記錄,”(listbox2所有的记录)
那
listbox1 第二條 應該直接與 ”listbox2 的第四條 match 了!“(listbox2所有的记录)
另外感觉你使用 P4的SSE 应该有些优势,他是128bit的,虽然浮点才是她的强项。
ListBox2.Lines.SaveToFilereal-like#163.com把#换成@,防止那些垃圾邮件收集邮件列表。
恩,差不多,你怎么做的?(如果不放入数据库的话)
1、首先我们根据不同产品,可以直接筛选出2亿里面的大约几十分之一的;
2、把20位编码分成4组,每组5个码,放入4列,分别建立索引,(所以建了整整一天才完成)
3、假定每个编码中4组中至少有一组完全正确(因为实际工作中条件一定满足)匹配的时候把需要匹配的编码认为分成4组,然后以每组为索引分别匹配,这样复杂度就变成了 logn/log2 了
之后匹配一次的速度只有不到一分钟。你感觉和你的有类似不?
if ODialog.Execute then
begin
AssignFile(txtFile,ODialog.FileName);
Reset(txtFile);
while not Eof(txtFile) do
begin
判断???
end;
ListBox2.Lines.SaveToFilereal-like#163.com把#换成@,防止那些垃圾邮件收集邮件列表。-------------------------------------------我也要, akunspy#sina.com
多开几个线程,把listbox1的数据分面若干段,每段数据放在一个线程里面和listbox2比较,时间应该会短些-------------------------------------------------------------------------------
呵呵,老兄,CPU可不能并发处理线程.我想这样也不能提高速度吧
每个位置上的字符只可能是 2^16个(按unincode编码)
没有必要将listbox1的300000 与listbox2 的1000 记录逐条比较
先统计出listbox1和listbox2中个各字符的个数 形成两个数组
这样需比较的记录的条数最多也就2^16个
这样操作15次
这样最复杂的情况下比较和遍历的次数是
(300000 + 1000 )* 2^16 * 2 *15
另一个listbox2中有1000条记录,长度为15
预处理 成 一个 26 * 15 行字符的对应表,列记载对应的位置,然后用个 for 循环
将
listbox1中有300000 的每个字符对应查表,如何??
123456852345895 121456654123565
要比较listbox1中的记录的第一个字符和listbox2中的第一个字符
listbox1中的记录的第二个字符和listbox2中的第二个字符
listbox1中的记录的第三个字符和listbox2中的第三个字符
...
假如第一个相同: j=1
第二个相同 j:=2
3~5没有相同的
第六个相同 j:=3输入两个字符串,得到对应位置相同的个数
for i:=0 to listbox2.Count-1 do
begin
application.ProcessMessages;
label2.Caption :=inttostr(Contrast(strlist2[j] ,strlist1[i]));
end;
function Contrast(aString:string;Posstring:string): integer;
var
rowCount,i,j,subStrLen:integer;
m:integer;
begin
j:=0;
for i:=1 to 13 do
begin
if aString[i]=Posstring[i] then
begin
j:=j+1; end ;
end;
result:=j;end;
listbox2.count=1000
application.ProcessMessages;
label2.Caption :=inttostr(Contrast(strlist2[j] ,strlist1[i]));
///////就是让计算机拼命显示跳动的数字!~~还不如改成
application.ProcessMessages;
label2.Caption := Random(15);你的目的要说清楚,说不清楚谁能帮你?~~
function TForm1.compstr(str1,str2:string):Integer;
var
i,j:Integer;
begin
j:=0;
for i:=0 to 14 do
begin
if Str1[i]=str2[i] then
inc(j);
end;
Result:=j;
end;procedure TForm1.Button2Click(Sender: TObject);
var
i,j:Integer;
str1:String;
begin
for j:= 0 to 999 do
begin
str1:=ListBox2.Items[j];
for i:=0 to num do
begin
compstr(str1,ListBox1.Items[i]);
end;
Label1.Caption:=inttostr(j+1);
application.ProcessMessages;
end;
end;注: num:=299999;你的
application.ProcessMessages;
label2.Caption :=inttostr(Contrast(strlist2[j] ,strlist1[i]));
这两句太花时间了,不要显示出来,快多了.
我是也只显示了1000次.而你的却显示了300000×1000次,不慢才怪.
Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
Dialogs, StdCtrls;type
PLine = ^TLine;
TLine = array [0..14] of byte; //包括了0A, 0D换行符.不处理
TForm1 = class(TForm)
Button1: TButton;
Button2: TButton;
procedure Button1Click(Sender: TObject);
procedure Button2Click(Sender: TObject);
private
{ Private declarations }
public
{ Public declarations }
end;var
Form1: TForm1;
Data1, Data2: array of TLine;implementation{$R *.dfm}procedure TForm1.Button1Click(Sender: TObject);
var
DataFile: File of TLine;
FileLen, i: integer;
begin
AssignFile(DataFile, 'd:\1.txt');
ReSet(DataFile);
FileLen := FileSize(DataFile);
SetLength(Data1, FileLen);
for i := 0 to FileLen - 1 do
BlockRead(DataFile, Data1[i], 1);
CloseFile(DataFile);
AssignFile(DataFile, 'd:\2.txt');
ReSet(DataFile);
FileLen := FileSize(DataFile);
SetLength(Data2, FileLen);
for i := 0 to FileLen - 1 do
BlockRead(DataFile, Data2[i], 1);
CloseFile(DataFile);
end;procedure TForm1.Button2Click(Sender: TObject);
var
Data1Len, Data2Len, n, i, j, k: integer;
begin
Data1Len := Length(Data1);
Data2Len := Length(Data2);
for i := 0 to Data1Len - 1 do
for j := 0 to Data2Len - 1 do
begin
n := 0;
for k := 0 to 12 do
if Data1[i][k] = Data2[j][k] then
inc(n);
//这里N就是相同的数据
end;
end;end.
看看……
我也是用的文件 就不用再废话了(两个memo直接比较我估算了一下要4个多小时)
分还是要抢的 :)
http://community.csdn.net/Expert/topic/2796/2796024.xml?temp=.4200861楼主能给我发一个你的测试数据文件吗?我好试试看;
现在我是在1000中找,所以速度很慢,代码为如下形式for 1 to 1000
for 1 to 300000
..............
速度花了 358秒左右,大概是5、6分钟的样子;现在另外一个是如下形式for 1 to 300000
for 1 to 1000速度大概为 3.2251333333333333333333333333333分钟实现的原理是采用搜索定位的方式,行长度为15 + 回车换行字符一共17位源码部分一
const
cDeltaSize = 1.5;
var
Form1 : TForm1;
GUpcaseTable : array[0..255] of char;
GUpcaseLUT : Pointer;
implementation{$R *.dfm}function FastmemPos(const aSource, aFind; const aSourceLen, aFindLen: integer):
Pointer;
asm
push ESI
push EDI
push EBX mov ESI, aFind
mov EDI, aSource
mov ECX, aSourceLen mov Result, 0
cmp ECX, aFindLen
jl @TheEnd
cmp aFindLen, 1
jl @TheEnd sub ECX, aFindLen
inc ECX Mov Al, [ESI]
jmp @Scasb
@FindNext:
inc EDI
dec ECX
jz @NotFound @ScaSB: cmp [EDI], al jz @CompareStrings
inc EDI
dec ECX
jnz @ScaSB
jmp @NotFound @CompareStrings: mov EBX, aFindLen @CompareNext: dec EBX
jz @FullMatch mov Ah, [ESI+EBX] cmp Ah, [EDI+EBX]
Jnz @FindNext Jmp @CompareNext @FullMatch:
mov Result, EDI
jmp @TheEnd
@NotFound:
mov Result, 0 @TheEnd:
pop EBX
pop EDI
pop ESI
end;function FastmemPosNC(const aSource, aFind; const aSourceLen, aFindLen:
integer): Pointer;
asm
push ESI
push EDI
push EBX mov ESI, aFind
mov EDI, aSource
mov ECX, aSourceLen mov Result, 0 cmp ECX, aFindLen
jl @TheEnd cmp aFindLen, 1
jl @TheEnd sub ECX, aFindLen
inc ECX mov EDX, GUpcaseLUT
xor EBX, EBX
jmp @FindFirst
@FindNext:
inc EDI
dec ECX
jz @NotFound
@FindFirst:
mov Bl, [ESI]
mov AL, [EDX+EBX] @ScaSB:
mov Bl, [EDI]
cmp Al, [EDX+EBX] jz @CompareStrings
inc EDI
dec ECX
jnz @ScaSB
jmp @NotFound @CompareStrings:
push ECX
mov ECX, aFindLen @CompareNext:
dec ECX
jz @FullMatch mov Bl, [ESI+ECX]
mov Al, [EDX+EBX] mov Bl, [EDI+ECX]
cmp Al, [EDX+EBX]
jz @KeepChecking POP ECX
jmp @FindNext @KeepChecking:
Jmp @CompareNext @FullMatch:
pop ECX
mov Result, EDI
jmp @TheEnd @NotFound:
mov Result, 0 @TheEnd:
pop EBX
pop EDI
pop ESI
end;function FastPosNoCase(const aSourceString, aFindString: string; const
aSourceLen, aFindLen, StartPos: Integer): Integer;
begin
Assert(StartPos > 0);
Result := Integer(FastmemPosNC(aSourceString[StartPos], aFindString[1],
aSourceLen - (StartPos - 1), aFindLen));
if Result > 0 then
Result := Result - Integer(@aSourceString[1]) + 1;
end;procedure FastSearch(const aSourceString: string; const aFindString: string;
CaseSensitive: Boolean = False);
var
P : Integer;
Pl : integer; //所在行
begin
P := FastPosNoCase(aSourceString, aFindString, Length(aSourceString),
lengTh(aFindString), 1);
if P > 0 then
repeat
if P <> 0 then
begin
Pl := p div 17; //行宽15 + x0dx0a = 17
if (p mod 17) <> 0 then
inc(Pl, 1);
Form1.ListBox3.Items.Add(format('第 %d 行重复', [pl]));
end;
P := FastPosNoCase(aSourceString, aFindString, Length(aSourceString),
lengTh(aFindString), P + Length(aFindString));
until P = 0;
end;
var
aFindString : string;
f : TStrings;
ftime : DWORD;
n, I : integer;
begin
// f := ListBox1.Items; //这个是listbox1
f := TstringList.Create;
f.LoadFromFile('c:\l1.txt'); //装入 300000
listbox2.Items.LoadFromFile('c:\l2.txt'); //装入1000
listbox3.Hide;
listBox3.Items.BeginUpdate;
fTime := GetTickCount;
n := -1;
Gauge1.MaxValue := 99;
Gauge1.MinValue := 0;
Gauge1.Progress := 0;
for i := 0 to 99 do //循环中10次搜索,1000=0..999 /10 = 99 次
begin
Gauge1.Progress := Gauge1.Progress + 1;
inc(n, 1);
aFindString := listbox2.Items.Strings[n];
FastSearch(f.Text, aFindString, FALSE);
inc(n, 1);
aFindString := listbox2.Items.Strings[n];
FastSearch(f.Text, aFindString, FALSE);
inc(n, 1);
aFindString := listbox2.Items.Strings[n];
FastSearch(f.Text, aFindString, FALSE);
inc(n, 1);
aFindString := listbox2.Items.Strings[n];
FastSearch(f.Text, aFindString, FALSE);
inc(n, 1);
aFindString := listbox2.Items.Strings[n];
FastSearch(f.Text, aFindString, FALSE);
inc(n, 1);
aFindString := listbox2.Items.Strings[n];
FastSearch(f.Text, aFindString, FALSE);
inc(n, 1);
aFindString := listbox2.Items.Strings[n];
FastSearch(f.Text, aFindString, FALSE);
inc(n, 1);
aFindString := listbox2.Items.Strings[n];
FastSearch(f.Text, aFindString, FALSE);
inc(n, 1);
aFindString := listbox2.Items.Strings[n];
FastSearch(f.Text, aFindString, FALSE);
inc(n, 1);
aFindString := listbox2.Items.Strings[n];
FastSearch(f.Text, aFindString, FALSE);
end;
Gauge1.Progress := Gauge1.MaxValue;
fTime := GetTickCount - fTime;
Label4.Caption := '时间:' + (formatFloat('#0', fTime)); //计算时间,是DWORD
listBox3.Items.EndUpdate;
listbox3.Show;
f.free;
end;procedure TForm1.Button5Click(Sender: TObject);
var
aFindString : string;
f1,f2 : TStrings;
ftime : DWORD;
n, I : integer;
begin
// f := ListBox1.Items; //这个是listbox1
f1 := TstringList.Create;
f2 := tstringlist.Create; f1.LoadFromFile('c:\l1.txt'); //装入300000
f2.LoadFromFile('c:\l2.txt'); //装入1000 listbox3.Hide;
listBox3.Items.BeginUpdate;
fTime := GetTickCount;
n := -1;
Gauge1.MaxValue := 30000;
Gauge1.MinValue := 0;
Gauge1.Progress := 0;
for i := 0 to 30000 -1 do //30000 - 1=300000 / 10 (循环中10次定位)
begin
Gauge1.Progress := Gauge1.Progress + 1;
inc(n, 1);
aFindString := f1.Strings[n];
FastSearch(f2.Text, aFindString, FALSE);
inc(n, 1);
aFindString := f1.Strings[n];
FastSearch(f2.Text, aFindString, FALSE);
inc(n, 1);
aFindString := f1.Strings[n];
FastSearch(f2.Text, aFindString, FALSE);
inc(n, 1);
aFindString := f1.Strings[n];
FastSearch(f2.Text, aFindString, FALSE);
inc(n, 1);
aFindString := f1.Strings[n];
FastSearch(f2.Text, aFindString, FALSE);
inc(n, 1);
aFindString := f1.Strings[n];
FastSearch(f2.Text, aFindString, FALSE);
inc(n, 1);
aFindString := f1.Strings[n];
FastSearch(f2.Text, aFindString, FALSE);
inc(n, 1);
aFindString := f1.Strings[n];
FastSearch(f2.Text, aFindString, FALSE);
inc(n, 1);
aFindString := f1.Strings[n];
FastSearch(f2.Text, aFindString, FALSE);
inc(n, 1);
aFindString := f1.Strings[n];
FastSearch(f2.Text, aFindString, FALSE);
end; Gauge1.Progress := Gauge1.MaxValue;
fTime := GetTickCount - fTime;
Label4.Caption := '时间:' + (formatFloat('#0', fTime)); //计算时间,是DWORD
listBox3.Items.EndUpdate;
listbox3.Show;
f1.free;
f2.free;end;var //初始化,不能缺少
I : Integer;initialization
for I := 0 to 255 do
GUpcaseTable[I] := Chr(I);
CharUpperBuff(@GUpcaseTable[0], 256);
GUpcaseLUT := @GUpcaseTable[0];
end.
var
Data1Len, Data2Len, n, i, j, k: integer;
l1, l2 : TStrings;
fTime : DWORD;
begin
l1 := TstringList.Create;
l2 := TstringList.Create;
l1.LoadFromFile('c:\l1.txt');
l2.LoadFromFile('c:\l2.txt');
Data1Len := 30000;
Data2Len := 100;
Gauge1.MaxValue := 30000;
Gauge1.MinValue := 0;
Gauge1.Progress := 0;
listbox3.Items.BeginUpdate;
fTime := GetTickCount; n := -1;
for i := 0 to Data1Len - 1 do
begin
Gauge1.Progress := i;
inc(n, 1);
k := -1;
for j := 0 to Data2Len - 1 do
begin
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
end;
inc(n, 1);
k := -1;
for j := 0 to Data2Len - 1 do
begin
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
end;
inc(n, 1);
k := -1;
for j := 0 to Data2Len - 1 do
begin
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
end;
inc(n, 1);
k := -1;
for j := 0 to Data2Len - 1 do
begin
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
end;
inc(n, 1);
k := -1;
for j := 0 to Data2Len - 1 do
begin
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
end;
inc(n, 1);
k := -1;
begin
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
end;
inc(n, 1);
k := -1;
for j := 0 to Data2Len - 1 do
begin
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
end;
inc(n, 1);
k := -1;
for j := 0 to Data2Len - 1 do
begin
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
end;
inc(n, 1);
k := -1;
for j := 0 to Data2Len - 1 do
begin
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
end;
inc(n, 1);
k := -1;
for j := 0 to Data2Len - 1 do
begin
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
inc(k, 1);
if l1.Strings[n] = l2.Strings[k] then
listbox3.Items.Add(inttostr(n) + '=' + inttostr(k));
end; end;
Gauge1.Progress := Gauge1.MaxValue;
listbox3.Items.EndUpdate;
fTime := GetTickCount - fTime;
Label4.Caption := '时间:' + (formatFloat('#0', fTime)); //计算时间,是DWORD
end;
//一次循环内多次比较的优化方法
要求在5~25分钟内把listbox1中的每条数据与listbox2中的每条数据对比,看看对应位置的字符是否相同,有几个相同!,算法我已经作出来了,不过速度慢,请大家帮忙看看。
我循环里的数据量可能是300000×15×1000=4500000000(单个字符进行比较)
////////原来是要进行字符比较啊,我,我真粗心
http://community.csdn.net/Expert/topic/3095/3095447.xml?temp=.3237879
仔细看了看,如果改进需求并切用一定的内存空间换取时间pankun兄的算法已经解决的非常好了,不知能不能满足眼镜兄的需要,的确显示部分赞用了大部分的时间上面还有个兄弟说多线程来提高速度,我不是很清楚win32的线程调度模型,所以不能确定这到底能不能缩短时间(正如pankun所说),但如果是现代分时系统中常用的多级反馈队列调度和时间片轮转的算法的话,增加线程和适当提高线程的优先级的确可以获得更多的CPU时间片(因为默认的程序只有一个主线程)从而提高一定的速度,但线程的创建和将数据分段又将是一个新的代价继续关注一下
var
Data1Len, Data2Len, n, i, j, k: integer;
TimeCount: Cardinal;
begin
TimeCount := GetTickCount;
Data1Len := Length(Data1);
Data2Len := Length(Data2);
for i := 0 to Data1Len - 1 do
for j := 0 to Data2Len - 1 do
begin
n := 0;
if Data1[i][0] = Data2[j][0] then
inc(n);
if Data1[i][1] = Data2[j][1] then
inc(n);
if Data1[i][2] = Data2[j][2] then
inc(n);
if Data1[i][3] = Data2[j][3] then
inc(n);
if Data1[i][4] = Data2[j][4] then
inc(n);
if Data1[i][5] = Data2[j][5] then
inc(n);
if Data1[i][6] = Data2[j][6] then
inc(n);
if Data1[i][7] = Data2[j][7] then
inc(n);
if Data1[i][8] = Data2[j][8] then
inc(n);
if Data1[i][9] = Data2[j][9] then
inc(n);
if Data1[i][10] = Data2[j][10] then
inc(n);
if Data1[i][11] = Data2[j][11] then
inc(n);
if Data1[i][12] = Data2[j][12] then
inc(n);
if Data1[i][13] = Data2[j][13] then
inc(n);
//这里N就是相同的数据
end;
ShowMessage(IntToStr(GetTickCount - TimeCount));
end;15秒左右,请在Project->Options->Compiler中关闭Optimization
选项,否则上述代码会因为n变量没有实际用到,而把所有inc(n)语句优化掉上面的优化办法只是展开了内层循环,算法实际没变.后来我试了一下利用MMX指今优化
procedure TForm1.Button2Click(Sender: TObject);
var
Data1Len, Data2Len, n, i, j: integer;
TimeCount: Cardinal;
Count1, Count2: array [0..7] of byte;
P1, p2: Pointer;
mask: int64;
begin
TimeCount := GetTickCount;
Data1Len := Length(Data1);
Data2Len := Length(Data2);
mask := $0101010101010101;
for i := 0 to Data1Len - 1 do
for j := 0 to Data2Len - 1 do
begin
p1 := @Data1[i][0];
p2 := @Data2[j][0];
n := 0;
asm
mov esi, p1
mov edi, p2
//移动64位(8个字节)到MMX寄存器
movq mm0, [esi]
movq mm1, [edi]
pcmpeqb mm0, mm1
pand mm0, mask
//结果放回数组,为1的字节表示相等
movq int64 ptr count1, mm0
movq mm0, [esi+8]
movq mm1, [edi+8]
pcmpeqb mm0, mm1
pand mm0, mask
//结果放回数组,为1的字节表示相等,只有0~4位有效
movq int64 ptr count2, mm0 //在这里求出Count1, Count2数组中为1的字节数量就是相等的字符个数
//求数组和就可以实现求相同字符个数,但效率不高 emms
end;
end;
ShowMessage(IntToStr(GetTickCount - TimeCount));
end;需要4秒左右,不过上面的代码还没完成,还没有找到快速求出数组中字节等于01的办法.
用了哈希杂凑法,无需依次比较各位字符。假定ListBox1存放了n条记录,待匹配的记录有m条,存放在ListBox2中,所有记录的长度均为l。以访问一个字符为一次基本操作,则该算法的时间复杂度在最坏情形下是O(n*l+m*l+n*l),最好情形是O(n*l+m*l),空间复杂度是O(2*n*l+m*l+Δ),其中Δ=l*字符的不同种数,其中一个n*l和Δ是建哈希表所需的辅助空间。
时间复杂度比前面的算法要好得多,空间复杂度也完全可以接受。
下面的程序按楼主的要求随机产生的30000条和1000条长度均为15的记录,在我1G的机器上运行,如果输出所有结果运行时间为28.62秒,不输出结果的话是0.72秒。
从这里可以看出,改进算法比用线程、嵌入汇编、优化编译器之类的方法要有效得多。
=====================================================================
unit match;interfaceuses
Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms, Dialogs,
StdCtrls;
const
TextLength = 30000;
ModeLength = 1000;
RecLength = 15;
MaxChar = 9;
BlockLen = 128;
type
TForm1 = class(TForm)
Button1: TButton;
Button2: TButton;
ListBox1: TListBox;
ListBox2: TListBox;
procedure FormCreate(Sender: TObject);
procedure Button1Click(Sender: TObject);
procedure Button2Click(Sender: TObject);
private
{ Private declarations }
public
{ Public declarations }
end;
TLink = ^BlockNode;
BlockNode=Record
cur : array [0..BlockLen-1] of integer;
realen : integer;
next : TLink;
end;
THashTable = array [0..RecLength-1,0..MaxChar] of TLink;
TCount = array [0..TextLength-1] of integer;
TRec = array [0..RecLength-1] of byte;
DanyTab = array of TRec;
var
Form1: TForm1;
HashTable : THashTable;
TextTable,ModeTable : DanyTab; procedure init(Tab:DanyTab;length:integer);
procedure ShowData(Tab:DanyTab;length:integer;ListBox:TListBox);
procedure InsertSpace(VAR Head:TLink; row : integer);
procedure stat(Head:TLink; VAR count: TCount);
procedure output(number:integer;VAR count:TCount; VAR OutFile:TextFile);
procedure destory(VAR HashTable:THashTable);
implementation{$R *.DFM}procedure TForm1.FormCreate(Sender: TObject);
begin
randomize;
end;procedure TForm1.Button1Click(Sender: TObject);
begin
setlength(TextTable,TextLength);
setLength(ModeTable,ModeLength);
init(TextTable,TextLength);
init(ModeTable,ModeLength);
ShowData(TextTable,TextLength,ListBox1);
ShowData(ModeTable,ModeLength,ListBox2);
end;procedure init(Tab:DanyTab;length:integer);
var
i,j:integer;
begin
for i:=0 to length-1 do
for j:=0 to RecLength-1 do
Tab[i][j]:=integer(Random(MaxChar+1));
end;
//把随机产生的数据显示给用户看
procedure ShowData(Tab:DanyTab;length:integer;ListBox:TListBox);
var
str : array [0..RecLength] of char ;
i,j : integer;
begin
ListBox.clear;
str[RecLength]:=#0;
for i:=0 to length-1 do begin
for j:=0 to RecLength-1 do
str[j]:=chr(Tab[i][j]+ord('0'));
ListBox.Items.Append(string(str));
end;
end;procedure InsertSpace(VAR Head:TLink; row : integer);
var
p:TLink;
begin
if Head^.realen>=BlockLen then begin
new(p);
p^.realen:=0;
p^.next:=Head;
Head:=p;
end;
with Head^ do begin
cur[realen]:=row;
realen:=realen+1;
end;
end;procedure stat(Head:TLink; VAR count: TCount);
var
p:TLink;
i,index:integer;
begin
p:=Head;
while p<>nil do begin
for i:=0 to p^.realen-1 do begin
index:=p^.cur[i];
count[index]:=count[index]+1;
end;
p:=p^.next;
end;
end;
//这个是真正的匹配
procedure TForm1.Button2Click(Sender: TObject);
var
i,j :integer;
k:byte;
count : TCount;
OutFile: TextFile;
starttime,endtime: TDateTime;
begin
starttime:=now;
//初始化哈希表
for i:=0 to RecLength-1 do
for j:=0 to Maxchar do begin
new(HashTable[i][j]);
HashTable[i][j]^.realen:=0;
HashTable[i][j]^.next:=nil;
end;
//遍历TextTable中的每个字符,在哈希表中填入内容
for i:=0 to TextLength-1 do
for j:=0 to RecLength-1 do begin
k:=TextTable[i,j];
InsertSpace(HashTable[j,k],i);
end;
//开始匹配,遍历ModeTable表,查找哈希表
assignFile(OutFile,'MatchRsult.txt');
Rewrite(OutFile);
for i:=0 to ModeLength-1 do begin
for j:=0 to TextLength-1 do count[j]:=0;
for j:=0 to RecLength-1 do begin
k:=ModeTable[i,j];
stat(HashTable[j,k],count);
end;
output(i,count,OutFile);
end;
closeFile(OutFile);
destory(HashTable);
endtime:=now;
showmessage(format('统计完毕!,耗时%f秒',[(endtime-starttime)*24*3600]));
end;procedure output(number:integer;VAR count:TCount; VAR OutFile:Text);
var
i:integer;
begin
write(OutFile,format('第%d条记录',[number]));
for i:=0 to TextLength-1 do begin
if (i mod 30=0) then writeln(OutFile);
write(OutFile,count[i]);
write(OutFile,' ');
end;
writeln(OutFile);
end;procedure destory(VAR HashTable:THashTable);
var
i,j:integer;
p:TLink;
begin
for i:=0 to RecLength-1 do
for j:=0 to MaxChar do begin
p:=HashTable[i][j];
while p<>nil do begin
HashTable[i][j]:=HashTable[i][j]^.next;
dispose(p);
p:=HashTable[i][j];
end;
end;
end;end.
个人觉得还是用MMX的方法计算比较好,只需要4秒左右.加上计算数组中为0数据个数,一共需要10来秒.
但是,先把300000中的同位置重复的字符找出来,再把1000中同位置重复的字符找出来。再在两者之间进行配对比较!这个样子可能好一点!!
for i:=0 to 300000 do
begin
i:=i+15;
.......
end;
你可以试试
两个算法的复杂度都不在同一等级,怎么可能MMX更快?
并把你的count : TCount;放在了var区中测试的(放在堆栈中会溢出).
MMX一条指今就计算了8个字符,所以也比较快.
呵呵,我没有在ListBox中放,把你的代码中的ShowData屏蔽了,只是在内存中生成的数据.
并把你的count : TCount;放在了var区中测试的(放在堆栈中会溢出).
MMX一条指今就计算了8个字符,所以也比较快.
=======================================
明白了,刚刚分析了一下,关键是哈希表开得太小,我的哈希表是一个二维数组,只有“15*字符种数”个元素,而这里的字符种数我定的是10(不知道具体数据),对于30000条记录而言,聚集不是大问题,但对于300000条记录,聚集就必须重点考虑了。
我刚刚把字符种数设定为128(即程序中的:MaxChar= 127;,当然这是理想状况),将count定义成动态数组,用setlength分配内存,其它没动,运行时间也是6.48秒,在你的机器上应该还快点,应该接近你的MMX。
要进一步提高速度,还得再嵌入汇编,应该还是有这个余地的。
你生成的文本文件怎么有59.1M!