How can I judge the similarity of two pieces of text? The result should be a similarity value!

Given two pieces of text, how can I determine their similarity precisely?



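For example:
Content one:
Copyright notice: CSDN is the hosting service provider of this Blog. If this article involves copyright issues, CSDN bears no related responsibility; copyright holders should contact the article's author directly to resolve it.
Content two:
Copyright notice: CSDN is the hosting service provider. CSDN bears no related responsibility; copyright holders should contact the article's author directly to resolve it.
Content one and content two are very similar. I need a method that decides whether two pieces of content are similar.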
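One common way to turn "these two are similar" into a number (my assumption; the thread never fixes a measure) is a longest-common-subsequence ratio: 2 × LCS / (length one + length two). Content two above only drops phrases from content one, so almost all of it survives as a common subsequence. A minimal Delphi/Free Pascal sketch; `LcsLength` and `LcsSimilarity` are hypothetical names. Characters are compared as `string` code units (UTF-16 units in Delphi 2009+, bytes with Free Pascal's default AnsiString), so scores on Chinese text depend on the compiler.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical helper: length of the longest common subsequence of A and B,
// using the classic dynamic-programming table with only two rows kept.
function LcsLength(const A, B: string): Integer;
var
  Prev, Curr: array of Integer;
  I, J: Integer;
begin
  SetLength(Prev, Length(B) + 1);
  SetLength(Curr, Length(B) + 1);
  for J := 0 to Length(B) do
    Prev[J] := 0;                       // LCS with an empty prefix of A is 0
  for I := 1 to Length(A) do
  begin
    Curr[0] := 0;                       // LCS with an empty prefix of B is 0
    for J := 1 to Length(B) do
      if A[I] = B[J] then
        Curr[J] := Prev[J - 1] + 1      // extend the diagonal match
      else if Prev[J] >= Curr[J - 1] then
        Curr[J] := Prev[J]              // best so far without A[I]
      else
        Curr[J] := Curr[J - 1];         // best so far without B[J]
    for J := 0 to Length(B) do
      Prev[J] := Curr[J];               // current row becomes previous row
  end;
  Result := Prev[Length(B)];
end;

// Hypothetical helper: similarity in percent, 200 * LCS / (|A| + |B|), rounded down.
function LcsSimilarity(const A, B: string): Integer;
begin
  if Length(A) + Length(B) = 0 then
    Result := 100                       // two empty texts count as identical
  else
    Result := 200 * LcsLength(A, B) div (Length(A) + Length(B));
end;
```

For 'John' and 'Jon' the LCS is 'Jon', giving 200 * 3 div 7 = 85. Whether a given score counts as "similar" still needs a threshold you choose.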
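I just had a look: what you need is the string-matching problem from data structures.
http://student.zjzk.cn/course_ware/data_structure/web/chuan/chuan4.2.3.1.htm
That page only gives a very brief introduction, and I have other things to do, so that's as much as I can help.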
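The linked page is about string pattern matching. Assuming the basic brute-force algorithm is the kind of matching meant (the page's exact variant is not stated here), a minimal Delphi/Free Pascal sketch; `NaiveFind` is a hypothetical name:

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical helper: brute-force pattern matching. Tries every start
// position in Text and compares Pattern character by character.
// Returns the 1-based index of the first occurrence, or 0 if there is none
// (an empty Pattern also returns 0, like System.Pos).
function NaiveFind(const Pattern, Text: string): Integer;
var
  I, J: Integer;
begin
  Result := 0;
  if Length(Pattern) = 0 then
    Exit;
  for I := 1 to Length(Text) - Length(Pattern) + 1 do
  begin
    J := 1;
    // short-circuit "and" stops before reading past the end of Pattern
    while (J <= Length(Pattern)) and (Text[I + J - 1] = Pattern[J]) do
      Inc(J);
    if J > Length(Pattern) then
    begin
      Result := I;                      // every character matched
      Exit;
    end;
  end;
end;
```

In real Delphi code `Pos(Pattern, Text)` from the System unit already does this job; the sketch only shows what "string matching" means. It answers "does this piece occur?", not "how similar are the two texts?".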
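Two pieces of content??? What kind of content? If it's the text in two memos, I'm afraid you'll have to compare the characters one by one, with a worker thread doing the processing. Maybe I still don't understand what you mean.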
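Taken literally, comparing two texts "character by character" means comparing position i of one with position i of the other and counting the equal positions. A minimal Delphi/Free Pascal sketch without the threading part; `PositionalSimilarity` is a hypothetical name:

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical helper: compares A and B position by position and returns
// the percentage of positions (relative to the longer string) that hold
// the same character.
function PositionalSimilarity(const A, B: string): Integer;
var
  I, Same, MaxLen, MinLen: Integer;
begin
  MaxLen := Length(A);
  MinLen := Length(B);
  if MinLen > MaxLen then
  begin
    MaxLen := Length(B);
    MinLen := Length(A);
  end;
  if MaxLen = 0 then
  begin
    Result := 100;                      // two empty strings count as identical
    Exit;
  end;
  Same := 0;
  for I := 1 to MinLen do               // only positions present in both strings
    if A[I] = B[I] then
      Inc(Same);
  Result := 100 * Same div MaxLen;
end;
```

Its weakness shows quickly: one inserted or deleted character shifts everything after it, so 'John' and 'Jon' score only 50 here. That is why alignment-based measures (LCS, edit distance) are usually preferred.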
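This is a big question! ICMGDCHN((梦醒泪湿襟)－＞喜欢明月) raised two points:
1. How should similarity be quantified?
2. This involves AI.
Those two points show exactly why the problem is so big. If the requirements are very high, string matching, data structures and thread handling are all child's play compared with the complexity of the problem itself. The ultimate requirement amounts to building a human brain: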
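1. I am the father of the boy Ah Q.
2. Ah Q is my son, and I am a man.
3. Ah Q was born to his mother and me; Ah Q is a boy.
These three sentences are, without doubt, highly similar, but getting a computer to recognize that is a real headache! Lowering the requirements (probably to what the original poster actually needs):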
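Judge by the similarity of the characters themselves. The first idea that comes to mind is to match parts of the second sentence against the first, i.e. string matching (given a typical programmer's background, this means exact matching, not intelligent fuzzy matching that involves natural-language semantics). The matching itself is solved by data-structure knowledge, so what remains is: what strategy do you use to pick pieces from the second sentence: by character, by word, by phrase, or by sentence? And once the pieces are matched, what proportion, i.e. what threshold, must be reached before the two count as similar? Linguistics, fuzzy theory... what a headache.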
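The lowered-requirement strategy just described (cut the second text into pieces, look each piece up in the first, and compare the hit ratio with a threshold) can be sketched as follows. Here a piece is a fixed-length run of characters (a character n-gram); splitting by words or phrases would need a tokenizer, which is not shown. `UnitMatchRatio`, `IsSimilar`, `UnitLen` and `Threshold` are hypothetical names, and the threshold value is left to the caller:

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical helper: splits Second into overlapping pieces of UnitLen
// characters, counts how many pieces occur somewhere in First (exact match
// via System.Pos) and returns that count as a percentage of all pieces.
function UnitMatchRatio(const First, Second: string; UnitLen: Integer): Integer;
var
  I, Units, Hits: Integer;
begin
  Units := Length(Second) - UnitLen + 1;   // number of overlapping pieces
  if (UnitLen < 1) or (Units < 1) then
  begin
    Result := 0;                           // nothing to match
    Exit;
  end;
  Hits := 0;
  for I := 1 to Units do
    if Pos(Copy(Second, I, UnitLen), First) > 0 then
      Inc(Hits);
  Result := 100 * Hits div Units;
end;

// Hypothetical decision rule: "similar" once the ratio reaches Threshold (0..100).
function IsSimilar(const First, Second: string; UnitLen, Threshold: Integer): Boolean;
begin
  Result := UnitMatchRatio(First, Second, UnitLen) >= Threshold;
end;
```

The ratio is measured against the second text only, so it is asymmetric. With `UnitLen = 1` it degenerates to a per-character lookup; longer pieces also reward characters that appear in the same local order.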
It's just about how much of the content is the same.
That's easy. Here is a string-comparison routine from 《葵》; I hope it helps:
// Sample results of StrSimilar:
//   'John' and 'John'                  = 100%
//   'John' and 'Jon'                   = 75%
//   'Jim' and 'James'                  = 40%
//   'Luke Skywalker' and 'Darth Vader' = 28% (the shared ' ', 'a', 'e', 'r' still count)
// Max comes from the Math unit, so Math must be in the uses clause.
function StrSimilar(s1, s2: string): Integer;
var
  hit: Integer;    // number of identical characters
  p1, p2: Integer; // current positions in s1 and s2
  l1, l2: Integer; // lengths of the strings
  pt: Integer;     // loop counter
  diff: Integer;   // "unsharp" factor: largest allowed offset between matches
  hstr: string;    // helper for swapping the strings
  // test[i] is True once s1[i] has been matched; sized to the string length
  // plus one sentinel slot that is read when p1 runs past the end of s1
  test: array of Boolean;
begin
  // Always walk along the longer string: swap if s1 is shorter
  if Length(s1) < Length(s2) then
  begin
    hstr := s2; s2 := s1; s1 := hstr;
  end;
  // Store the lengths to speed up the function
  l1 := Length(s1);
  l2 := Length(s2);
  // Empty strings: avoid reading s2[1] and dividing by zero
  if l2 = 0 then
  begin
    if l1 = 0 then Result := 100 else Result := 0;
    Exit;
  end;
  p1 := 1; p2 := 1; hit := 0;
  // Unsharp factor: about a third of the longer length plus the length difference
  diff := Max(l1, l2) div 3 + Abs(l1 - l2);
  // Init the test array, including the sentinel slot l1 + 1
  SetLength(test, l1 + 2);
  for pt := 1 to l1 + 1 do test[pt] := False;
  // Loop through the strings
  repeat
    if not test[p1] then
    begin
      // Matching character, close enough in position?
      if (s1[p1] = s2[p2]) and (Abs(p1 - p2) <= diff) then
      begin
        test[p1] := True;
        Inc(hit);          // increment the hit count
        Inc(p1); Inc(p2);  // next positions
        if p1 > l1 then p1 := 1;
      end
      else
      begin
        test[p1] := False;
        Inc(p1);
        // End of s1: step back to the last matched position, next char of s2
        if p1 > l1 then
        begin
          while (p1 > 1) and not test[p1] do Dec(p1);
          Inc(p2);
        end;
      end;
    end
    else
    begin
      Inc(p1);
      // End of s1: step back to the last matched position, next char of s2
      if p1 > l1 then
      begin
        repeat Dec(p1); until (p1 = 1) or test[p1];
        Inc(p2);
      end;
    end;
  until p2 > l2;
  // Percentage of hits relative to the longer string
  Result := 100 * hit div l1;
end;
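If you want a measure with a clearer definition than the "unsharp factor" heuristic above, a standard alternative (swapped in here, not taken from 《葵》) is the Levenshtein edit distance, turned into a percentage relative to the longer string. A minimal Delphi/Free Pascal sketch; `EditDistance` and `EditSimilarity` are hypothetical names:

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses Math;  // Min, Max

// Hypothetical helper: Levenshtein distance, the minimum number of
// single-character insertions, deletions and substitutions turning A into B.
function EditDistance(const A, B: string): Integer;
var
  Prev, Curr: array of Integer;
  I, J, Cost: Integer;
begin
  SetLength(Prev, Length(B) + 1);
  SetLength(Curr, Length(B) + 1);
  for J := 0 to Length(B) do
    Prev[J] := J;                 // empty A -> B[1..J] needs J insertions
  for I := 1 to Length(A) do
  begin
    Curr[0] := I;                 // A[1..I] -> empty B needs I deletions
    for J := 1 to Length(B) do
    begin
      if A[I] = B[J] then Cost := 0 else Cost := 1;
      // deletion, insertion, or substitution/match
      Curr[J] := Min(Min(Prev[J] + 1, Curr[J - 1] + 1), Prev[J - 1] + Cost);
    end;
    for J := 0 to Length(B) do
      Prev[J] := Curr[J];         // current row becomes previous row
  end;
  Result := Prev[Length(B)];
end;

// Hypothetical helper: 100 * (1 - distance / longer length), rounded down.
function EditSimilarity(const A, B: string): Integer;
var
  MaxLen: Integer;
begin
  MaxLen := Max(Length(A), Length(B));
  if MaxLen = 0 then
    Result := 100                 // two empty strings count as identical
  else
    Result := 100 * (MaxLen - EditDistance(A, B)) div MaxLen;
end;
```

On 'John'/'Jon' this gives 75 and on 'Jim'/'James' 40, the same as StrSimilar for those pairs. It costs O(|A|·|B|) time, which is fine for sentences but slow for whole documents.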