How do I extract all the URLs from a string?

For example, given:
CString sjk="http://www.csdn.net 程序员网站 http://csdn.net sdfd www.csdn.net";
I want to extract:
http://www.csdn.net
http://csdn.net
www.csdn.net



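I've found a few leads: it looks like I need get_links from IHTMLDocument2. But how do I turn a CString into an IHTMLDocument2? Surely I don't have to write the string to an HTML file, open that file with cppwebbrowser, and then fetch the IHTMLDocument2 from it? Help me out, guys!
Another idea is to display the string in a RichEdit control with hyperlinks highlighted, and then pick out the links by their different color. But I keep feeling it shouldn't be this complicated; there should be a simple way to do it. Am I losing it, coming up with nothing but hacks?
Of course, cutting the text on a leading <A HREF and a trailing > is not worth considering, because there are too many exceptions.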
IsValidURL Function

Determines if a specified string is a valid URL.

Syntax

HRESULT IsValidURL(
    LPBC pBC,
    LPCWSTR szURL,
    DWORD dwReserved
);

Parameters

pBC
    [in] Address of the IBindCtx interface. This parameter is optional and is currently ignored. It should be set to NULL.
szURL
    [in] Address of a string value that contains the full URL to be checked.
dwReserved
    [in] Reserved for future use. This must be set to zero.

Return Value

Returns one of the following values:

S_OK          The szURL parameter contains a valid URL.
S_FALSE       The szURL parameter does not contain a valid URL.
E_INVALIDARG  One of the parameters is invalid.
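
Note that IsValidURL checks one string at a time, so the input still has to be split into candidates first. A minimal sketch of that idea, assuming the URLs are separated by whitespace: filterTokens and looksLikeUrl are hypothetical names, and looksLikeUrl is only a portable placeholder for the check, not a reimplementation of IsValidURL.

```cpp
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// Split text on whitespace and keep only the tokens accepted by isUrl.
// isUrl is a caller-supplied predicate (hypothetical); on Windows it could
// wrap a call to IsValidURL.
std::vector<std::string> filterTokens(
    const std::string& text,
    const std::function<bool(const std::string&)>& isUrl)
{
    std::vector<std::string> urls;
    std::istringstream in(text);
    std::string token;
    while (in >> token) {            // operator>> skips runs of whitespace
        if (isUrl(token)) {
            urls.push_back(token);
        }
    }
    return urls;
}

// Placeholder predicate (an assumption for illustration only):
// accept tokens that start with "http://" or "www.".
bool looksLikeUrl(const std::string& s)
{
    return s.rfind("http://", 0) == 0 || s.rfind("www.", 0) == 0;
}
```

Calling filterTokens(text, looksLikeUrl) on the example string from the question would keep the three URL tokens. To use IsValidURL as the predicate, each token would first have to be converted to a wide string, since szURL is an LPCWSTR.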
Moderator above: strstr, really? Isn't that a bit of a brush-off?
So I search CSDN and Google for ages, post in both the C++ Builder and VC forums, and all I get is strstr!
You could load the string into a WebBrowser control and then get all the links through the document.links collection, but the easier way is to use a regular expression library.
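
A rough sketch of the regular-expression route, assuming a C++11 compiler and using std::regex in place of a third-party library. The function name extractUrls and the pattern itself are illustrative assumptions; the pattern is deliberately loose and is not a full URL grammar.

```cpp
#include <regex>
#include <string>
#include <vector>

// Return every substring of text that looks like a URL.
std::vector<std::string> extractUrls(const std::string& text)
{
    // Either a scheme (http, https, ftp) or a bare "www." prefix, followed
    // by a run of characters that are not whitespace, quotes or angle brackets.
    static const std::regex urlPattern(
        R"re((?:(?:https?|ftp)://|www\.)[^\s"'<>]+)re",
        std::regex::icase);

    std::vector<std::string> urls;
    for (std::sregex_iterator it(text.begin(), text.end(), urlPattern), end;
         it != end; ++it) {
        urls.push_back(it->str());   // the whole match is the URL
    }
    return urls;
}
```

Because matches do not overlap, the "www.csdn.net" inside "http://www.csdn.net" is not reported a second time. Trailing punctuation such as a period right after a URL would be included in the match, so real input may need extra trimming.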
<A href="http://www.dameiprinting.com/dz/index.htm">
<a
      href="http://www.dameiprinting.com/dz/feedback.htm">
<A
      href="http://www.dameiprinting.com/dz/toc.htm">
<A href="http://www.dameiprinting.com/dz/search.htm"><A language=JavaScript
      onmouseover="if(MSFPhover) document['MSFPnav1'].src=MSFPnav1h.src"
      onmouseout="if(MSFPhover) document['MSFPnav1'].src=MSFPnav1n.src"
      href="http://www.dameiprinting.com/dz/products.htm">
<A
      target="_blank" href="http://www.dameiprinting.com/"><A href="mailto:[email protected]"><!--webbot<A
        href="mailto:[email protected]">
<A
        href="mailto:[email protected]">
<A
      href="http://www.dameiprinting.com/dz/products.htm"
      target="">
Moderator, please use strstr to extract the URLs from the above, and no email addresses!
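
For markup like the sample above, one possible sketch is to match the href attributes with a regular expression and drop the mailto: entries. It assumes C++11 std::regex and double-quoted attribute values; extractHrefs and isMailto are hypothetical names, and this is a text match, not an HTML parser.

```cpp
#include <regex>
#include <string>
#include <vector>

// True if url starts with "mailto:" (case-insensitive).
bool isMailto(const std::string& url)
{
    static const std::regex mailto("^mailto:", std::regex::icase);
    return std::regex_search(url, mailto);
}

// Collect the values of all double-quoted href attributes, skipping mailto: links.
std::vector<std::string> extractHrefs(const std::string& html)
{
    // "href", optional whitespace around '=', then the quoted value (group 1).
    // \s also matches line breaks, so attributes split across lines are found.
    static const std::regex hrefPattern(
        R"re(href\s*=\s*"([^"]*)")re", std::regex::icase);

    std::vector<std::string> links;
    for (std::sregex_iterator it(html.begin(), html.end(), hrefPattern), end;
         it != end; ++it) {
        const std::string url = (*it)[1].str();
        if (!isMailto(url)) {
            links.push_back(url);
        }
    }
    return links;
}
```

Unquoted or single-quoted attribute values are not handled, and an href on any tag (not only A) would be picked up, which is part of why the thread leans towards letting MSHTML parse the document instead.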
How about this? It parses an HTML file; in the end all it needs is the contents of the file.
Have a look at the OnBgo function.

void CTestDlg::OnBbrowse() // choose the HTML file to open
{
    UpdateData();
    CFileDialog fdlg(TRUE, NULL, NULL, OFN_HIDEREADONLY|OFN_FILEMUSTEXIST,
        _T("HTML Files (*.html; *.htm)|*.html;*.htm|All Files (*.*)|*.*||"), this);
    if (fdlg.DoModal() == IDOK) {
        m_csFilename = fdlg.GetPathName();
        UpdateData(FALSE);
    }
}

void CTestDlg::OnBgo()
{
    UpdateData();
    CWaitCursor wait;
    if (m_csFilename.IsEmpty()) {
        AfxMessageBox(_T("Please specify the file to parse"));
        return;
    }
    CFile f;
    // Open the file and read it into a CString (any buffer would do).
    // Reading raw bytes into a CString assumes an ANSI (non-Unicode) build.
    if (f.Open(m_csFilename, CFile::modeRead|CFile::shareDenyNone)) {
        m_wndLinksList.ResetContent();
        CString csWholeFile;
        f.Read(csWholeFile.GetBuffer(f.GetLength()), f.GetLength());
        csWholeFile.ReleaseBuffer(f.GetLength());
        f.Close();

        // Declare our MSHTML variables and create a document.
        // COM must already be initialized on this thread.
        MSHTML::IHTMLDocument2Ptr pDoc;
        MSHTML::IHTMLDocument3Ptr pDoc3;
        MSHTML::IHTMLElementCollectionPtr pCollection;
        MSHTML::IHTMLElementPtr pElement;

        HRESULT hr = CoCreateInstance(CLSID_HTMLDocument, NULL, CLSCTX_INPROC_SERVER,
            IID_IHTMLDocument2, (void**)&pDoc);
        if (FAILED(hr)) {
            AfxMessageBox(_T("Could not create the HTML document object"));
            return;
        }

        // Put the HTML into a SAFEARRAY and write it into the document
        SAFEARRAY* psa = SafeArrayCreateVector(VT_VARIANT, 0, 1);
        VARIANT *param;
        bstr_t bsData = (LPCTSTR)csWholeFile;
        hr = SafeArrayAccessData(psa, (LPVOID*)&param);
        param->vt = VT_BSTR;
        // Give the array its own copy: SafeArrayDestroy frees the element's BSTR,
        // and bsData frees its own string when it goes out of scope.
        param->bstrVal = bsData.copy();
        hr = SafeArrayUnaccessData(psa);

        hr = pDoc->write(psa);
        hr = pDoc->close();

        SafeArrayDestroy(psa);

        // IHTMLDocument3 is used to retrieve the tags. Note it is available only in IE5+.
        // If you don't want to use it, you can just run through all tags in the HTML
        // (the IHTMLDocument2->all property).
        pDoc3 = pDoc;

        // Display the HREF attribute of every link (A tag) in the list box
        pCollection = pDoc3->getElementsByTagName(L"A");
        for (long i = 0; i < pCollection->length; i++) {
            pElement = pCollection->item(i, (long)0);
            if (pElement != NULL) {
                // The second parameter (2) asks for the attribute text exactly as written
                m_wndLinksList.AddString((LPCTSTR)bstr_t(pElement->getAttribute("href", 2)));
            }
        }
    }
}
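Hehe! JennyVenus(一袋烟后老汉绕村后的老槐树三圈有感), the problem still isn't solved and I'm getting anxious!
saucer(思归, MS .NET MVP): use a regular expressions library? Where can I get one? Do I have to write it myself? If so, forget it; I'll just go with the first approach.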
Regex++
http://www.boost.org/libs/regex/
http://www.china-askpro.com/msg47/qa80.shtml
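It's explained very clearly there, and that's where I found my code. I've run and tested it and the results are correct. The only difference from what you need is that it reads the page from disk into memory, whereas your content is already in memory. CodeProject requires you to register with an email address; once you've registered you can download it. If that's too much trouble, post your email address and I'll send it to you.
Haha, JennyVenus, that's exactly what I was looking for. Thanks!
Closing the thread!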