Newbie question: how do I read the text of a web page?

For example, given the URL http://community.csdn.net/Expert/FAQ/FAQ_Index.asp?id=151373
how can I read out all of the text on that page?

Solutions »


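IHTMLDocument2 * pDoc = ....;
IHTMLElement * pElement = NULL;
pDoc->get_body(&pElement);          // body element of the loaded document
BSTR bsText = NULL;
pElement->get_innerText(&bsText);   // text of the body with the markup stripped
// When finished: SysFreeString(bsText); pElement->Release();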
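This is by no means a newbie question; it is actually quite tricky. The code in the post above is correct, but getting hold of pDoc is not that simple.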
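Borrowing some code from earlier posters:

// Requires #include <afxinet.h>.
// strPath is not defined in this snippet; it is assumed to be declared
// elsewhere and to hold the output directory.
BOOL GetSourceHtml(CString theUrl, CString Filename)
{
    CInternetSession session;
    CInternetFile* file = NULL;
    try
    {
        // Try to connect to the given URL
        file = (CInternetFile*) session.OpenURL(theUrl);
    }
    catch (CInternetException* pException)
    {
        // On error, free the exception object and report failure
        pException->Delete();
        return FALSE;
    }

    if (file == NULL)
    {
        // The connection to the specified server could not be established
        return FALSE;
    }

    // dataStore receives the downloaded page
    CStdioFile dataStore;
    BOOL bIsOk = dataStore.Open(strPath + _T("\\") + Filename,
        CFile::modeCreate
        | CFile::modeWrite
        | CFile::shareDenyWrite
        | CFile::typeText);

    if (!bIsOk)
    {
        // Release the internet file before returning so it is not leaked
        file->Close();
        delete file;
        return FALSE;
    }

    // An LPTSTR buffer can be used instead of CString; that overload does
    // not strip the trailing \n from each line.
    CString somecode;

    // Copy the page line by line until there is nothing left to read
    while (file->ReadString(somecode))
    {
        dataStore.WriteString(somecode);
        dataStore.WriteString(_T("\n"));    // not needed with the LPTSTR overload
    }

    dataStore.Close();
    file->Close();
    delete file;
    return TRUE;
}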
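LPDISPATCH pDisp = pBrowser->GetDocument();
IHTMLDocument2 * pDoc = NULL;
if (pDisp != NULL)
{
    // QueryInterface is safer than casting the IDispatch pointer directly
    pDisp->QueryInterface(IID_IHTMLDocument2, (void**)&pDoc);
    pDisp->Release();
}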
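Load the target URL in a CHtmlView and you can obtain the IHTMLDocument2 from it (via GetHtmlDocument(), once the document has finished loading).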
roger_ding's suggestion is the better one; it also lets you browse the page content at the same time.
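First use CInternetSession::OpenURL() to fetch the page's source code, then use regular expressions to strip out the HTML tags and scripts; what remains is the content of the page.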
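
The tag-stripping step could look roughly like the sketch below. It uses standard C++ std::regex rather than any MFC class, and the function name StripHtml is made up for illustration. It assumes the page source has already been read into a std::string. It does not decode entities such as &nbsp;, and it will be fooled by malformed markup, so treat it as a starting point rather than a full HTML parser.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the thread): reduce an HTML string to its
// visible text using regular expressions only.
std::string StripHtml(const std::string& html)
{
    // <script>...</script> and <style>...</style> blocks, including their
    // content. [\s\S] is used because '.' does not match newlines.
    static const std::regex blocks(
        R"(<(?:script|style)\b[^>]*>[\s\S]*?</(?:script|style)\s*>)",
        std::regex::icase);
    // HTML comments
    static const std::regex comments(R"(<!--[\s\S]*?-->)");
    // Any remaining tag
    static const std::regex tags(R"(<[^>]*>)");
    // Runs of whitespace
    static const std::regex spaces(R"(\s+)");

    // Replace each removed piece with a space so that adjacent words
    // separated only by tags do not get glued together.
    std::string text = std::regex_replace(html, blocks, " ");
    text = std::regex_replace(text, comments, " ");
    text = std::regex_replace(text, tags, " ");
    text = std::regex_replace(text, spaces, " ");

    // Trim the leading and trailing space left over from the replacements.
    const std::string::size_type first = text.find_first_not_of(' ');
    if (first == std::string::npos)
        return std::string();
    const std::string::size_type last = text.find_last_not_of(' ');
    return text.substr(first, last - first + 1);
}
```

Calling StripHtml on the downloaded source returns the page text as a single whitespace-normalized line. For very large pages, or when exact rendering matters, the IHTMLDocument2 approach from the earlier replies is likely to be more robust.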