Retrieving the HTML of the current selection
If you want to limit the HTML to just what the user has selected, rather than the entire document, you can use the IHTMLXxx COM interfaces. The first step is to get the IHTMLDocument2 interface for the current document. IWebBrowser2 exposes the document through its Document property. That property returns an IDispatch interface, so you need to call QueryInterface on it for IHTMLDocument2, like so (raw C++):
IDispatch* pDocDisp = 0;
HRESULT hr = pWebBrowser->get_Document(&pDocDisp);
if (SUCCEEDED(hr) && pDocDisp) {   // the pointer can be null when no document is loaded
    IHTMLDocument2* pDoc = 0;
    hr = pDocDisp->QueryInterface(IID_IHTMLDocument2, (void**)&pDoc);
    if (SUCCEEDED(hr)) {
        //...
        pDoc->Release();
    }
    pDocDisp->Release();
}
The IHTMLXxx interfaces closely follow the DOM objects used from JavaScript. If you're familiar with those objects, the IHTMLXxx interfaces will be easy to grasp. In fact, if you know how to do something in JavaScript, you can usually duplicate it in your compiled code using the IHTMLXxx interfaces. That said, you can get the current selection as an IHTMLTxtRange from the document's selection object. Once you have a text range, you can retrieve the plain text or HTML text as shown below:
IHTMLDocument2* pDoc = ...;
IHTMLSelectionObject* pSelection = 0;
HRESULT hr = pDoc->get_selection(&pSelection);
if (SUCCEEDED(hr)) {
    IDispatch* pDispRange = 0;
    hr = pSelection->createRange(&pDispRange);
    if (SUCCEEDED(hr)) {
        IHTMLTxtRange* pTextRange = 0;
        hr = pDispRange->QueryInterface(IID_IHTMLTxtRange, (void**)&pTextRange);
        if (SUCCEEDED(hr)) {
            CComBSTR sText;
            pTextRange->get_text(&sText);         // plain text of the selection
            // or, instead of get_text:
            // pTextRange->get_htmlText(&sText);  // HTML of the selection
            //...
            pTextRange->Release();
        }
        pDispRange->Release();
    }
    pSelection->Release();
}
pDoc->Release();
Applying get_text to the <Body> or <Html> element may fail when that element is missing. You can also use Microsoft Word as a converter; see http://engine.keeboo.com/admin/KeeBookCreator.txt.
IDispatch* pDocDisp = 0;
HRESULT hr = pWebBrowser->get_Document(&pDocDisp);
if (SUCCEEDED(hr) && pDocDisp) {
    IHTMLDocument2* pDoc = 0;
    hr = pDocDisp->QueryInterface(IID_IHTMLDocument2, (void**)&pDoc);
    if (SUCCEEDED(hr)) {
        IHTMLElement* pBody = 0;
        hr = pDoc->get_body(&pBody);
        if (SUCCEEDED(hr) && pBody)   // pBody can be null when the page has no body
        {
            BSTR bstrHTMLText = 0;
            hr = pBody->get_outerText(&bstrHTMLText);
            // this is the text of the web page
            CString strText = bstrHTMLText;
            ......
            SysFreeString(bstrHTMLText);
            pBody->Release();
        }
        pDoc->Release();
    }
    pDocDisp->Release();
}
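The snippets above pair every successful getter with a manual Release(), which is easy to get wrong once early exits are involved. The following is a minimal, portable sketch of the scoped-release idea only. MockUnknown, ScopedRef, GetMockObject and DemoScopedRelease are hypothetical names invented for this illustration and are not part of MSHTML or ATL; in real Windows code, ATL's CComPtr plays the role of ScopedRef.

```cpp
#include <cassert>

// Hypothetical stand-in for a COM interface: it only models the reference count.
struct MockUnknown {
    explicit MockUnknown(int* liveCounter) : live_(liveCounter) { ++*live_; }
    unsigned long AddRef() { return ++refs_; }
    unsigned long Release() {
        const unsigned long left = --refs_;
        if (left == 0) {
            --*live_;      // record that the object is gone
            delete this;
        }
        return left;
    }
private:
    ~MockUnknown() = default;
    unsigned long refs_ = 1;   // the creator owns the first reference, as with COM out-parameters
    int* live_;                // counts objects that are still alive
};

// Minimal scoped holder: owns one reference and releases it on scope exit.
template <typename T>
class ScopedRef {
public:
    ScopedRef() = default;
    ScopedRef(const ScopedRef&) = delete;
    ScopedRef& operator=(const ScopedRef&) = delete;
    ~ScopedRef() { if (p_) p_->Release(); }

    // Address of the held pointer, for use as a COM-style out-parameter.
    T** Receive() { assert(p_ == nullptr); return &p_; }
    T* operator->() const { return p_; }
    explicit operator bool() const { return p_ != nullptr; }
private:
    T* p_ = nullptr;
};

// Hypothetical getter mimicking the COM pattern: returns 0 on success and hands out one reference.
inline long GetMockObject(int* liveCounter, MockUnknown** out) {
    *out = new MockUnknown(liveCounter);
    return 0;
}

// Acquires two objects the way the selection code does, then lets scope exit release them.
// Returns the number of objects still alive afterwards (0 means nothing leaked).
inline int DemoScopedRelease() {
    int live = 0;
    {
        ScopedRef<MockUnknown> selection;
        if (GetMockObject(&live, selection.Receive()) != 0) return -1;
        ScopedRef<MockUnknown> range;
        if (GetMockObject(&live, range.Receive()) != 0) return -1;  // selection is still released
        assert(live == 2);
    }
    return live;
}
```

With a holder like this, the nested if blocks no longer need a matching Release() on every path, because each reference is released when its holder goes out of scope.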
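The code snippet is as follows:
// Fragment: hr is declared elsewhere, and the break statements assume an enclosing loop or switch.
IHTMLDocument2* pHTMLDocument = NULL;
// Assumes m_ie2.GetDocument() returns the document as an IDispatch*, so query for IHTMLDocument2 instead of casting.
IDispatch* pDocDisp = m_ie2.GetDocument();
if (!pDocDisp)
    break;
hr = pDocDisp->QueryInterface(IID_IHTMLDocument2, (void**)&pHTMLDocument);
pDocDisp->Release();
if (FAILED(hr))
    break;
IHTMLElement* pBody = NULL;
hr = pHTMLDocument->get_body(&pBody);
if (SUCCEEDED(hr) && pBody)
{
    BSTR bstrHTMLText = NULL;
    hr = pBody->get_outerText(&bstrHTMLText);
    CString strText = bstrHTMLText;   // text of the whole body
    SysFreeString(bstrHTMLText);
    pBody->Release();
}
// Walk every element in the document and read its text.
IHTMLElementCollection* pCollection = NULL;
hr = pHTMLDocument->get_all(&pCollection);
if (SUCCEEDED(hr) && pCollection)
{
    long len = 0;
    pCollection->get_length(&len);
    for (long l = 0; l < len; l++)
    {
        VARIANT varIndex, var2;
        VariantInit(&varIndex);
        VariantInit(&var2);
        varIndex.vt = VT_I4;
        varIndex.lVal = l;
        IDispatch* pDisp = NULL;
        hr = pCollection->item(varIndex, var2, &pDisp);
        if (FAILED(hr) || !pDisp)
            continue;
        IHTMLElement* pElem = NULL;
        hr = pDisp->QueryInterface(IID_IHTMLElement, (LPVOID*)&pElem);
        pDisp->Release();
        if (FAILED(hr))
            continue;
        BSTR bstrHTMLText = NULL;
        pElem->get_outerText(&bstrHTMLText);
        CString strText = bstrHTMLText;   // text of this element
        SysFreeString(bstrHTMLText);
        pElem->Release();
    }
    pCollection->Release();
}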
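// pElement: an IHTMLElement* for the element being inspected (not defined in this fragment).
IHTMLDocument2* pDoc2 = NULL;
CComBSTR tagName;
pElement->get_tagName(&tagName);
CString str = tagName;
str.MakeUpper();
if (str == _T("FRAME") || str == _T("IFRAME"))
{
    IHTMLFrameBase2* pHTMLFrameBase2 = NULL;
    HRESULT hr = pElement->QueryInterface(IID_IHTMLFrameBase2, (void**)&pHTMLFrameBase2);
    pElement->Release();
    if (SUCCEEDED(hr))
    {
        IHTMLWindow2* pHTMLWindow = NULL;
        hr = pHTMLFrameBase2->get_contentWindow(&pHTMLWindow);
        pHTMLFrameBase2->Release();
        if (SUCCEEDED(hr) && pHTMLWindow)
        {
            hr = pHTMLWindow->get_document(&pDoc2);
            pHTMLWindow->Release();
        }
    }
}
Then use the IHTMLDocument2 interface (pDoc2) to work with the frame's document.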
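The tag test in the frame snippet relies on MFC's CString. As a rough, portable sketch of the same case-insensitive check, the helper below uses std::string instead; IsFrameTag is a hypothetical name chosen for this illustration, and it assumes the tag name has already been converted from a BSTR to a narrow string.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: true when tagName names a FRAME or IFRAME element, ignoring case.
inline bool IsFrameTag(std::string tagName) {
    std::transform(tagName.begin(), tagName.end(), tagName.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return tagName == "FRAME" || tagName == "IFRAME";
}
```

Keeping the check in one function makes it easy to reuse while walking the element collection and deciding which elements need their own document retrieved.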