The header looks like this:

MIME-Version: 1.0
X-Document-Type: Workbook
Content-Type: multipart/related; boundary="----=_NextPart_01C3AACE.2206ED10"

This document is a Web archive file. If you are seeing this message, this means your browser or editor doesn't support Web archive files. For more information on the Web archive format, go to http://officeupdate.microsoft.com/office/webarchive.htm.

------=_NextPart_01C3AACE.2206ED10
Content-Location: http://localhost/excel.htm
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset="utf-8"
See the replies on floors 2 and 4. To extract the data, parsing the file with Java's DOM4J package is one option.
X-Document-Type: Workbook
Content-Type: multipart/related; boundary="----=_NextPart_01C3AACE.2206ED10"

This document is a Web archive file. If you are seeing this message, this means your browser or editor doesn't support Web archive files. For more information on the Web archive format, go to http://officeupdate.microsoft.com/office/webarchive.htm.

------=_NextPart_01C3AACE.2206ED10
Content-Location: http://localhost/excel.htm
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset="utf-8"

If the format has no real regularity, just read it with an ordinary file stream and use ":" as the delimiter!
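A rough sketch of the ":" idea, in case it helps. One catch: header values such as http://localhost/excel.htm contain colons themselves, so it is safer to split only on the first one. The class and method names below are made up for illustration, and folded (continuation) header lines are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any library.
class MhtHeaderSketch {

    // Parses "Name: value" lines up to the first blank line.
    // Each line is split on its FIRST ':' only, so URLs in the value survive.
    static Map<String, String> parseHeaders(String part) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : part.split("\\r?\\n")) {
            if (line.trim().isEmpty()) {
                break; // a blank line ends the header block
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // not a "Name: value" line, skip it
            }
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            headers.put(name, value);
        }
        return headers;
    }

    public static void main(String[] args) {
        String part = "Content-Location: http://localhost/excel.htm\n"
                + "Content-Transfer-Encoding: quoted-printable\n"
                + "Content-Type: text/html; charset=\"utf-8\"\n"
                + "\n"
                + "<html>";
        Map<String, String> headers = parseHeaders(part);
        System.out.println(headers.get("Content-Location"));
        System.out.println(headers.get("Content-Transfer-Encoding"));
    }
}
```

The same method can be fed text read from a FileReader; only the header block of each part is touched, and the body after the blank line is left alone.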
DOM4J will choke on this. It is template-generated output:

MIME-Version: 1.0
X-Document-Type: Workbook
Content-Type: multipart/related; boundary="----=_NextPart_01C3AACE.2206ED10"

This document is a Web archive file. If you are seeing this message, this means your browser or editor doesn't support Web archive files. For more information on the Web archive format, go to http://officeupdate.microsoft.com/office/webarchive.htm.

------=_NextPart_01C3AACE.2206ED10
Content-Location: http://localhost/excel.htm
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset="utf-8"

<html xmlns:o=3D"urn:schemas-microsoft-com:office:office"
xmlns:x=3D"urn:schemas-microsoft-com:office:excel"
xmlns:v=3D"urn:schemas-microsoft-com:vml"
xmlns=3D"http://www.w3.org/TR/REC-html40">
<head>
<meta name=3D"Excel Workbook Frameset">
<meta http-equiv=3DContent-Type content=3D"text/html; charset=3Dutf-8">
<meta name=3DProgId content=3DExcel.Sheet>
<meta name=3DGenerator content=3D"Microsoft Excel 10">
<link rel=3DFile-List href=3D" excel_files/filelist.xml">
<link rel=3DEdit-Time-Data href=3D"excel_files/editdata.mso">
<link rel=3DOLE-Object-Data href=3D"excel_files/oledata.mso">
<![if !supportTabStrip]>
<link id=3D"shLink" href=3D"excel_files/page1-1.htm">
<link id=3D"shLink">
<script language=3D"JavaScript">
</script>
<![endif]><!--[if gte mso 9]><xml>
<x:ExcelWorkbook>
<x:ExcelWorksheets>
<x:ExcelWorksheet>
<x:Name>page1-1</x:Name>
<x:WorksheetSource HRef=3D"excel_files/page1-1.htm"/>
</x:ExcelWorksheet>
</x:ExcelWorksheets>
<x:Stylesheet HRef=3D"excel_files/stylesheet.css"/>
<x:WindowHeight>8835</x:WindowHeight>
<x:WindowWidth>14220</x:WindowWidth>
<x:WindowTopX>480</x:WindowTopX>
<x:WindowTopY>60</x:WindowTopY>
<x:ActiveSheet>0</x:ActiveSheet>
<x:ProtectStructure>False</x:ProtectStructure>
<x:ProtectWindows>False</x:ProtectWindows>
</x:ExcelWorkbook>
</xml><![endif]-->
</head>
<frameset rows=3D"*,39" border=3D0 width=3D0 frameborder=3Dno framespacing=3D0>
<frame src=3D"excel_files/page1-1.htm" name=3D"frSheet">
<frame src=3D"excel_files/tabstrip.htm" name=3D"frTabs" marginwidth=3D0 marginheight=3D0>
<noframes>
<body>
<p>This page uses frames, but your browser doesn't support them.</p>
</body>
</noframes>
</frameset>
</html>

------=_NextPart_01C3AACE.2206ED10
Content-Location: http://localhost/excel_files/stylesheet.css
Content-Transfer-Encoding: quoted-printable
Content-Type: text/css; charset="utf-8"

tr
{mso-height-source:auto;}
col
...
...
...
<span class=3Dfont7><span style='mso-spacerun:yes'> </span>-<span style='mso-spacerun:yes'> </span></span>
</td>
<td class=3Dxl41>
<span class=3Dfont7>0.45607638</span>
</td>
</tr>
</table>
</body>
</html>
------=_NextPart_01C3AACE.2206ED10--
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>My CD Collection</h2>
<table border="1">
<tr bgcolor="#9acd32">
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="catalog/cd">
<xsl:sort select="artist"/>
<tr>
<td><xsl:value-of select="title"/></td>
<td><xsl:value-of select="artist"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
2. If the file is generated automatically, why not pull the data out and insert it into the database at the moment it is generated?
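3. Your problem is not really parsing an Excel file. It is parsing an HTML file: extracting the data from the table in it and saving that to a database, right?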
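If it really is just HTML-table extraction, something along these lines might be enough as a starting point. The parts are marked Content-Transfer-Encoding: quoted-printable, so "=3D" has to be decoded back to "=" before looking at the tags. This is only a sketch: the class and method names are invented, the input is assumed to be ASCII quoted-printable carrying UTF-8 text (as the charset header says), and the regex is a shortcut that will not cope with nested tables; a real HTML parser would be sturdier.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class MhtTableSketch {

    // Decodes quoted-printable text: "=XX" is one byte in hex,
    // and "=" directly before a line break is a soft line break.
    static String decodeQuotedPrintable(String s) {
        byte[] in = s.getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            byte b = in[i];
            if (b != '=') {
                out.write(b);
                continue;
            }
            if (i + 1 < in.length && in[i + 1] == '\n') {
                i += 1; // soft break "=\n"
                continue;
            }
            if (i + 2 < in.length && in[i + 1] == '\r' && in[i + 2] == '\n') {
                i += 2; // soft break "=\r\n"
                continue;
            }
            if (i + 2 < in.length) {
                int hi = Character.digit(in[i + 1], 16);
                int lo = Character.digit(in[i + 2], 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 2;
                    continue;
                }
            }
            out.write(b); // malformed escape: keep the '=' as-is
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Returns the text of every <td> cell with inner tags stripped.
    static List<String> cellTexts(String html) {
        List<String> cells = new ArrayList<>();
        Pattern td = Pattern.compile("<td\\b[^>]*>(.*?)</td>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher m = td.matcher(html);
        while (m.find()) {
            cells.add(m.group(1).replaceAll("<[^>]+>", "").trim());
        }
        return cells;
    }

    public static void main(String[] args) {
        String raw = "<td class=3Dxl41>\n<span class=3Dfont7>0.45607638</span>\n</td>";
        for (String cell : cellTexts(decodeQuotedPrintable(raw))) {
            System.out.println(cell);
        }
    }
}
```

The resulting list of strings can then be written to the database with ordinary JDBC inserts.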