How to parse an Excel file that is really a web page

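I have an Excel file, a.xsl, that is not a genuine Excel file. When I open it in UltraEdit (UE) or EditPlus, it is not binary but HTML. jxl and POI cannot read it; it only becomes parseable after I open it and manually "Save As" a genuine Excel file.
I need the data in a.xsl. The file is generated automatically, so I want to write a method that parses it directly instead of re-saving it by hand.
Is there any way to do this?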

Replies:


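No need to write any method: just change the file extension to .html and open it in IE.
A .xsl file should be parseable with Java's DOM4J library! Just try parsing it as XML. Does it look like this?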
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <html>
  <body>
    <h2>My CD Collection</h2>
    <table border="1">
      <tr bgcolor="#9acd32">
        <th>Title</th>
        <th>Artist</th>
      </tr>
      <xsl:for-each select="catalog/cd">
      <xsl:sort select="artist"/>
      <tr>
        <td><xsl:value-of select="title"/></td>
        <td><xsl:value-of select="artist"/></td>
      </tr>
      </xsl:for-each>
    </table>
  </body>
  </html>
</xsl:template>
</xsl:stylesheet>
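A file in HTML format can be opened in Notepad, i.e. it can be read as a plain text file. Just read the file stream on the server side, analyze it, and write the extracted data into an .xls file.
The header looks like this: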
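A minimal sketch of the "read it as text" step, assuming only the JDK is available and the file is named `a.xsl` as in the question: read the raw bytes, check that they do not start with the OLE2 signature that every genuine binary .xls begins with, and otherwise treat the content as text. The class and method names are illustrative, not from any library.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

class FakeXlsReader {
    // Every genuine binary .xls is an OLE2 compound file, which starts with these 8 bytes.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    /** True if the data begins with the OLE2 signature, i.e. jxl/POI can handle it. */
    static boolean isBinaryXls(byte[] data) {
        return data.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(data, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    /** Reads the file and returns it as text; refuses genuine binary workbooks. */
    static String readAsText(Path file, Charset charset) throws IOException {
        byte[] data = Files.readAllBytes(file);
        if (isBinaryXls(data)) {
            throw new IOException(file + " is a real binary .xls; read it with jxl or POI");
        }
        return new String(data, charset);
    }

    public static void main(String[] args) throws IOException {
        // "a.xsl" is the file name from the question; pass another path as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "a.xsl");
        String text = readAsText(file, StandardCharsets.UTF_8);
        System.out.println(text.substring(0, Math.min(200, text.length())));
    }
}
```

Reading as UTF-8 is safe for the outer structure because the MIME headers and the quoted-printable bodies are plain ASCII; the declared `charset` only matters after decoding each part.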
MIME-Version: 1.0
X-Document-Type: Workbook
Content-Type: multipart/related; boundary="----=_NextPart_01C3AACE.2206ED10"

This document is a Web archive file.  If you are seeing this message, this means your browser or editor doesn't support Web archive files.  For more information on the Web archive format, go to http://officeupdate.microsoft.com/office/webarchive.htm.

------=_NextPart_01C3AACE.2206ED10
Content-Location: http://localhost/excel.htm
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset="utf-8"
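The `boundary` parameter of the top-level `Content-Type` is what separates the parts of this multipart web archive: each part starts after a line consisting of `--` plus the boundary, and the final delimiter carries an extra trailing `--`. A sketch that splits the text on it using only `java.util.regex` (the class name is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MhtSplitter {
    // Matches the boundary="..." parameter of the top-level Content-Type header.
    private static final Pattern BOUNDARY = Pattern.compile("boundary=\"([^\"]+)\"");

    static String findBoundary(String text) {
        Matcher m = BOUNDARY.matcher(text);
        if (!m.find()) {
            throw new IllegalArgumentException("no multipart boundary found");
        }
        return m.group(1);
    }

    /** Returns the body parts in order; each one still starts with its own headers. */
    static List<String> splitParts(String text) {
        String delimiter = "--" + findBoundary(text);
        List<String> parts = new ArrayList<>();
        int pos = text.indexOf(delimiter);          // skips top-level headers and preamble
        while (pos >= 0) {
            int start = pos + delimiter.length();
            if (text.startsWith("--", start)) {
                break;                              // "--boundary--" closes the archive
            }
            int next = text.indexOf(delimiter, start);
            parts.add((next < 0 ? text.substring(start) : text.substring(start, next)).trim());
            pos = next;
        }
        return parts;
    }
}
```

For this file, the first part is the frameset page (`excel.htm`), and the parts that actually hold the cell data are the sheet pages such as `excel_files/page1-1.htm`, so the part to parse is best chosen by its `Content-Location` header.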
See replies #2 and #4. To extract the data, parsing it with Java's DOM4J library is one option.
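One caveat: DOM4J, like any XML parser, only accepts well-formed XML, and this file is not (MIME headers come before the markup, and the HTML part has unquoted attribute values and unclosed `<meta>` tags). DOM4J is not part of the JDK, so this sketch uses the JDK's built-in JAXP `DocumentBuilder` as a stand-in for a quick well-formedness check; the helper name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.helpers.DefaultHandler;

class XmlCheck {
    /** True if the JDK's XML parser accepts the text as a well-formed document. */
    static boolean isWellFormedXml(String text) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) { // ParserConfigurationException, SAXException, IOException
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedXml("MIME-Version: 1.0\nX-Document-Type: Workbook\n"));
        System.out.println(isWellFormedXml("<meta name=\"ProgId\" content=Excel.Sheet>"));
    }
}
```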
If the format follows no regular pattern, just read it with an ordinary file stream and use ':' as the delimiter!
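Splitting header lines on ':' works as long as only the first colon is used, because values such as `http://localhost/excel.htm` contain colons too. A sketch (hypothetical helper) that reads one part's header block up to the blank line that ends it:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HeaderParser {
    /** Parses "Name: value" lines up to the first blank line, splitting each at its first ':'. */
    static Map<String, String> parseHeaders(String block) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : block.split("\r?\n")) {
            if (line.isEmpty()) {
                break;                      // a blank line ends the header block
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue;                   // not a "Name: value" line; skip it
            }
            headers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
        }
        return headers;
    }
}
```

With the headers in a map, `Content-Location` tells you which sheet a part belongs to, and `Content-Transfer-Encoding` tells you how its body must be decoded.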
Parsing this with DOM4J would be a nightmare. It's template-generated output:
MIME-Version: 1.0
X-Document-Type: Workbook
Content-Type: multipart/related; boundary="----=_NextPart_01C3AACE.2206ED10"

This document is a Web archive file.  If you are seeing this message, this means your browser or editor doesn't support Web archive files.  For more information on the Web archive format, go to http://officeupdate.microsoft.com/office/webarchive.htm.

------=_NextPart_01C3AACE.2206ED10
Content-Location: http://localhost/excel.htm
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset="utf-8"

<html xmlns:o=3D"urn:schemas-microsoft-com:office:office"
xmlns:x=3D"urn:schemas-microsoft-com:office:excel"
xmlns:v=3D"urn:schemas-microsoft-com:vml"
xmlns=3D"http://www.w3.org/TR/REC-html40">
<head>
<meta name=3D"Excel Workbook Frameset">
<meta http-equiv=3DContent-Type content=3D"text/html; charset=3Dutf-8">
<meta name=3DProgId content=3DExcel.Sheet>
<meta name=3DGenerator content=3D"Microsoft Excel 10">
<link rel=3DFile-List href=3D"
excel_files/filelist.xml">
<link rel=3DEdit-Time-Data href=3D"excel_files/editdata.mso">
<link rel=3DOLE-Object-Data href=3D"excel_files/oledata.mso">
<![if !supportTabStrip]>
<link id=3D"shLink" href=3D"excel_files/page1-1.htm">
<link id=3D"shLink">
<script language=3D"JavaScript">
</script>
<![endif]>
</head>
<frameset rows=3D"*,39" border=3D0 width=3D0 frameborder=3Dno framespacing=3D0>
<frame src=3D"excel_files/page1-1.htm" name=3D"frSheet">
<frame src=3D"excel_files/tabstrip.htm" name=3D"frTabs" marginwidth=3D0 marginheight=3D0>
<noframes>
<body>
<p>This page uses frames, but your browser doesn't support them.</p>
</body>
</noframes>
</frameset>
</html>
------=_NextPart_01C3AACE.2206ED10
Content-Location: http://localhost/excel_files/stylesheet.css
Content-Transfer-Encoding: quoted-printable
Content-Type: text/css; charset="utf-8"
tr
{mso-height-source:auto;}
col
...
...
...
<span class=3Dfont7><span style='mso-spacerun:yes'> </span>-<span style='mso-spacerun:yes'> </span></span>
</td>
<td class=3Dxl41>
<span class=3Dfont7>0.45607638</span>
</td>
</tr>
</table>
</body>
</html>
------=_NextPart_01C3AACE.2206ED10--
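Every part in the dump above is `quoted-printable` encoded (see its `Content-Transfer-Encoding` header): `=3D` stands for `=`, `=XX` for any byte in hex, and a `=` at the very end of a line is a soft line break. The JDK has no quoted-printable decoder (JavaMail's `MimeUtility.decode` does), so here is a small hand-written sketch; the class name is hypothetical. It decodes to bytes first and only then applies the part's declared charset (`utf-8` here), so multi-byte characters come out right.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QuotedPrintable {
    /** Decodes a quoted-printable body (RFC 2045), then interprets the bytes with the given charset. */
    static String decode(String encoded, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i);
            if (c != '=') {
                out.write(c);                      // QP text is 7-bit ASCII, so one char = one byte
                i++;
            } else if (encoded.startsWith("\r\n", i + 1)) {
                i += 3;                            // soft line break "=CRLF": join the lines
            } else if (encoded.startsWith("\n", i + 1)) {
                i += 2;                            // soft line break with a bare LF
            } else if (i + 2 < encoded.length()
                    && Character.digit(encoded.charAt(i + 1), 16) >= 0
                    && Character.digit(encoded.charAt(i + 2), 16) >= 0) {
                out.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16)); // "=3D" -> '='
                i += 3;
            } else {
                out.write(c);                      // stray '=': keep it literally
                i++;
            }
        }
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) {
        System.out.println(decode("<meta name=3DProgId content=3DExcel.Sheet>", StandardCharsets.UTF_8));
    }
}
```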
Is there a way to save this Excel file as a genuine Excel-format file programmatically?
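1. What do you mean it isn't a real Excel file? It's really just a text file!
2. If it's generated automatically, why not pull the data out and insert it into the database at the moment it is generated?
3. Your problem isn't parsing an Excel file; it's parsing an HTML file, extracting the data from its table, and saving it to a database, right?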
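Point 3 in code: once the sheet's HTML part has been split out and decoded, the cell values can be pulled from its `<tr>`/`<td>` elements. A regex-based sketch (hypothetical class name) that assumes the table is machine-generated and not nested, and decodes only a few common entities:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HtmlTableExtractor {
    private static final Pattern ROW =
        Pattern.compile("<tr[^>]*>(.*?)</tr>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern CELL =
        Pattern.compile("<t[dh][^>]*>(.*?)</t[dh]>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Returns one list of cell texts per table row, with inner tags such as <span> removed. */
    static List<List<String>> extractRows(String html) {
        List<List<String>> rows = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                String text = TAG.matcher(cell.group(1)).replaceAll("");
                // &amp; is decoded last so "&amp;lt;" does not turn into "<".
                text = text.replace("&nbsp;", " ").replace("&lt;", "<")
                           .replace("&gt;", ">").replace("&amp;", "&");
                cells.add(text.trim());
            }
            rows.add(cells);
        }
        return rows;
    }
}
```

Regular expressions are good enough for output as regular as Excel's; for less predictable HTML, an HTML parser such as jsoup is the more robust choice. The resulting rows can then be inserted into the database, or written to a genuine .xls with POI.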