[oracle text] How do I filter Word and PDF documents to extract their text?

I need full-text search over some Word documents. The database is Oracle 9i R2. I'm a beginner, so any advice is welcome. Thanks. Table:
create table textdemo(id number(3) primary key,content blob);
Index:
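create index t_textdemo_idn on textdemo(content) indextype is ctxsys.context;
1. I converted the Word documents to binary and stored them in the BLOB column (I did this by reading each Word file in chunks, which is slow), then tried searching the BLOB data: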
select id from textdemo where contains(content,'学员')>0;
The search returned no rows, so it failed.
2. I then stored a plain .txt file in the BLOB column as binary and tried again:
select id from textdemo where contains(content,'data')>0;
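This search succeeded. From what I read afterwards, Word and Excel documents have to be filtered to extract their text before the index is built. How is that done?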
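On the slow loading in step 1: instead of reading the Word file in chunks on the client, the server can read the whole file into the BLOB with DBMS_LOB.LOADFROMFILE through a BFILE. This is a minimal sketch under assumptions: the directory object DOC_DIR, its path /u01/docs, the file name sample.doc and the id value 1 are all hypothetical. Creating a directory needs the CREATE ANY DIRECTORY privilege, and the file must sit on the database server, readable by the Oracle process.

```sql
-- Hypothetical directory object pointing at a folder on the DB server
CREATE OR REPLACE DIRECTORY doc_dir AS '/u01/docs';

DECLARE
  v_blob  BLOB;
  -- Directory name in upper case, as stored in the data dictionary;
  -- 'sample.doc' is a hypothetical file name
  v_bfile BFILE := BFILENAME('DOC_DIR', 'sample.doc');
BEGIN
  -- Insert an empty LOB and get its locator back for writing
  INSERT INTO textdemo (id, content)
  VALUES (1, EMPTY_BLOB())
  RETURNING content INTO v_blob;

  -- Copy the whole file into the BLOB in a single call
  DBMS_LOB.FILEOPEN(v_bfile, DBMS_LOB.FILE_READONLY);
  DBMS_LOB.LOADFROMFILE(v_blob, v_bfile, DBMS_LOB.GETLENGTH(v_bfile));
  DBMS_LOB.FILECLOSE(v_bfile);

  COMMIT;
END;
/
```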

Answer:

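Solved it myself, a bit embarrassing...
-- Set up the index preferences
begin
-- Lexer that keeps '_' and '-' inside tokens instead of splitting on them
ctx_ddl.create_preference('mylex','BASIC_LEXER');
ctx_ddl.set_attribute('mylex','printjoins','_-');
-- Word list with prefix (1 to 5 characters) and substring indexes,
-- which speed up wildcard queries such as 'dat%' and '%ata%'
ctx_ddl.create_preference('mywordlist', 'BASIC_WORDLIST');
ctx_ddl.set_attribute('mywordlist','PREFIX_INDEX','TRUE');
ctx_ddl.set_attribute('mywordlist','PREFIX_MIN_LENGTH',1);
ctx_ddl.set_attribute('mywordlist','PREFIX_MAX_LENGTH', 5);
ctx_ddl.set_attribute('mywordlist','SUBSTRING_INDEX', 'YES');
end;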
/
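To confirm the preferences were created with the intended attributes before building the index, the Oracle Text user views can be queried. A minimal sketch, assuming it runs as the schema that owns the preferences and that Oracle Text stores preference names in upper case:

```sql
-- Preferences owned by the current user
SELECT pre_name, pre_class, pre_object
  FROM ctx_user_preferences;

-- Attributes set on the word list preference
SELECT prv_attribute, prv_value
  FROM ctx_user_preference_values
 WHERE prv_preference = 'MYWORDLIST';
```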
-- Chinese lexer, so that Chinese text such as '学员' is segmented into words
begin
ctx_ddl.create_preference('cnlex','CHINESE_LEXER');
end;
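/
-- Create the index using these preferences
-- (fmduser is the schema that owns the preferences created above;
-- INSO_FILTER extracts the text from binary formats such as Word and PDF before indexing)
drop index i_docs_idx force;
create index i_docs_idx on docs(text)
indextype is ctxsys.context
parameters ('DATASTORE CTXSYS.DIRECT_DATASTORE FILTER CTXSYS.INSO_FILTER LEXER fmduser.CNLEX WORDLIST fmduser.MYWORDLIST');
-- With this index, the query returns results.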
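In 9i a CONTEXT index is not updated automatically on DML: documents inserted after CREATE INDEX are not searchable until the index is synchronized. A sketch of the follow-up steps, assuming the docs table and i_docs_idx index above, and assuming docs has an id column (hypothetical here):

```sql
-- Make newly inserted or updated documents searchable
BEGIN
  ctx_ddl.sync_index('i_docs_idx');
END;
/

-- Search the filtered Word text; score(1) refers to the label 1 in contains
SELECT id, score(1)
  FROM docs
 WHERE contains(text, '学员', 1) > 0
 ORDER BY score(1) DESC;

-- Documents the filter failed to process are logged here
SELECT err_textkey, err_text
  FROM ctx_user_index_errors
 WHERE err_index_name = 'I_DOCS_IDX';
```

If a Word file never shows up in results, the error view is the first place to look, since a filtering failure skips that document rather than failing the whole index build.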
http://epub.itpub.net/4/1.htm
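To see the text that INSO_FILTER actually extracts from a stored Word or PDF document (the question in the title), CTX_DOC.FILTER can run the index's filter on a single row. A minimal sketch under assumptions: docs has a single-column primary key and a row whose key is 1 (hypothetical); the CTX_DOC.SET_KEY_TYPE call makes '1' be read as a primary key value rather than a ROWID, and its availability should be checked against the Oracle Text reference for your release.

```sql
SET SERVEROUTPUT ON

DECLARE
  v_text CLOB;
BEGIN
  -- Interpret the textkey argument as a primary key value
  ctx_doc.set_key_type('PRIMARY_KEY');

  DBMS_LOB.CREATETEMPORARY(v_text, TRUE);

  -- Run the index's filter on document '1' (hypothetical key);
  -- TRUE asks for plain text instead of HTML
  ctx_doc.filter('i_docs_idx', '1', v_text, TRUE);

  -- Print only the first 80 characters: DBMS_OUTPUT lines are limited
  -- to 255 bytes in 9i, and Chinese characters take several bytes each
  DBMS_OUTPUT.PUT_LINE(DBMS_LOB.SUBSTR(v_text, 80, 1));

  DBMS_LOB.FREETEMPORARY(v_text);
END;
/
```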