lucene-java-user mailing list archives

From jay <cyberja...@yahoo.com>
Subject RE: many analyzers, same index.
Date Mon, 22 Oct 2001 11:54:17 GMT

> The better approach is to implement converters that convert these
> formats to plain text, either a String or a Reader.  Then you can
> use the same analyzer for documents in different formats.

Has anyone tried using third-party open source
utilities to do this?  xpdf (www.foolabs.com/xpdf)
converts PDF to text, and catdoc
(http://www.ice.ru/~vitus/catdoc/ver-0.9.html)
converts MS Word to text.  Maybe these could be used to
create the plain text for the index...
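
Something along these lines might be a starting point: a rough sketch that shells out to the external converter and hands back a String, which could then go to the same analyzer as any other plain text. The class and method names (TextExtractor, commandFor, extractText) are made up for illustration. It assumes pdftotext (from xpdf) and catdoc are installed and on the PATH, that "pdftotext file -" writes the text to standard output, that the tools emit UTF-8, and that Java 9 or later is available for InputStream.readAllBytes(). I have not tried it against real documents.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: converts PDF and Word files to plain text by
// calling external command-line tools.
class TextExtractor {

    /** Picks the external converter command for a file, based on its extension. */
    static List<String> commandFor(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        if (name.endsWith(".pdf")) {
            // Assumption: xpdf's pdftotext is on the PATH; "-" sends the text to stdout.
            return Arrays.asList("pdftotext", file.toString(), "-");
        }
        if (name.endsWith(".doc")) {
            // Assumption: catdoc is on the PATH; it writes the text to stdout by default.
            return Arrays.asList("catdoc", file.toString());
        }
        throw new IllegalArgumentException("No converter configured for " + file);
    }

    /** Runs the converter and returns its standard output as a String. */
    static String extractText(Path file) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(commandFor(file))
                // Let converter warnings go to our own stderr so they cannot block the child.
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        String text;
        try (InputStream in = process.getInputStream()) {
            // Assumption: the converter emits UTF-8; change the charset if it does not.
            text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Converter exited with code " + exitCode + " for " + file);
        }
        return text;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java TextExtractor <file.pdf|file.doc>");
            return;
        }
        System.out.print(extractText(Paths.get(args[0])));
    }
}
```

Keeping the command selection separate from the process handling means another format only needs one more branch in commandFor, and the indexing code never has to know which tool produced the text.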

I look forward to seeing PDF and Word indexing added
to this solution.

My Best;

J
