lucene-java-user mailing list archives

From "Dave Peixotto" <>
Subject Re: index other document types
Date Fri, 26 Jul 2002 15:34:54 GMT
Lucene is very good at indexing and searching text documents.  If you need
to index other types of documents (Word docs, PDFs, etc.) then a good
strategy is to convert those documents to text and use Lucene to index the
text version of the document.  If you already have a tool to convert other
document types to text, then you should have no trouble indexing those
documents.
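
The convert-then-index strategy described above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not Lucene
code: TextExtractor, PlainTextExtractor, NaiveHtmlExtractor and ToyTextIndex
are hypothetical names invented for this sketch, and the toy index merely
stands in for the step where the extracted text would be handed to Lucene.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical converter: turns one document format into plain text. */
interface TextExtractor {
    String extract(byte[] raw);
}

/** Plain text needs no conversion beyond decoding the bytes. */
class PlainTextExtractor implements TextExtractor {
    public String extract(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }
}

/** Naive HTML converter: strips tags only (no entity or script handling). */
class NaiveHtmlExtractor implements TextExtractor {
    public String extract(byte[] raw) {
        String html = new String(raw, StandardCharsets.UTF_8);
        return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }
}

/** Toy stand-in for the indexing step; a real setup would pass the text to Lucene. */
class ToyTextIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Splits the text into lowercase tokens and records which document holds each one. */
    void add(String docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Returns the ids of documents containing the exact (case-insensitive) term. */
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }
}

class ConvertThenIndexDemo {
    public static void main(String[] args) {
        // One extractor per document type, chosen here by file extension.
        Map<String, TextExtractor> extractors = new HashMap<>();
        extractors.put("txt", new PlainTextExtractor());
        extractors.put("html", new NaiveHtmlExtractor());

        ToyTextIndex index = new ToyTextIndex();
        index.add("notes.txt", extractors.get("txt")
                .extract("Lucene indexes text".getBytes(StandardCharsets.UTF_8)));
        index.add("page.html", extractors.get("html")
                .extract("<p>Convert <b>first</b>, then index</p>".getBytes(StandardCharsets.UTF_8)));

        System.out.println(index.search("index"));
        System.out.println(index.search("text"));
    }
}
```

The point of the interface is that the indexing side only ever sees plain
text, so supporting another format (Word, PDF) would mean adding one more
extractor around whatever conversion tool is available, without touching the
indexing code.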

----- Original Message -----
From: "Jun Zhou" <>
To: "Lucene Users List" <>
Sent: Friday, July 26, 2002 7:52 AM
Subject: index other document types

> Dear all,
>  I learned from the Lucene FAQ that if we want to index other document types,
> we need to provide a parser or extractor for every document type. I know
> there are some tools available that can convert other document types to txt
> format. Is the converter a parser or extractor at all?
>  Thank you for your kind assistance in advance.
>  Best regards,
> Jun Zhou
