lucene-java-user mailing list archives

From "Dave Peixotto" <peixo...@geofolio.com>
Subject Re: index other document types
Date Fri, 26 Jul 2002 15:34:54 GMT
Lucene indexes and searches plain text; it does not parse Word, PDF, or
other binary formats itself.  To index those documents, convert each one to
text first and use Lucene to index the text version.  Any tool that does
the conversion fills the role the FAQ calls a parser or extractor, so if you
already have such a converter, you should have no trouble indexing those
documents.
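
A rough sketch of the convert-then-index idea follows.  It only shows how a
converter could be chosen by file extension; TextExtractor and
ExtractorRegistry are names made up for this example, not part of Lucene,
and the real converters for Word or PDF would be whatever tools you already
have.  Adding the resulting string to a Lucene document is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical interface (not a Lucene type): turns one file into plain text. */
interface TextExtractor {
    String extract(Path file);
}

/** Hypothetical registry that picks a converter by file extension. */
final class ExtractorRegistry {
    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    /** Associates an extension such as "pdf" with the converter for that type. */
    void register(String extension, TextExtractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Returns the lower-cased extension, or "" if the name has no dot. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Returns the text to index, or empty if no converter is registered. */
    Optional<String> toText(Path file) {
        String ext = extensionOf(file.getFileName().toString());
        TextExtractor extractor = byExtension.get(ext);
        if (extractor == null) {
            return Optional.empty();
        }
        return Optional.of(extractor.extract(file));
    }

    /** A registry that already handles .txt files (assumed to be UTF-8). */
    static ExtractorRegistry withPlainText() {
        ExtractorRegistry registry = new ExtractorRegistry();
        registry.register("txt", file -> {
            try {
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        return registry;
    }
}
```

You would register one converter per document type, call toText() for each
file, and hand any text that comes back to your Lucene indexing code.  Files
with no registered converter are simply skipped.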

----- Original Message -----
From: "Jun Zhou" <ACP01JZ@sheffield.ac.uk>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Friday, July 26, 2002 7:52 AM
Subject: index other document types


> Dear all,
>
>  I learned from Lucene FAQ that if we want to index other document types,
> we need to provide a parser or extractor for every document type. I know
> there are some tools available which can convert other document types to
> txt format. Is the converter a parser or extractor at all?
>
>  Thank you for your kind assistance in advance.
>
>  Best regards
> Jun Zhou
> acp01jz@sheffield.ac.uk
>



