lucene-java-user mailing list archives

From saisantoshi <saisantosh...@gmail.com>
Subject Readers for extracting textual info from pd/doc/excel for indexing the actual content
Date Fri, 25 Jan 2013 22:54:29 GMT
I want to index document content (such as PDF, Word, and Excel files) and am
wondering whether there are any good readers that I can integrate with
Lucene 4.0. Any pointers or example code would be appreciated.

The Lucene in Action book mentions "Tika" as the library to use, but I am not
sure whether this is the preferred approach. If anyone has used it, I would
greatly appreciate hearing about your experience with the library.

An example of how a PDF document can be indexed, using either the Tika
framework or any other reader, would be much appreciated.
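
To make the question concrete, here is a rough sketch in plain Java of the
flow being asked about: extract text from the raw file, then index that text.
Everything in it is an assumption for illustration only. TextExtractor is a
hypothetical interface standing in for whatever reader is recommended (Tika or
otherwise), and ToyIndex is a toy in-memory stand-in for a Lucene index, not
Lucene's API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ExtractAndIndexSketch {

    /** Hypothetical stand-in for a parser library: raw bytes in, plain text out. */
    interface TextExtractor {
        String extract(byte[] rawDocument);
    }

    /** Toy in-memory inverted index (term -> document ids). Not Lucene's API. */
    static final class ToyIndex {
        private final Map<String, Set<String>> postings = new HashMap<>();

        /** Lower-cases the text, splits on non-alphanumerics, records each token. */
        void add(String docId, String text) {
            for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!token.isEmpty()) {
                    postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
                }
            }
        }

        /** Returns the ids of documents containing the single term, if any. */
        Set<String> search(String term) {
            return postings.getOrDefault(term.toLowerCase(Locale.ROOT),
                                         Collections.<String>emptySet());
        }
    }

    /** The two-step pipeline: extract the textual content, then index it. */
    static void indexDocument(ToyIndex index, TextExtractor extractor,
                              String docId, byte[] raw) {
        String text = extractor.extract(raw);
        index.add(docId, text);
    }

    public static void main(String[] args) {
        // Placeholder extractor that treats the bytes as UTF-8 text; a real
        // PDF/Word/Excel reader would be plugged in here instead.
        TextExtractor plainText = raw -> new String(raw, StandardCharsets.UTF_8);

        ToyIndex index = new ToyIndex();
        indexDocument(index, plainText, "report.pdf",
                "Quarterly sales report".getBytes(StandardCharsets.UTF_8));
        indexDocument(index, plainText, "notes.doc",
                "Meeting notes on sales".getBytes(StandardCharsets.UTF_8));

        System.out.println(index.search("sales"));
    }
}
```

The part I am unsure about is the extractor: which library should sit behind
that interface for real PDF/Word/Excel files, and how its output is best fed
into a Lucene 4.0 document.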

Thanks,
Sai.