lucene-general mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Lucene is not able to index certain words of txt file converted form pdf
Date Wed, 18 Jun 2008 12:57:03 GMT
Hi,

Please use the java-user list for questions like this; there are more people on it.

You need to change the setting in IndexWriter that tells Lucene how many tokens from a document
to index.  By default it indexes only the first 10,000, so words that appear later in a long
document never make it into the index and can't be found.  I can't remember the parameter name,
but look at the IndexWriter javadocs, it's right there.
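
To illustrate the effect, here is a minimal sketch in plain Java with no Lucene dependency.
It is not how Lucene is implemented; the class TokenLimitDemo and the helper indexFirstTokens
are hypothetical names made up for this example, and the whitespace tokenizer is a
simplifying assumption. It only shows why a word past the token limit can't be found.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, not part of Lucene.
class TokenLimitDemo {

    // Collects at most maxTokens whitespace-separated tokens from text,
    // lowercased, and silently ignores everything after that limit.
    static Set<String> indexFirstTokens(String text, int maxTokens) {
        Set<String> terms = new HashSet<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return terms;
        }
        String[] tokens = trimmed.split("\\s+");
        int limit = Math.min(tokens.length, maxTokens);
        for (int i = 0; i < limit; i++) {
            terms.add(tokens[i].toLowerCase());
        }
        return terms;
    }

    public static void main(String[] args) {
        // Build a document whose 10,001st token is the word we search for.
        StringBuilder doc = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            doc.append("filler ");
        }
        doc.append("analyzer");

        Set<String> capped = indexFirstTokens(doc.toString(), 10000);
        Set<String> uncapped = indexFirstTokens(doc.toString(), Integer.MAX_VALUE);

        System.out.println(capped.contains("analyzer"));   // false: past the limit
        System.out.println(uncapped.contains("analyzer")); // true: limit raised
    }
}
```

In the Lucene 2.x releases the real setting is, I believe, IndexWriter.setMaxFieldLength(int);
confirm the name against the javadocs for the version you are running.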

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
> From: m657m <gaurav.gash@gmail.com>
> To: general@lucene.apache.org
> Sent: Wednesday, June 18, 2008 8:24:53 AM
> Subject: Lucene is not able to index certain words of txt file converted form pdf
> 
> 
> Hi
> 
> I am using Lucene for indexing and searching documents.
> I have a PDF file (Lucene_in_action.pdf) which I converted to a txt file
> using PDFBox.
> I indexed that txt file, but when searching, Lucene is not able to find
> certain words. It does give me results if I search for other
> words.
> I am not able to find any reason for that.
> Can any of you help me find the reason?
> 
> Thanks in advance. 

