lucene-java-user mailing list archives

From Stefan Colella <>
Subject Re: search result problem
Date Mon, 21 May 2007 09:25:07 GMT
Thanks for your reply. I used the explain() method and I now understand why some 
documents are returned.

I am using the same Analyzer for indexing and searching.
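
As a side illustration of why that matters, here is a toy sketch in plain Java. It is not Lucene code: the class AnalyzerMismatchDemo and the methods analyze, whitespaceOnly, buildIndex and search are made up for this example, and the matching is a simple "all terms present" check rather than a real phrase query. It only shows that terms normalized one way at index time cannot be found by a query that is normalized another way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model only: this is NOT Lucene code. It mimics the idea that the
// same analysis step must run at index time and at query time.
public class AnalyzerMismatchDemo {

    // Hypothetical "analyzer": lowercases and splits on non-letter characters.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Hypothetical second "analyzer": splits on whitespace only,
    // keeping case and punctuation untouched.
    static List<String> whitespaceOnly(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Inverted index: term -> ids of the documents containing it.
    // Documents are always indexed with analyze().
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyze(docs.get(id))) {
                index.computeIfAbsent(term, k -> new HashSet<>()).add(id);
            }
        }
        return index;
    }

    // Returns the ids of documents that contain every query term.
    static Set<Integer> search(Map<String, Set<Integer>> index, List<String> queryTerms) {
        Set<Integer> result = null;
        for (String term : queryTerms) {
            Set<Integer> postings = index.getOrDefault(term, new HashSet<>());
            if (result == null) {
                result = new HashSet<>(postings);
            } else {
                result.retainAll(postings);
            }
        }
        return result == null ? new HashSet<>() : result;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "Employees received Stock Options.",
                "Annual report, no equity plans.");
        Map<String, Set<Integer>> index = buildIndex(docs);

        // Same analysis on both sides: document 0 matches.
        System.out.println(search(index, analyze("Stock Options")));
        // Different analysis at query time: "Stock" != "stock", so nothing matches.
        System.out.println(search(index, whitespaceOnly("Stock Options")));
    }
}
```

With the same analysis on both sides the query finds document 0; with the whitespace-only variant at query time the result is empty, even though the words are clearly in the original text.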

I tried adding only the content of the page where that expression can be 
found (instead of the whole document), and then the search works.

Do I have to split my PDF text into more fields? Or what could be the 

Grant Ingersoll wrote:
> Try using the explain() method to see why the documents that were 
> returned scored the way they did.
> If I am understanding correctly, you are saying that Luke shows that 
> those words aren't actually in your index?  Can you elaborate on what 
> your analysis process is?  Are you searching using the same Analyzer 
> as you are indexing with?  I would try to isolate the problem down to 
> some unit tests, if possible.
> Cheers,
> Grant
> On May 18, 2007, at 8:12 AM, Stefan Colella wrote:
>> Hello,
>> My application works with PDF files, so I use Lucene with PDFBox 
>> to create a little search engine. I am new to Lucene.
>> Everything seemed to work fine, but after some tests I saw that some 
>> expressions like "stock option" were never found (or returned the 
>> wrong documents) even though they exist in my PDF files. I searched the 
>> mail archive and found that I have to use the "French Analyzer", but 
>> that didn't work either.
>> I found that there is a tool named Luke to check the Lucene index. I 
>> could see that the original text contains those words, but nothing shows 
>> up in the tokenizer.
>> Can anybody help or explain where I should start looking?
>> Thanks
> --------------------------
> Grant Ingersoll
> Center for Natural Language Processing
> Read the Lucene Java FAQ at 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
