lucene-java-user mailing list archives

From <oh...@cox.net>
Subject Re: Why does this search succeed with web app, but not Luke?
Date Thu, 01 Jan 1970 00:00:00 GMT
Andrzej,

Hah!  

I tried as you suggested using Luke, and I found at least part of my problem.  Luke was defaulting
to KeywordAnalyzer.  

I changed that to StandardAnalyzer, and did queries for:

path:xxxxxxxxxxxxxxxxxxxxx

and

path:xxxxxxxxxxxxxxxxxxxxxx.dat

For the first, the Rewritten was:

path:xxxxxxxxxxxxxxxxxxxxx

and found 1 document.

For the 2nd, the Rewritten was:

path:"xxxxxxxxxxxxxxxxxxxxxx.dat"

and found 1 document.

So, at least now the Luke search results match what I'm seeing in the luceneweb
web app.

With the 2nd query, I did "Explain structure" and it shows:

Term 0: field='path' text='xxxxxxxxxxxxxxxxxx'
Term 1: field='path' text='dat'

So, going back to Phil Whelan's explanation in his email yesterday:

====================================
This query will also pass through the same (hopefully) Analyzer and 
will be broken into terms. So the query will actually be for 
"file-1-2" and "dat" where "file-1-2" is followed immediately by 
"dat". 
 
In indexing the terms position is stored, so 
"C:\dir1\dir2\file-1-1.dat" becomes... 
[0] c 
[1] dir1 
[2] dir2 
[3] file-1-1 
[4] dat 
 
"file-1-1" is followed by "dat", so there is a match. 
========================================
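
To convince myself of Phil's explanation, here is a rough sketch in plain Java (no Lucene on the classpath). The tokenize() method is a hypothetical stand-in that lowercases and splits on anything other than letters, digits, and hyphens, so that it reproduces the term list Phil shows; it is NOT what StandardAnalyzer really does internally, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class PhraseMatchSketch {
    // Hypothetical stand-in for an analyzer: lowercase, then split on any
    // character that is not a letter, digit, or hyphen. This only mimics the
    // term list quoted above; it is not Lucene's StandardAnalyzer.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^a-z0-9-]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece);
            }
        }
        return terms;
    }

    // A phrase matches when the query terms appear in the document
    // at consecutive positions, in the same order.
    static boolean phraseMatches(List<String> docTerms, List<String> queryTerms) {
        if (queryTerms.isEmpty()) {
            return false;
        }
        for (int start = 0; start + queryTerms.size() <= docTerms.size(); start++) {
            if (docTerms.subList(start, start + queryTerms.size()).equals(queryTerms)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = tokenize("C:\\dir1\\dir2\\file-1-1.dat");
        for (int i = 0; i < doc.size(); i++) {
            System.out.println("[" + i + "] " + doc.get(i));
        }
        // The query text goes through the same tokenizer before matching.
        System.out.println(phraseMatches(doc, tokenize("file-1-1.dat")));
    }
}
```

The point of the sketch is only that the query string is split the same way as the indexed value, and the match then depends on the terms sitting at adjacent positions.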

I think the above explains things.

So, the bottom line was that with Luke, it was using KeywordAnalyzer.

When I switched Luke to using StandardAnalyzer, the Luke query results matched my web query
results.
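
As far as I can tell, the mismatch comes down to the query-time analyzer producing a different set of terms than the index holds. A minimal sketch of that idea, again in plain Java with made-up names: tokenizing() is a hypothetical simplified splitter (not the real StandardAnalyzer), keyword() keeps the whole input as one term the way a keyword-style analyzer would, and the lookup ignores term positions for brevity.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class AnalyzerMismatchSketch {
    // Hypothetical simplified tokenizer: lowercase and split on anything
    // that is not a letter, digit, or hyphen.
    static List<String> tokenizing(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^a-z0-9-]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece);
            }
        }
        return terms;
    }

    // Keyword-style analysis: the entire input is kept as a single term.
    static List<String> keyword(String text) {
        return List.of(text);
    }

    // Simplified lookup: every query term must exist in the index.
    // (Positions are ignored here to keep the sketch short.)
    static boolean allTermsIndexed(Set<String> indexedTerms, List<String> queryTerms) {
        return indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        // The field value was split into terms at index time.
        Set<String> index = new HashSet<>(tokenizing("xxxx.yyy"));

        System.out.println(allTermsIndexed(index, keyword("xxxx")));        // single term "xxxx"
        System.out.println(allTermsIndexed(index, keyword("xxxx.yyy")));    // single term "xxxx.yyy"
        System.out.println(allTermsIndexed(index, tokenizing("xxxx.yyy"))); // terms "xxxx", "yyy"
    }
}
```

With the keyword-style query, "xxxx" happens to equal an indexed term, but "xxxx.yyy" as one term does not exist in the index, which lines up with the succeed/fail pattern I was seeing in Luke.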

THANKS!!  I feel better now :)...

Later,
Jim

---- Andrzej Bialecki <ab@getopt.org> wrote: 
> ohaya@cox.net wrote:
> > Hi Phil,
> > 
> > Well, kind of... but...
> > 
> > Then, why, when I do the search in Luke, do I get the results I cited:
> > 
> > xxxx  ==> succeeds
> > 
> > xxxx.yyy  ==> fails (no results)
> > 
> > I guess that I've been assuming that the search in Luke is "correct" and I've been using that to "test my understanding", but maybe that's an invalid assumption?
> 
> Luke has some bugs, that's for sure, but not as many as one would think 
> ;) I recommend the following exercise:
> 
> * first, check what the "Rewritten" query looks like, in both cases. 
> This could be enlightening, because depending on the choice of default 
> field and query analyzer results could differ dramatically.
> 
> * then, if a query succeeds in matching one or more documents, open this 
> document and view its fields using "Reconstruct & edit", especially the 
> "Tokenized" version of the field. At this point any potential mismatch 
> in query terms vs. analyzed tokens in the field should become apparent.
> 
> -- 
> Best regards,
> Andrzej Bialecki     <><
>   ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 

