lucene-java-user mailing list archives

From Casey Dement <cdem...@weather.com>
Subject Re: Query works in Luke but not in code...
Date Mon, 02 Jun 2008 13:22:08 GMT
Okay, I figured out my issue (well, actually a coworker spotted it - I was
just too close).  A word of warning:

Token "termBuffer" character arrays are not sized to the number of
characters in the term - the array can be longer than the term it holds!

Yep, I was dropping the term buffer into a String without start and length,
thereby adding unseen characters to the end of the String that just monkey
up everything :)  This is of course mentioned in the documentation - I just
overlooked that little detail...
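
For anyone hitting the same thing, here is a minimal sketch of the pitfall in
plain Java. It does not use Lucene itself: the oversizedBuffer helper and the
class name are hypothetical stand-ins for a Token whose termBuffer() array is
longer than its termLength(). The padding here is NUL characters, which is an
assumption; a reused buffer could just as well hold leftovers from an earlier
token.

```java
class TermBufferPitfall {

    // Hypothetical stand-in for a token's backing array: the term is copied
    // into a buffer that is larger than the term itself. Unused slots keep
    // Java's default char value, '\u0000'.
    static char[] oversizedBuffer(String term, int capacity) {
        char[] buf = new char[capacity];
        term.getChars(0, term.length(), buf, 0);
        return buf;
    }

    // The mistake: converting the whole array, valid or not.
    static String wholeBuffer(char[] buf) {
        return new String(buf);
    }

    // The fix: convert only the valid prefix (start 0, explicit length).
    static String termOnly(char[] buf, int length) {
        return new String(buf, 0, length);
    }

    public static void main(String[] args) {
        char[] buf = oversizedBuffer("austell", 16);

        String wrong = wholeBuffer(buf);
        String right = termOnly(buf, 7);

        // The two strings may look the same when printed, but only one
        // of them equals the term.
        System.out.println(wrong.length());           // 16
        System.out.println(right.length());           // 7
        System.out.println(wrong.equals("austell"));  // false
        System.out.println(right.equals("austell"));  // true
    }
}
```

With a real Token, the equivalent of termOnly would presumably be
new String(token.termBuffer(), 0, token.termLength()).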


On 5/28/08 5:10 PM, "Casey Dement" <cdement@weather.com> wrote:

> LOL - I sure wish it was! :)
> 
> Sadly, that was a typo (Luke, for all its beauties, does not seem to grasp
> the concept of a clipboard so the sample was a manual transcription).
> 
> A few more details - don't know if this will help or not.
> 
> Same query as before: when I rewrite the query in Luke, I get back a set
> of 44 matching tokens for the given "Austell", with boosts ranging from
> 0.1429 for "turrel" up to 1.0 for "austell".  That the token "austell" gets
> a 1.0 is rather obvious since it's a perfect match...
> 
> BUT - when I rewrite the query in my local code, I get 2 matches: austell
> (at 0.14285707) and austerlitz (at 0.20000005).  This is disturbing on two
> fronts - first of all, where are the other 42?  And secondly - why is the
> exact match evaluating to such a low boost?
> 
> Does that point at all to where I'm going astray?
> 
> Thanks!
> 
>  Casey 
> 
> 
> On 5/23/08 5:00 AM, "Ian Lea" <ian.lea@gmail.com> wrote:
> 
>>> ...
>>> And expect to match document 156297 (search_text=="Austell GA", type==1).
>>> ...
>>>  System.out.println(searcher.explain(query, 156296));
>> 
>> 156297 != 156296
>> 
>> Could that be it?
>> 
>> 
>> --
>> Ian.
>> 
>> 
>> On Thu, May 22, 2008 at 11:21 PM, Casey Dement <cdement@weather.com> wrote:
>>> Hi - trying to execute a search in Lucene and getting results I don't
>>> understand :(
>>> 
>>> The index contains the fields search_text and type - both indexed and
>>> tokenized.  I'm attempting to execute the query:
>>> 
>>>  +(search_text:austell~0.9 search_text:ga~0.9) +(type:1 type:4)
>>> 
>>> And expect to match document 156297 (search_text=="Austell GA", type==1).
>>> 
>>> I am executing this query both directly in code and via the tool Luke - but
>>> getting WILDLY different answers.  In Luke, the expected document is found
>>> no problem, but in my own code I find no results.  Obviously I suspect my
>>> code of being crap ;)
>>> 
>>> Oh, FYI, in both my local code and Luke I am using a StandardAnalyzer and
>>> the default column is "search_text".
>>> 
>>> Here's what I'm doing:
>>> 
>>>  /******************************************************************/
>>>  File location = new File("/the/correct/path");
>>>  IndexReader index = IndexReader.open(location);
>>>  Searcher searcher = new IndexSearcher(index);
>>>  QueryParser parser = new QueryParser("search_text",
>>>      new StandardAnalyzer());
>>>  Query query = parser.parse(
>>>      "+(search_text:austell~0.9 search_text:ga~0.9) +(type:1 type:4)");
>>>  System.out.println(searcher.explain(query, 156296));
>>>  /******************************************************************/
>>> 
>>> When I run this, I get:
>>> |  0.0000 = (NON-MATCH) Failure to meet condition(s) of required/prohibited clause(s)
>>> |    0.0000 = no match on required clause (() ())
>>> |      0.0000 = (NON-MATCH) product of:
>>> |        0.0000 = (NON-MATCH) sum of:
>>> |        0.0000 = coord(0/2)
>>> |    0.2133 = (MATCH) product of:
>>> |      0.4267 = (MATCH) sum of:
>>> |        0.4267 = (MATCH) weight(type:1 in 156296), product of:
>>> |          0.3672 = queryWeight(type:1), product of:
>>> |            1.1618 = idf(docFreq=315734, numDocs=371197)
>>> |            0.3161 = queryNorm
>>> |          1.1618 = (MATCH) fieldWeight(type:1 in 156296), product of:
>>> |            1.0000 = tf(termFreq(type:1)=1)
>>> |            1.1618 = idf(docFreq=315734, numDocs=371197)
>>> |            1.0000 = fieldNorm(field=type, doc=156296)
>>> |      0.5000 = coord(1/2)
>>> 
>>> So obviously I'm loading the index (since it did match the "type") - but it
>>> seems to be COMPLETELY ignoring the criteria on "search_text".
>>> 
>>> When I run this exact same string in Luke, I get:
>>> |  8.0079 = (MATCH) sum of:
>>> |    7.9578 = (MATCH) sum of:
>>> |      5.4904 = (MATCH) weight(search_text:austell in 156297), product of:
>>> |        0.8074 = queryWeight(search_text:austell), product of:
>>> |          10.8800 = idf(docFreq=18, numDocs=371197)
>>> |          0.0742 = queryNorm
>>> |        6.8000 = (MATCH) fieldWeight(search_text:austell in 156297), product of:
>>> |          1.0000 = tf(termFreq(search_text:austell)=1)
>>> |          10.8800 = idf(docFreq=18, numDocs=371197)
>>> |          0.6250 = fieldNorm(field=search_text, doc=156297)
>>> |      2.4673 = (MATCH) weight(search_text:ga in 156297), product of:
>>> |        0.5413 = queryWeight(search_text:ga), product of:
>>> |          7.2936 = idf(docFreq=685, numDocs=371197)
>>> |          0.0742 = queryNorm
>>> |        4.5585 = (MATCH) fieldWeight(search_text:ga in 156297), product of:
>>> |          1.0000 = tf(termFreq(search_text:ga)=1)
>>> |          7.2936 = idf(docFreq=685, numDocs=371197)
>>> |          0.6250 = fieldNorm(field=search_text, doc=156297)
>>> |    0.0501 = (MATCH) product of:
>>> |      0.1002 = (MATCH) sum of:
>>> |        0.1002 = (MATCH) weight(type:1 in 156296), product of:
>>> |          0.0862 = queryWeight(type:1), product of:
>>> |            1.1618 = idf(docFreq=315734, numDocs=371197)
>>> |            0.0742 = queryNorm
>>> |          1.1618 = (MATCH) fieldWeight(type:1 in 156296), product of:
>>> |            1.0000 = tf(termFreq(type:1)=1)
>>> |            1.1618 = idf(docFreq=315734, numDocs=371197)
>>> |            1.0000 = fieldNorm(field=type, doc=156296)
>>> |      0.5000 = coord(1/2)
>>> 
>>> Which, while clearly looking at the same document ID in the same index,
>>> is by contrast working perfectly!
>>> 
>>> Does anybody have any idea where I am screwing up?  Thanks!
>>> 
>>> Casey
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> 