lucene-solr-user mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: Solr Search problem w/ phrase searches, text type, w/ escaped characters
Date Mon, 03 Aug 2009 21:02:35 GMT
Peter Keane wrote:
> I've used Luke to figure out what is going on, and I see in the fields that
> fail to match, a "null_1".  Could someone tell me what that is?  I see some
> null_100s there as well, which seem to separate field values.  Clearly the
> null_1s are causing the search to fail.

You used the "Reconstruct" function to obtain the field values for 
unstored fields, right? null_NNN is Luke's way of telling you that the 
tokens that should be on these positions are absent, because they were 
removed by the analyzer during indexing, and there is no stored value of 
this field from which you could recover the original text. In other 
words, they are holes in the token stream, of length NNN.
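To make the null_NNN notation concrete, here is a minimal Java sketch of how a reconstruction could render holes. It assumes a hypothetical in-memory model in which each indexed token is a (term, position) pair sorted by position; the class and method names are made up for illustration, and this is not Luke's actual code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of Luke's "Reconstruct" output for an
// unstored field; not Luke's or Lucene's actual code.
class HoleReconstruction {
    // One indexed token: the term text and its position in the field.
    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    // Walk tokens in position order and emit "null_N" wherever N consecutive
    // positions hold no token (e.g. stopwords dropped by the analyzer).
    static String reconstruct(List<Token> tokens) {
        StringBuilder out = new StringBuilder();
        int expected = 0; // position the next token would have if nothing was removed
        for (Token t : tokens) {
            int gap = t.position - expected;
            if (gap > 0) {
                append(out, "null_" + gap);
            }
            append(out, t.term);
            expected = Math.max(expected, t.position + 1);
        }
        return out.toString();
    }

    private static void append(StringBuilder out, String s) {
        if (out.length() > 0) {
            out.append(' ');
        }
        out.append(s);
    }

    public static void main(String[] args) {
        // "Apache Solr and Lucene" indexed with "and" removed as a stopword:
        // apache=0, solr=1, lucene=3, so position 2 is a hole of length 1.
        List<Token> tokens = Arrays.asList(
            new Token("apache", 0), new Token("solr", 1), new Token("lucene", 3));
        System.out.println(reconstruct(tokens)); // apache solr null_1 lucene
    }
}
```

A single removed stopword leaves exactly the kind of null_1 hole described in the question.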

Such holes may also be produced by artificially increasing the token 
positions, hence the null_100, which serves to separate multiple field 
values so that e.g. phrase queries don't match unrelated text.
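A rough sketch of where a null_100 comes from, assuming a simplified indexer (hypothetical names, not Lucene's API) that leaves a fixed number of empty positions between the values of a multi-valued field. In Solr this number is set per field type with the positionIncrementGap attribute in schema.xml; a value of 100 is common, which matches the null_100 holes seen in Luke. Lucene's exact position arithmetic may differ slightly from this model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of indexing a multi-valued field;
// not Lucene's actual indexing code.
class MultiValuedGap {
    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + "@" + position;
        }
    }

    // Tokenize each value on whitespace, drop stopwords (each leaves a hole of
    // one position), and leave `gap` empty positions between consecutive values.
    static List<Token> index(List<String> values, Set<String> stopwords, int gap) {
        List<Token> out = new ArrayList<>();
        int pos = 0;
        for (int v = 0; v < values.size(); v++) {
            if (v > 0) {
                pos += gap; // artificial hole separating this value from the previous one
            }
            for (String word : values.get(v).toLowerCase().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // skip empty strings produced by leading whitespace
                }
                if (!stopwords.contains(word)) {
                    out.add(new Token(word, pos));
                }
                pos++; // a removed stopword still consumes its position
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the", "and", "of"));
        List<Token> tokens = index(Arrays.asList("Apache Solr", "Lucene in Action"), stop, 100);
        System.out.println(tokens); // [apache@0, solr@1, lucene@102, in@103, action@104]
    }
}
```

Here "solr" ends the first value at position 1 and "lucene" starts the second at 102, so positions 2 through 101 form a hole of length 100 that a phrase like "solr lucene" cannot bridge without a very large slop.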

Phrase queries that you can construct using QueryParser can't match two 
tokens separated by a hole, unless you set a slop value > 0.
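The slop rule can be illustrated with a simplified model: each matched position is shifted back by the term's offset within the phrase, and the phrase matches when the spread of those shifted positions is at most the slop. This is a hypothetical sketch in the spirit of Lucene's sloppy phrase matching, not its implementation (it ignores scoring and repeated-term subtleties), and all names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of phrase matching over token positions;
// not Lucene's PhraseQuery implementation.
class PhraseSlop {
    // Build term -> positions for one field of one document.
    static Map<String, List<Integer>> index(String[] terms, int[] positions) {
        Map<String, List<Integer>> idx = new HashMap<>();
        for (int i = 0; i < terms.length; i++) {
            idx.computeIfAbsent(terms[i], k -> new ArrayList<>()).add(positions[i]);
        }
        return idx;
    }

    // Match if some choice of one position per phrase term keeps the spread of
    // (position - offset in phrase) within `slop`. With slop 0 the terms must
    // occupy consecutive positions, so any hole between them prevents a match.
    static boolean matches(Map<String, List<Integer>> idx, String[] phrase, int slop) {
        return search(idx, phrase, 0, Integer.MAX_VALUE, Integer.MIN_VALUE, slop);
    }

    private static boolean search(Map<String, List<Integer>> idx, String[] phrase,
                                  int i, int min, int max, int slop) {
        if (i == phrase.length) {
            return max - min <= slop;
        }
        List<Integer> positions = idx.get(phrase[i]);
        if (positions == null) {
            return false; // term absent from the field
        }
        for (int p : positions) {
            int rel = p - i; // where the phrase would start if term i is at p
            if (search(idx, phrase, i + 1, Math.min(min, rel), Math.max(max, rel), slop)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "Apache Solr and Lucene" with "and" removed: apache=0, solr=1, lucene=3.
        Map<String, List<Integer>> idx = index(
            new String[] {"apache", "solr", "lucene"}, new int[] {0, 1, 3});
        String[] phrase = {"solr", "lucene"};
        System.out.println(matches(idx, phrase, 0)); // false: a hole separates the terms
        System.out.println(matches(idx, phrase, 1)); // true: slop 1 covers the hole
    }
}
```

With slop 0 the shifted positions (1 and 2) differ, so the phrase fails; a slop of 1 absorbs the one-position hole left by the removed stopword.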

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

