lucene-solr-user mailing list archives

From Peter Keane <pke...@mail.utexas.edu>
Subject Re: Solr Search probem w/ phrase searches, text type, w/ escaped characters
Date Mon, 03 Aug 2009 22:23:05 GMT
Thanks!

Any idea why

Miguel : three dimensions : [Exhibitio

parses to: miguel, three, dimensions, exhibitio

BUT

Miguel : three dimensions : [Exhibition]

parses to miguel, three, dimensions, null_1, exhibition

seems quite strange...

--peter


On Mon, Aug 3, 2009 at 4:02 PM, Andrzej Bialecki <ab@getopt.org> wrote:

> Peter Keane wrote:
>
>> I've used Luke to figure out what is going on, and I see in the fields
>> that fail to match, a "null_1".  Could someone tell me what that is?  I
>> see some null_100s there as well, which seem to separate field values.
>> Clearly the null_1s are causing the search to fail.
>>
>
> You used the "Reconstruct" function to obtain the field values for unstored
> fields, right? null_NNN is Luke's way of telling you that the tokens that
> should be at these positions are absent, because they were removed by the
> analyzer during indexing, and there is no stored value of this field from
> which you could recover the original text. In other words, they are holes in
> the token stream, of length NNN.
>
> Such holes may also be produced by artificially increasing the token
> positions, hence the null_100 that serves to separate multiple field values
> so that e.g. phrase queries don't match unrelated text.
>
> Phrase queries that you can construct using QueryParser can't match two
> tokens separated by a hole, unless you set a slop value > 0.
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
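
To make the explanation above concrete, here is a minimal Python sketch of
holes in a token stream. It is not Lucene or Luke code: reconstruct() and
phrase_matches() are hypothetical helpers, the token positions are assumed
for illustration, and the slop rule is a simplification (in-order terms only,
total extra positions across the phrase must not exceed the slop).

```python
def reconstruct(tokens):
    """Render (position, term) pairs, showing each gap as null_N (N = hole length)."""
    out = []
    expected = 0  # next position we would see if there were no hole
    for pos, term in sorted(tokens):
        gap = pos - expected
        if gap > 0:
            out.append(f"null_{gap}")
        out.append(term)
        expected = pos + 1
    return out


def phrase_matches(tokens, phrase, slop=0):
    """Simplified phrase match: terms must appear in order, and the total
    number of skipped positions between them must not exceed `slop`."""
    index = {}
    for pos, term in tokens:
        index.setdefault(term, []).append(pos)

    def search(i, prev_pos, used):
        if i == len(phrase):
            return True
        for pos in index.get(phrase[i], []):
            if i == 0:
                if search(1, pos, 0):
                    return True
            else:
                extra = pos - prev_pos - 1  # positions skipped since previous term
                if extra >= 0 and used + extra <= slop and search(i + 1, pos, used + extra):
                    return True
        return False

    return search(0, None, 0)


# Assumed positions: one token was dropped at position 3 (hole of length 1),
# and a second field value starts after an artificial gap of 100 positions.
tokens = [
    (0, "miguel"),
    (1, "three"),
    (2, "dimensions"),
    (4, "exhibition"),
    (105, "catalog"),  # hypothetical second value
]

print(reconstruct(tokens))
print(phrase_matches(tokens, ["dimensions", "exhibition"]))          # exact phrase
print(phrase_matches(tokens, ["dimensions", "exhibition"], slop=1))  # sloppy phrase
print(phrase_matches(tokens, ["exhibition", "catalog"], slop=1))     # across the large gap
```

In this model the exact phrase fails across the one-position hole, a slop of 1
bridges it, and the 100-position gap keeps the two field values from matching
each other at small slop values.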
