lucene-java-user mailing list archives

From Trejkaz <trej...@trypticon.org>
Subject Re: Strange change to query parser behaviour in recent versions
Date Mon, 22 Aug 2011 01:26:48 GMT
On Sat, Aug 20, 2011 at 7:00 PM, Robert Muir <rcmuir@gmail.com> wrote:
> On Sat, Aug 20, 2011 at 3:34 AM, Trejkaz <trejkaz@trypticon.org> wrote:
>
>>
>> As an aside, Google's behaviour seems to follow the "old" way.  For
>> instance, [[ 限定 ]] returns 640,000,000 hits and [[ 限 定 ]] returns
>> 772,000,000.  (Interestingly, [[ "限定" ]] returns 643,000,000 hits.
>> Slightly more than you might expect.)
>>
>
> No, it doesn't. Query on 北京医科大学
>
> You are confusing tokenization with query-generation itself: if you
> want 限定 to be treated as a compound then use a tokenizer that does
> this.

Nope.  I'm not confusing the two; I just haven't seen Google's source
code, so I can't say at which level it was doing it.  For my example,
it seemed pretty opaque.

That's a good example, though.

TX

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

