lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: sort field should not be tokenized?
Date Wed, 09 Jun 2010 22:56:57 GMT
Consider analyzing the input "the fox is in
his den" on whitespace, without removing
stopwords. You'd get the terms:
the
fox
is
in
his
den

What does it mean to sort on this field? Which term
should be used?
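A plain-Java sketch of the question above. It uses no Lucene classes. Splitting on whitespace stands in for the analyzer. The two ways of reducing a document's terms to one sort key ("first term" and "greatest term") are made-up policies for illustration, not a claim about what Lucene does internally. The point is that two reasonable choices give two different orders for the same documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

// Illustration only (no Lucene): a whitespace "analyzer" turns one field
// value into several terms, so a sort has to pick one of them somehow.
public class WhichTermDemo {

    // Split on whitespace only: no lowercasing, no stopword removal.
    static List<String> whitespaceTerms(String value) {
        return Arrays.asList(value.trim().split("\\s+"));
    }

    // Hypothetical policy 1: use the first term in the field.
    static String firstTerm(String value) {
        return whitespaceTerms(value).get(0);
    }

    // Hypothetical policy 2: use the lexicographically greatest term.
    static String greatestTerm(String value) {
        return Collections.max(whitespaceTerms(value));
    }

    // Sort documents (represented by their field values) by the chosen key.
    static List<String> sortBy(List<String> docs, Function<String, String> key) {
        List<String> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparing(key));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(whitespaceTerms("the fox is in his den"));
        // [the, fox, is, in, his, den]

        List<String> docs = Arrays.asList("the fox is in his den", "a wolf at the door");
        System.out.println(sortBy(docs, WhichTermDemo::firstTerm));
        // [a wolf at the door, the fox is in his den]   ("a" < "the")
        System.out.println(sortBy(docs, WhichTermDemo::greatestTerm));
        // [the fox is in his den, a wolf at the door]   ("the" < "wolf")
    }
}
```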

What if you remove stopwords? What about casing?
Or any of the myriad other things you might do
with an analyzer?
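To see how analysis choices move the sort key around, here is another plain-Java sketch (again no Lucene). It assumes a hypothetical "smallest term wins" policy and a tiny made-up stopword list; neither comes from Lucene. Lowercasing matters because `String.compareTo` orders uppercase letters before lowercase ones, and dropping stopwords removes terms that would otherwise have been the key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustration only: the "smallest term" policy and the stopword set
// below are hypothetical, chosen to show that analysis changes the key.
public class AnalysisChangesKeyDemo {

    // Hypothetical stopword list, for illustration only.
    static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("a", "at", "the", "is", "in"));

    // Whitespace split, optionally lowercased and stopword-filtered.
    static List<String> analyze(String value, boolean lowercase, boolean dropStopwords) {
        List<String> terms = new ArrayList<>();
        for (String t : value.trim().split("\\s+")) {
            String term = lowercase ? t.toLowerCase(Locale.ROOT) : t;
            if (dropStopwords && STOPWORDS.contains(term)) {
                continue; // skip stopwords when filtering is enabled
            }
            terms.add(term);
        }
        return terms;
    }

    // Hypothetical policy: the lexicographically smallest term is the key.
    // Note: throws NoSuchElementException if every term was a stopword.
    static String smallestTerm(String value, boolean lowercase, boolean dropStopwords) {
        return Collections.min(analyze(value, lowercase, dropStopwords));
    }

    public static void main(String[] args) {
        // Casing: 'Z' (90) sorts before 'c' (99) until everything is lowercased.
        System.out.println(smallestTerm("Zebra crossing", false, false)); // Zebra
        System.out.println(smallestTerm("Zebra crossing", true, false));  // crossing

        // Stopwords: dropping "a", "at", "the" changes the candidate term.
        System.out.println(smallestTerm("a wolf at the door", false, false)); // a
        System.out.println(smallestTerm("a wolf at the door", false, true));  // door
    }
}
```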

So sorting on a tokenized field *can* work, but the
results will be "interesting", since a sort needs one
value per document and the field has several. If you
happen to have a field that only ever tokenizes to a
single term, you'll probably get the results you
expect, but it'll be pretty fragile.
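A small plain-Java sketch of that fragility (no Lucene). Sorting on the whole value stands in for an untokenized field; sorting on one chosen term stands in for a tokenized one, using a hypothetical "greatest term" policy. While every value is a single word the two orders agree, and a single multi-word value is enough to make them diverge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustration only: whole-value sort vs. sort on one term per value.
// The "greatest term" policy is a hypothetical stand-in.
public class SingleTermFragilityDemo {

    // Hypothetical policy: the lexicographically greatest whitespace term.
    static String greatestTerm(String value) {
        return Collections.max(Arrays.asList(value.trim().split("\\s+")));
    }

    // Sort on the entire value, as an untokenized field would.
    static List<String> sortWhole(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        return sorted;
    }

    // Sort on one term per value, as a tokenized field might.
    static List<String> sortByTerm(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparing(SingleTermFragilityDemo::greatestTerm));
        return sorted;
    }

    public static void main(String[] args) {
        // Every value is a single term: both sorts agree.
        List<String> oneWord = Arrays.asList("Oslo", "Bergen", "Newark");
        System.out.println(sortWhole(oneWord));  // [Bergen, Newark, Oslo]
        System.out.println(sortByTerm(oneWord)); // [Bergen, Newark, Oslo]

        // One two-term value ("New York" -> key "York") breaks the agreement.
        List<String> mixed = Arrays.asList("Oslo", "New York", "Newark");
        System.out.println(sortWhole(mixed));  // [New York, Newark, Oslo]
        System.out.println(sortByTerm(mixed)); // [Newark, Oslo, New York]
    }
}
```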

HTH
Erick

On Wed, Jun 9, 2010 at 11:35 AM, fujian <fujian.z.yang@nokia.com> wrote:

>
>
> Hello,
>
> I'm using Lucene 2.9, and when reading the javadoc for the Sort class I
> noticed it says "The field must be indexed, but should not be tokenized".
>
> But I tried sorting on a tokenized field and it works too. I'm just
> wondering: what's the difference between tokenized and untokenized fields
> in terms of sorting? Why do the javadoc and "Lucene in Action" both say
> that the sort field should not be tokenized?
>
> Thanks,
> -Fujian
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/sort-field-should-not-be-tokenized-tp882569p882569.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
