lucene-java-user mailing list archives

From Erick Erickson <>
Subject Re: sort field should not be tokenized?
Date Wed, 09 Jun 2010 22:56:57 GMT
Consider analyzing the input "the fox is in his den" on
whitespace, without removing stopwords. You'd have
the terms: the, fox, is, in, his, den

What does it mean to sort on this field? Which term
should be used?
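To make the ambiguity concrete, here is a minimal plain-Java sketch (not Lucene code) that splits the example sentence on whitespace, roughly as a whitespace tokenizer would. The class and method names are hypothetical; the only point is that one field value yields six indexed terms, and any of them could plausibly end up as the document's sort key.

```java
import java.util.Arrays;
import java.util.List;

public class WhitespaceTerms {
    // Splits on runs of whitespace, roughly what a whitespace tokenizer does.
    // No lowercasing and no stopword removal are applied.
    static List<String> whitespaceTerms(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        List<String> terms = whitespaceTerms("the fox is in his den");
        // One document, six terms: which one should the sort use?
        System.out.println(terms); // [the, fox, is, in, his, den]
    }
}
```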

What if you remove stopwords? What about casing?
Or any of the myriad other things you might do
with an analyzer?
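A rough illustration, assuming a toy analysis chain rather than any real Lucene Analyzer: the same input produces different term sets depending on whether you lowercase and whether you drop stopwords. The stopword set and the analyze method below are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalysisVariants {
    // Hypothetical stopword list for illustration only; real analyzers ship their own.
    static final Set<String> STOPWORDS =
        new HashSet<>(Arrays.asList("the", "is", "in", "his", "a", "an", "of"));

    // Toy analysis chain: whitespace split, optional lowercasing, optional stopword removal.
    static List<String> analyze(String text, boolean lowercase, boolean removeStopwords) {
        List<String> out = new ArrayList<>();
        for (String tok : text.trim().split("\\s+")) {
            String term = lowercase ? tok.toLowerCase(Locale.ROOT) : tok;
            if (removeStopwords && STOPWORDS.contains(term)) {
                continue; // drop the stopword entirely
            }
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "The Fox is in his Den";
        System.out.println(analyze(input, false, false)); // [The, Fox, is, in, his, Den]
        System.out.println(analyze(input, true, false));  // [the, fox, is, in, his, den]
        System.out.println(analyze(input, true, true));   // [fox, den]
    }
}
```

Each variant leaves a different set of candidate sort keys (only "fox" and "den" in the last case), so the order of documents could shift just by changing the analyzer.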

So sorting on a tokenized field *can* work, but the
results will be "interesting". If your field happens to
tokenize to only a single term, you'll probably get the
expected results, but it'll be pretty
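The sketch below is a toy model, not Lucene's sorting code. It compares sorting titles by the whole untokenized value against two hypothetical rules for reducing a tokenized value to one key (smallest term, largest term). With multi-term values the three rules give three different orders; with single-term values they agree, which is why a field that happens to tokenize to one term tends to sort as expected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class SortKeyChoice {
    // Lowercase, then split on whitespace into terms.
    static List<String> terms(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Hypothetical rules for collapsing a multi-term value into one sort key.
    static String smallestTerm(String text) { return Collections.min(terms(text)); }
    static String largestTerm(String text)  { return Collections.max(terms(text)); }

    // Untokenized: the whole (lowercased) value is the single key.
    static String wholeValue(String text)   { return text.toLowerCase(Locale.ROOT); }

    // Returns a sorted copy of docs, ordered by the given key rule.
    static List<String> sortBy(List<String> docs, Function<String, String> key) {
        List<String> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparing(key));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("zebra crossing", "apple pie", "moby dick");
        System.out.println(sortBy(titles, SortKeyChoice::wholeValue));   // [apple pie, moby dick, zebra crossing]
        System.out.println(sortBy(titles, SortKeyChoice::smallestTerm)); // [apple pie, zebra crossing, moby dick]
        System.out.println(sortBy(titles, SortKeyChoice::largestTerm));  // [moby dick, apple pie, zebra crossing]

        // Single-term values: every rule yields the same order.
        List<String> single = Arrays.asList("zebra", "apple", "moby");
        System.out.println(sortBy(single, SortKeyChoice::smallestTerm)
            .equals(sortBy(single, SortKeyChoice::wholeValue)));         // true
    }
}
```

In Lucene 2.9 itself, the usual way out is to index a second, untokenized copy of the value just for sorting (for example with Field.Index.NOT_ANALYZED) and sort on it with new SortField("title_sort", SortField.STRING); the field name title_sort is only an example.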


On Wed, Jun 9, 2010 at 11:35 AM, fujian <> wrote:

> Hello,
> I'm using Lucene 2.9, and when reading the javadoc for the Sort class I noticed
> it says "The field must be indexed, but should not be tokenized".
> But I tried sorting on a tokenized field and it works too. I'm just wondering
> what the difference between tokenized and untokenized fields is when it comes to sorting.
> Why do the javadoc and "Lucene in Action" both say that the sort field
> should not be tokenized?
> Thanks,
> -Fujian
