lucene-java-user mailing list archives

From "Andre Rubin" <andre.ru...@gmail.com>
Subject Re: Sorting
Date Tue, 05 Aug 2008 20:12:04 GMT
Sounds like a good alternative. But how do I perform the search on the
tokenized field and sort on the untokenized one?

Thanks,


Andre
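
One way to picture the answer to this question: the query names the
tokenized field, while the sort names its untokenized twin. The sketch
below is a plain-Java model of that bookkeeping, not Lucene code. The
class TwoFieldSketch and all of its helpers are hypothetical names made
up for illustration; only the "__AMSUNTOK__" prefix comes from the thread
(Bob's message quoted below). In Lucene itself the twin field would be
indexed untokenized and its name handed to the Sort used for the search,
which this model only imitates.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for an index; it only models the two-field idea.
public class TwoFieldSketch {
    // Prefix quoted in this thread for the untokenized twin of a field.
    static final String UNTOK_PREFIX = "__AMSUNTOK__";

    // Name of the untokenized twin for a given tokenized field name.
    static String sortFieldFor(String fieldName) {
        return UNTOK_PREFIX + fieldName;
    }

    // Model of one indexed document: the same value stored under both names.
    static Map<String, String> makeDoc(String fieldName, String value) {
        Map<String, String> doc = new HashMap<>();
        doc.put(fieldName, value);               // matched token by token
        doc.put(sortFieldFor(fieldName), value); // compared as one whole string
        return doc;
    }

    // True if any whitespace-separated token of the field starts with the prefix.
    static boolean matchesPrefix(Map<String, String> doc, String fieldName, String prefix) {
        for (String token : doc.get(fieldName).toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Search on the tokenized field, then order the hits by the untokenized twin.
    static List<String> searchSorted(List<Map<String, String>> index,
                                     String fieldName, String prefix) {
        String sortField = sortFieldFor(fieldName);
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> doc : index) {
            if (matchesPrefix(doc, fieldName, prefix)) {
                hits.add(doc);
            }
        }
        hits.sort(Comparator.comparing((Map<String, String> d) -> d.get(sortField)));
        List<String> values = new ArrayList<>();
        for (Map<String, String> doc : hits) {
            values.add(doc.get(fieldName));
        }
        return values;
    }

    public static void main(String[] args) {
        List<Map<String, String>> index = new ArrayList<>();
        for (String s : new String[] {"North Carolina", "British Columbia", "Canada"}) {
            index.add(makeDoc("place", s));
        }
        // Prints [British Columbia, Canada, North Carolina]
        System.out.println(searchSorted(index, "place", "c"));
    }
}
```

The point of the model is that the caller only ever handles the tokenized
field name; the twin's name is derived in one place, which keeps the
"juggling" of two fields contained.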

On Tue, Aug 5, 2008 at 12:51 PM,  <Robert.Hastings@ancept.com> wrote:
> This is what I did and it works fine.  My untokenized fields were named
> "__AMSUNTOK__" + fieldName,
> where fieldName was the name of the tokenized field.
>
>
> Bob Hastings
> Ancept Inc.
>
>
>
>
> From: Mark Miller <markrmiller@gmail.com>
> Date: 08/05/2008 02:38 PM
> Reply-To: java-user@lucene.apache.org
> To: java-user@lucene.apache.org
> Subject: Re: Sorting
>
> Hey Andre,
>
> The reason the javadoc says the field should not be tokenized stems from
> the issue you point out. What you want to do is possible of course, but
> making the Lucene code change would complicate a process that can be
> quite memory- and CPU-intensive on large collections. Done right, it
> might make a good patch though.
>
> A compromise that you can make outside of the Lucene code is to index a
> separate field with the same contents but untokenized. When you sort on
> this field instead, Lucene will treat "North Carolina" as one token and
> sort as you'd expect. The downside to this approach is that you will
> have to juggle the two fields in the future.
>
> - Mark
>
> Andre Rubin wrote:
>> Hi there!
>>
>> I'm new to Lucene, so forgive any misconceptions on my part.
>>
>> I created an Index and now I want to search on it based on a field.
>> The field is a String field, indexed with Field.Store.YES and
>> Field.Index.TOKENIZED. No problems with the search.
>>
>> Now, I wanted to sort the results, and according to the Sort javadoc
>> the field "should not be tokenized". But I decided to try it anyway,
>> and it worked. However, the results showed that the tokens were
>> sorted, not the full string in the field.
>>
>> Just to make myself more clear, here's an example. Let's say I have
>> these strings indexed:
>>
>> "North Carolina"
>> "British Columbia"
>> "Canada"
>>
>> Now I search (with sort) for the token 'c*'
>>
>> The result I get is (sorted by the token found):
>>
>> 1) Canada
>> 2) North Carolina
>> 3) British Columbia
>>
>> The result I wanted was (sorted by the whole String):
>>
>> 1) British Columbia
>> 2) Canada
>> 3) North Carolina
>>
>> Is there a way to do this?
>>
>>
>> Another option would be to sort the index itself, since this field is
>> the only field that we'd be searching on. But I'm just guessing here,
>> cause I have no idea if this is possible at all!
>>
>> Thanks,
>>
>>
>> Andre
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
>
>
>
>
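
The two orderings in Andre's original example can be reproduced with a
small plain-Java model. This is only an illustration of "sorted by the
token found" versus "sorted by the whole String" as he describes them; it
does not show how Lucene implements sorting internally. The class
SortOrderSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical model of the two orderings discussed in the thread.
public class SortOrderSketch {
    // First lowercased token that starts with the prefix, or null if none does.
    static String matchedToken(String value, String prefix) {
        for (String token : value.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.startsWith(prefix)) {
                return token;
            }
        }
        return null;
    }

    // Values with at least one token starting with the prefix, in input order.
    static List<String> hitsFor(List<String> values, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String v : values) {
            if (matchedToken(v, prefix) != null) {
                hits.add(v);
            }
        }
        return hits;
    }

    // Ordering Andre observed: hits ranked by the token that matched.
    static List<String> sortByMatchedToken(List<String> values, String prefix) {
        List<String> hits = hitsFor(values, prefix);
        hits.sort(Comparator.comparing((String v) -> matchedToken(v, prefix)));
        return hits;
    }

    // Ordering Andre wanted: hits ranked by the whole field value.
    static List<String> sortByWholeString(List<String> values, String prefix) {
        List<String> hits = hitsFor(values, prefix);
        hits.sort(Comparator.naturalOrder());
        return hits;
    }

    public static void main(String[] args) {
        List<String> places = Arrays.asList("North Carolina", "British Columbia", "Canada");
        // Prints [Canada, North Carolina, British Columbia]
        System.out.println(sortByMatchedToken(places, "c"));
        // Prints [British Columbia, Canada, North Carolina]
        System.out.println(sortByWholeString(places, "c"));
    }
}
```

In the first ordering the comparison keys are "canada", "carolina" and
"columbia"; in the second they are the full strings, which is what an
untokenized copy of the field provides.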
