lucene-solr-user mailing list archives

From Naomi Dushay <ndus...@stanford.edu>
Subject Re: range queries on string field with millions of values
Date Fri, 28 Nov 2008 19:44:32 GMT
The point isn't really how the exact sort works - it's the performance  
issues, coupled with an unpredictable distribution along the entire  
possible sort space.

the sort works
the range queries work
the performance sucks

and I haven't thought of a clever workaround.

- Naomi
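
For readers following along, the forward-browse request quoted below can be assembled programmatically. This is only a sketch under stated assumptions: the base URL is hypothetical, `browse_url` is an illustrative helper name, and an explicit `sort` parameter is added on the assumption that results should come back in shelf order (the quoted query does not show one). The field name `sortCallNum` is taken from the quoted message.

```python
from urllib.parse import urlencode


def browse_url(base_url, field, start_key, rows=10):
    """Build a Solr select URL for 'everything from start_key onward'.

    base_url is a hypothetical Solr core URL; field is the sortable
    call-number field (sortCallNum in the quoted message).
    """
    # Escape backslashes and double quotes so the quoted key stays intact.
    escaped = start_key.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        # Square brackets make the lower bound inclusive; * leaves the top open.
        "q": f'{field}:["{escaped}" TO *]',
        # Assumption: sort ascending on the same field to get shelf order.
        "sort": f"{field} asc",
        "rows": rows,
    }
    return f"{base_url}/select?{urlencode(params)}"


url = browse_url("http://localhost:8983/solr", "sortCallNum", "A123 B34 1970")
print(url)
```

This only shows how the request is put together; it does nothing about the cost of sorting on a field with millions of distinct values, which is the actual complaint in this thread.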

On Nov 27, 2008, at 9:41 AM, Alexander Ramos Jardim wrote:

> I did not even understand what you are considering to be the order on
> your call numbers.
>
> 2008/11/26 Naomi Dushay <ndushay@stanford.edu>
>
>> I have a performance problem and I haven't thought of a clever way
>> around it.
>>
>> I work at the Stanford University Libraries.  We have a collection of
>> over 8 million items.  Each item has a call number.  I have been asked
>> to provide a way to browse forward and backward from an arbitrary call
>> number.
>>
>> I have managed to create fields that present the call numbers in
>> appropriate sorts, both forward and reverse.  (This is necessary
>> because raw call numbers don't sort properly:  A123 AZ27 B99 B999
>> BBB111111).
>>
>> We can ignore the reverse sorted range query problem; it's the same
>> as the forward sorted range query.
>>
>> So I use a query like this:
>>
>> sortCallNum:["A123 B34 1970" TO *]&rows=10
>>
>>
>> Call numbers are squirrelly, so we can't predict the string that will
>> appropriately grab at least 10 subsequent documents.  They are
>> certainly not consecutive!
>>
>> so from
>> A123 B34 1970
>>
>> we're unable to predict if any of these will return at least 10
>> results:
>>
>> A123 B34 1980  or
>> A123 B34 V.8  or
>> A123 B44 or
>> A123 B67 or
>> A123 C27 or
>> A124* or
>> A22* or
>> AA* or
>>
>> You get the idea.
>>
>> I have not figured out a way to efficiently query for "the next 10
>> call numbers in sort order".  I have also mucked about with the cache
>> initialization, but that's not working either:
>>
>>   <listener event="firstSearcher" class="solr.QuerySenderListener">
>>     <arr name="queries">
>>       <!-- populate query result cache for sorted queries -->
>>       <lst>
>>               <str name="q">shelfkey:[0 TO *]</str>
>>               <str name="sort">shelfkey asc</str>
>>       </lst>
>>     </arr>
>>   </listener>
>>
>> Can anyone help me with this?
>>
>> - Naomi
>>
>>
>
>
> -- 
> Alexander Ramos Jardim

Naomi Dushay
ndushay@stanford.edu
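
Since the thread never shows what a sortable call-number key looks like, here is a minimal illustrative sketch. It is not the normalization used at Stanford: `shelf_key` and `next_n` are hypothetical names, the zero-padding rule is an assumption, and real call-number rules (decimal Cutter numbers, volume designations, and so on) are considerably more involved. The second function shows, purely in memory, what "the next 10 call numbers in sort order" means: seek to a position in an already-sorted key list and read forward, rather than guessing an upper bound for the range.

```python
import bisect
import re


def shelf_key(call_number, width=6):
    """Hypothetical normalizer: split into letter runs and digit runs,
    upper-case the letters, and zero-pad each digit run so that plain
    string comparison agrees with numeric order."""
    parts = re.findall(r"[A-Za-z]+|\d+", call_number)
    return " ".join(p.zfill(width) if p.isdigit() else p.upper() for p in parts)


def next_n(sorted_keys, start_key, n=10):
    """Return up to n keys at or after start_key from a sorted list."""
    i = bisect.bisect_left(sorted_keys, start_key)
    return sorted_keys[i:i + n]


# Illustrative call numbers only; raw string order would put B100 before B99.
raw = ["B999", "A123", "B100", "AZ27", "B99"]
keys = sorted(shelf_key(c) for c in raw)
print(keys)
print(next_n(keys, shelf_key("B99"), n=2))
```

The point of the sketch is the shape of the operation: once keys sort correctly as strings, "browse forward" is a seek plus a short read. Whether the search server can be made to do that cheaply is exactly the open question in this thread.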



