lucene-solr-user mailing list archives

From Naomi Dushay <ndus...@stanford.edu>
Subject range queries on string field with millions of values
Date Wed, 26 Nov 2008 22:43:52 GMT
I have a performance problem and I haven't thought of a clever way  
around it.

I work at the Stanford University Libraries.  We have a collection of  
over 8 million items.  Each item has a call number.  I have been asked  
to provide a way to browse forward and backward from an arbitrary call  
number.

I have managed to create fields that present the call numbers in  
appropriate sort order, both forward and reverse.  (This is necessary  
because raw call numbers don't sort properly:   A123 AZ27 B99 B999  
BBB111111).
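
To illustrate why a separate sort field is needed, here is a minimal Python sketch of one possible normalization: zero-padding every run of digits so that plain string order agrees with numeric order. The function name `sort_key` and the padding width are hypothetical, and real call numbers (decimals, cutters, volume designations) need far more handling than this; it is not the actual scheme used for our fields.

```python
import re

def sort_key(call_number: str, width: int = 6) -> str:
    """Zero-pad each run of digits so string comparison matches numeric order.

    Hypothetical helper for illustration only; `width` is an assumed
    upper bound on the number of digits in any numeric run.
    """
    return re.sub(r"\d+", lambda m: m.group(0).zfill(width), call_number.upper())

raw = ["B999", "B99", "B1000", "A123", "AZ27"]

# Plain string sort puts B1000 before B99.
print(sorted(raw))                # ['A123', 'AZ27', 'B1000', 'B99', 'B999']

# Sorting on the padded key puts B1000 after B999.
print(sorted(raw, key=sort_key))  # ['A123', 'AZ27', 'B99', 'B999', 'B1000']
```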

We can ignore the reverse sorted range query problem;  it's the same  
as the forward sorted range query.

So I use a query like this:

sortCallNum:["A123 B34 1970" TO *]&rows=10
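
For concreteness, a minimal Python sketch of how such a request could be assembled. The base URL and the helper name `next_page_url` are hypothetical, and it assumes the request also sorts ascending on the same field; `q`, `sort`, and `rows` are standard Solr request parameters.

```python
from urllib.parse import urlencode

def next_page_url(base_url: str, field: str, start_key: str, rows: int = 10) -> str:
    """Build a select URL for an open-ended range query, sorted on the same field.

    Hypothetical helper: base_url is whatever select handler your Solr exposes.
    """
    params = {
        "q": f'{field}:["{start_key}" TO *]',  # everything from start_key onward
        "sort": f"{field} asc",                # assumed: sort on the same field
        "rows": rows,                          # only the first `rows` hits
    }
    return f"{base_url}?{urlencode(params)}"

# Assumed local Solr URL, for illustration only.
url = next_page_url("http://localhost:8983/solr/select", "sortCallNum", "A123 B34 1970")
print(url)
```

Note that even though only 10 rows come back, the range still matches every document from the starting key onward, which is where the cost comes from.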


Call numbers are squirrelly, so we can't predict the string that will  
appropriately grab at least 10 subsequent documents.  They are  
certainly not consecutive!

So, starting from

A123 B34 1970

we're unable to predict whether any of these will return at least 10 results:

A123 B34 1980
A123 B34 V.8
A123 B44
A123 B67
A123 C27
A124*
A22*
AA*

You get the idea.

I have not figured out a way to efficiently query for "the next 10  
call numbers in sort order".  I have also mucked about with the cache  
initialization, but that's not working either:

     <listener event="firstSearcher" class="solr.QuerySenderListener">
       <arr name="queries">
         <!-- populate query result cache for sorted queries -->
         <lst>
           <str name="q">shelfkey:[0 TO *]</str>
           <str name="sort">shelfkey asc</str>
         </lst>
       </arr>
     </listener>

Can anyone help me with this?

- Naomi

