lucene-solr-user mailing list archives

From Naomi Dushay <ndus...@stanford.edu>
Subject Re: range queries on string field with millions of values
Date Sat, 29 Nov 2008 00:41:18 GMT
Gosh,  I'm sorry to be so unclear.  Hmm.  Trying to clarify below:

On Nov 28, 2008, at 3:52 PM, Chris Hostetter wrote:

> Having read through this thread, i'm not sure i understand what exactly
> the problem is.  my naive understanding is...
>
> 1) you want to sort by a field
> 2) you want to be able to "paginate" through all docs in order of this
> field.
> 3) you want to be able to start your pagination at any arbitrary value for
> this field.
>
> so (assuming the field is a simple number for now) you could use something
> like
>
>   q=yourField:[42 TO *]&sort=yourField+asc&rows=10&start=0
>
> where "42" is the arbitrary ID someone wants to start at.
>

perfect.  This is the query I'm using.
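
For reference, a minimal sketch of how the parameters for that request could be assembled in Python. The helper name build_browse_params is hypothetical, "shelfkey" is the field from my config below, and the sketch assumes the start value needs no Lucene query escaping (real call numbers with spaces would need quoting):

```python
from urllib.parse import urlencode


def build_browse_params(field, start_value, rows=10):
    """Build the query string for an open-ended range query sorted on `field`.

    Hypothetical helper: assumes `start_value` contains no characters that
    need Lucene query-syntax escaping.
    """
    params = [
        ("q", f"{field}:[{start_value} TO *]"),  # everything from start_value onward
        ("sort", f"{field} asc"),                # in ascending order of the same field
        ("rows", rows),                          # page size
        ("start", 0),                            # offset within the matching docs
    ]
    # urlencode percent-encodes the brackets, colon and asterisk
    # and turns spaces into '+'.
    return urlencode(params)


query_string = build_browse_params("shelfkey", "42")
print(query_string)
```

The resulting string would be appended to the select URL of whatever Solr instance is in use.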

The results are correct.  But the response time sucks.

Reading the docs about caches, I thought I could populate the query  
result cache with an autowarming query and the response time would be  
okay.  But that hasn't worked.  (See excerpts from my solrconfig.xml
below.)

A repeated query is very fast, implying caching happens for a  
particular starting point ("42" above).

Is there a way to populate the cache with the ENTIRE sorted list of  
values for the field, so any arbitrary starting point will get results  
from the cache, rather than grabbing all results from (x) to the end,  
then sorting all these results, then returning the first 10?
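
To make the question concrete, here is a minimal sketch in plain Python of the behavior being asked for. It is not Solr code and says nothing about how Solr's caches work internally; the function name next_page and the sample keys (other than "A123 B34 1970") are made up for illustration. Once the full list of values is held in sorted order, any starting point is a binary search followed by a slice, with no per-query sort:

```python
from bisect import bisect_left


def next_page(sorted_keys, start_value, rows=10):
    """Return up to `rows` keys >= start_value from an already-sorted list."""
    # Binary search for the first position whose key is >= start_value.
    first = bisect_left(sorted_keys, start_value)
    # Slice the next `rows` entries; near the end of the list this is shorter.
    return sorted_keys[first:first + rows]


# Illustrative keys only; sorted once, up front.
keys = sorted([
    "A123 B34 1970",
    "A123 B5 1982",
    "B12 C7 2001",
    "QA76 .L8 1999",
    "Z699 S3 1975",
])

# "A124" is not in the list, yet it is still a valid starting point.
page = next_page(keys, "A124", rows=3)
print(page)
```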


> This sentence below seems to imply that you have a solution which produces
> correct results, but doesn't produce results quickly...

right.

> : I have a performance problem and I haven't thought of a clever way around it.
>
> ...however this line seems to suggest that you're having trouble
> getting at least 10 results from any query (?)
>
> : Call numbers are squirrelly, so we can't predict the string that will
> : appropriately grab at least 10 subsequent documents.  They are certainly not
> : consecutive!
> :
> : so from
> : A123 B34 1970
> :
> : we're unable to predict if any of these will return at least 10 results:

I was trying to express that I couldn't do this:

myfield:[X TO Y]

because I can't algorithmically compute Y.

Glen Newton suggested a workaround, whereby I represent my
squirrelly, but sortable, field values as floating point numbers, and
then I can compute Y.

> ...but i'm not sure what exactly that means.  for any given field, there
> are always going to be some values X such that myField:[X TO *] won't
> return at least 10 docs ... those are the last values in the index in order
> -- surely it's okay for your app to have an "end" state when you run out
> of data? :)

yes.  Understood.  This is not an issue.

> Oh, and BTW...
>
> : numbers in sort order".  I have also mucked about with the cache
> : initialization, but that's not working either:
> :
> :     <listener event="firstSearcher" class="solr.QuerySenderListener">
>
> ...make sure you also do a newSearcher listener that does the same thing,
> otherwise your FieldCache (used for sorting) may not be warmed when
> commits happen

Yup yup yup.

from solrconfig:

     <filterCache
       class="solr.LRUCache"
       size="20000000"
       initialSize="10000000"
       autowarmCount="500000"/>

     <queryResultCache
       class="solr.LRUCache"
       size="10000000"
       initialSize="5000000"
       autowarmCount="5000000"/>


     <listener event="newSearcher" class="solr.QuerySenderListener">
       <arr name="queries">
         <!-- populate query result cache for sorted queries -->
         <lst>
           <str name="q">shelfkey:[0 TO *]</str>
           <str name="sort">shelfkey asc</str>
         </lst>
       </arr>
     </listener>

     <listener event="firstSearcher" class="solr.QuerySenderListener">
       <arr name="queries">
         <!-- populate query result cache for sorted queries -->
         <lst>
           <str name="q">shelfkey:[0 TO *]</str>
           <str name="sort">shelfkey asc</str>
         </lst>
       </arr>
     </listener>
