lucene-solr-user mailing list archives

From "Fuad Efendi" <f...@efendi.ca>
Subject RE: SOLR Performance Tuning: Pagination
Date Fri, 25 Dec 2009 02:06:40 GMT
Hi Walter, you are right, it was mostly robots (Googlebot, Yahoo/Slurp,
etc.).

I have friendly URLs like 
http://www.tokenizer.org/USA/?page=7 (30 million docs, 3 million pages)
http://www.tokenizer.org/www.newegg.com/
http://www.tokenizer.org/www.newegg.com/?sort=link&dir=asc&q=Opteron

And even this:
http://www.tokenizer.org/AMD/Opteron/8350/

I disabled processing for URLs with no query parameter (empty results), but
I should really limit pagination programmatically. Fortunately,
http://www.tokenizer.org/?q=USA returns 50k documents (search doesn't use
the "Country" field). But some queries may return a huge number of documents
(tuning the stop-word list would be the better fix).
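
A minimal sketch of what limiting pagination programmatically could look
like in the front end. The function name, the cap of 200 pages (echoing the
limit Walter mentions below) and the page size of 10 are assumptions for
illustration, not what the site actually runs; start and rows are the usual
Solr paging parameters.

```python
MAX_PAGE = 200       # assumed cap, similar to the "something like 200 pages" limit
ROWS_PER_PAGE = 10   # assumed page size


def paging_params(raw_page, max_page=MAX_PAGE, rows=ROWS_PER_PAGE):
    """Turn an untrusted ?page= value into bounded Solr start/rows values."""
    try:
        page = int(raw_page)
    except (TypeError, ValueError):
        page = 1  # missing or non-numeric page falls back to the first page
    # Clamp to [1, max_page] so bots cannot request arbitrarily deep offsets.
    page = max(1, min(page, max_page))
    return {"start": (page - 1) * rows, "rows": rows}
```

With these defaults, ?page=7 maps to start=60, and any page past 200 is
served as page 200 (start=1990), so Solr never has to rank more than 2000
documents for one request.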

-Fuad


> -----Original Message-----
> From: Walter Underwood [mailto:wunder@wunderwood.org]
> Sent: December-24-09 1:51 PM
> To: solr-user@lucene.apache.org
> Subject: Re: SOLR Performance Tuning: Pagination
> 
> Some bots will do that, too. Maybe badly written ones, but we saw that at
> Netflix. It was causing search timeouts just before a peak traffic period,
> so we set a page limit in the front end, something like 200 pages.
> 
> It makes sense for that to be very slow, because a request for hit
> 28838540 means that Solr has to calculate the relevance for 28838540 + 10
> documents.
> 
> Fuad: Why are you benchmarking this? What user is looking at 20M
> documents?
> 
> wunder
> 
> On Dec 24, 2009, at 10:44 AM, Erik Hatcher wrote:
> 
> >
> > On Dec 24, 2009, at 11:36 AM, Walter Underwood wrote:
> >> When do users do a query like that? --wunder
> >
> > Well, SolrEntityProcessor "users" do :)
> >
> >  http://issues.apache.org/jira/browse/SOLR-1499
> >  (which by the way I plan on polishing and committing over the holidays)
> >
> > 	Erik
> >
> >
> >
> >>
> >> On Dec 24, 2009, at 8:09 AM, Fuad Efendi wrote:
> >>
> >>> I used pagination for a while till found this...
> >>>
> >>>
> >>> I have filtered query ID:[* TO *] returning 20 millions results (no
> >>> faceting), and pagination always seemed to be fast. However, fast only
> >>> with low values for start=12345. Queries like start=28838540 take 40-60
> >>> seconds, and even cause OutOfMemoryException.
> >>>
> >>> I use highlight, faceting on nontokenized "Country" field, standard
> >>> handler.
> >>>
> >>>
> >>> It even seems to be a bug...
> >>>
> >>>
> >>> Fuad Efendi
> >>> +1 416-993-2060
> >>> http://www.linkedin.com/in/liferay
> >>>
> >>> Tokenizer Inc.
> >>> http://www.tokenizer.ca/
> >>> Data Mining, Vertical Search
> >>>
> >>
> >



