lucene-solr-user mailing list archives

From Tomas Zerolo <tomas.zer...@axelspringer.de>
Subject Re: Poor performance on distributed search
Date Tue, 20 Dec 2011 08:01:39 GMT
On Mon, Dec 19, 2011 at 01:32:22PM -0800, ku3ia wrote:
> >>Uhm, either I misunderstand your question or you're doing 
> >>a lot of extra work for nothing.... 
> 
> >>The whole point of sharding is exactly to collect the top N docs 
> >>from each shard and merge them into a single result [...]

> >>>>P.S. Is there any mechanism, for example, to get the top 100 rows from each shard,
> only merge them, sort by the field defined in the query or by score, and return the result to
> the user?
> >>Uhm, either I misunderstand your question
> For example, I have 4 shards and ultimately need 2000 docs. Now, when I'm using
> &shards=127.0.0.1:8080/solr/shard1,127.0.0.1:8080/solr/shard2,127.0.0.1:8080/solr/shard3,127.0.0.1:8080/solr/shard4
> Solr gets 2000 docs from each shard (shard1,2,3,4, so 8000 docs in
> total), merges and sorts them, for example by the default field (score), and returns
> only the 2000 rows I specified in the request (not all 8000).
> So my question was: is there any mechanism in Solr that does not fetch 2000
> rows from each shard? Say I specify 2000 docs in the request; Solr
> works out how many shards I have (four), divides the total rows among the
> shards (2000/4=500) and sends each shard a query with rows=500 instead of
> rows=2000, so that after merging and sorting I end up with 2000 rows
> (maybe fewer) rather than 8000. That was my question.

But then the results would be wrong? Suppose the documents are not evenly
distributed (wrt the sort criterion) across all the shards. In an extreme
case, just imagine all 2000 top-most documents are on shard 3. You would get
the 500 top-most (from shard 3) and 1500 others you don't want (from the
other shards). You wouldn't even know.
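
To make the skew problem concrete, here is a small Python sketch. It is a
toy model, not Solr code: each "shard" is just a list of scores, and the
function names top_n_naive and top_n_safe are made up for illustration.
The first asks each shard for rows/N documents, the second asks each
shard for the full rows and then merges.

```python
def top_n_naive(shards, n):
    """Ask each shard for only n // len(shards) docs, then merge."""
    per_shard = n // len(shards)
    candidates = []
    for docs in shards:
        candidates.extend(sorted(docs, reverse=True)[:per_shard])
    return sorted(candidates, reverse=True)[:n]


def top_n_safe(shards, n):
    """Ask each shard for its full top n, then merge (the worst-case plan)."""
    candidates = []
    for docs in shards:
        candidates.extend(sorted(docs, reverse=True)[:n])
    return sorted(candidates, reverse=True)[:n]


# Toy data (assumed): scores only, and the third shard holds all the best docs.
shards = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [100, 99, 98, 97],
    [9, 10, 11, 12],
]

naive = top_n_naive(shards, 4)  # one doc per shard
safe = top_n_safe(shards, 4)    # four docs per shard

print("naive:", naive)
print("safe: ", safe)
```

With this data the naive version returns only one of the four best
documents plus three it should not have returned, and nothing in its
output tells you so.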

What Solr is doing here is planning for the worst case.

Now if it could just do some piece-wise "merge sort" of sorts, that would be
better.
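
A rough Python sketch of what such a piece-wise merge could look like.
This is purely hypothetical and not something Solr is known to offer
here: shard_stream and top_n_piecewise are invented names, each shard is
a plain list of scores, and a "page" stands in for one round trip to a
shard. The coordinator pulls small pages lazily and only goes back to a
shard when the merge actually needs more from it.

```python
import heapq
from itertools import islice


def shard_stream(docs, page_size, fetch_log):
    """Yield one shard's scores best-first, one simulated page at a time."""
    ordered = sorted(docs, reverse=True)
    for start in range(0, len(ordered), page_size):
        fetch_log.append(start)  # record one simulated round trip
        yield from ordered[start:start + page_size]


def top_n_piecewise(shards, n, page_size):
    """Lazily k-way merge the shard streams and stop after n results."""
    fetch_log = []
    streams = [shard_stream(docs, page_size, fetch_log) for docs in shards]
    merged = heapq.merge(*streams, reverse=True)  # descending k-way merge
    return list(islice(merged, n)), len(fetch_log)


# Same kind of skewed toy data: the third shard holds all the best docs.
shards = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [100, 99, 98, 97],
    [9, 10, 11, 12],
]

result, pages = top_n_piecewise(shards, n=4, page_size=2)
print(result, "pages fetched:", pages)
```

The result is still the correct top 4, but fewer pages are fetched than
if every shard had shipped all its rows. The trade-off is extra round
trips to whichever shard turns out to hold the good documents, so whether
this would be a win in practice is an open question.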

-- 
Tomás Zerolo
Axel Springer AG
Axel Springer media Systems
BILD Produktionssysteme
Axel-Springer-Straße 65
10888 Berlin
Tel.: +49 (30) 2591-72875
tomas.zerolo@axelspringer.de
www.axelspringer.de
