lucene-solr-user mailing list archives

From markwaddle <m...@markwaddle.com>
Subject Re: Core/shard preference
Date Thu, 22 Oct 2009 05:17:01 GMT

Thank you both for your responses. That is what I suspected: it goes with
the first instance of the document that it sees. I tried setting up Solr in
Eclipse and ran into a couple of issues that blocked it from compiling. I
also did some reading, but none of the write-ups were very comprehensive.
Do you know of any good write-ups with instructions on setting up Solr in
Eclipse?

Thanks again,
Mark



Yonik Seeley-2 wrote:
> 
> Although shards should be disjoint, Solr "tolerates" duplication
> (won't return duplicates in the main results list, but doesn't make
> any effort to correct facet counts, etc).
> 
> Currently, whichever shard responds first wins.
> The relevant code is around line 420 in QueryComponent.java:
> 
>           String prevShard = uniqueDoc.put(id, srsp.getShard());
>           if (prevShard != null) {
>             // duplicate detected
>             numFound--;
> 
>             // For now, just always use the first encountered since we can't currently
>             // remove the previous one added to the priority queue. If we switched
>             // to the Java5 PriorityQueue, this would be easier.
>             continue;
>             // make which duplicate is used deterministic based on shard
>             // if (prevShard.compareTo(srsp.shard) >= 0) {
>             //  TODO: remove previous from priority queue
>             //  continue;
>             // }
>           }
> 
> So it's certainly possible to make it deterministic, we just haven't
> done it yet.
> 
> -Yonik
> http://www.lucidimagination.com
> 
> 
> On Mon, Oct 19, 2009 at 7:30 PM, Lance Norskog <goksron@gmail.com> wrote:
>> Distributed Search is designed only for disjoint cores.
>>
>> The document list from each core is returned sorted by the relevance
>> score. The distributed searcher merges these sorted lists. Solr does
>> not implement "distributed IDF", which essentially means distributed
>> coordinated scoring. All scoring happens inside each core, relative to
>> that core's contents. The resulting score numbers are not coordinated
>> with each other, and you will get random results.
>>
>> There is no way to say "use this core's results" because the searches
>> are not compared all at once. Only the page of results fetched is
>> compared, so there's no way to suppress a result in the second page if
>> it was already found in the first.
>>
>> On Mon, Oct 19, 2009 at 3:30 PM, markwaddle <mark@markwaddle.com> wrote:
>>>
>>> I have a small core performing deltas quickly (core00), and a large core
>>> performing deltas slowly (core01), both on the same set of documents. The
>>> delta core is cleaned nightly. As you can imagine, at times there are two
>>> versions of a document, one in each core. When I execute a query that
>>> matches this document, sometimes it will come from the delta core, and
>>> sometimes it will come from the large core. It almost seems random. Here
>>> is my query:
>>>
>>> http://porsche:8181/worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP
>>>
>>> When the delta documents from core00 are returned as desired, the access
>>> logs show:
>>>
>>> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select HTTP/1.1 200 293 1
>>> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select HTTP/1.1 200 506 1
>>> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select HTTP/1.1 200 1151 1
>>> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select HTTP/1.1 200 2597 1
>>> 10.36.34.151 - - [19/Oct/2009:15:22:37 -0700] GET /worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP HTTP/1.1 200 11881 9
>>>
>>> When the documents are returned from core01, the access logs show:
>>>
>>> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core00/select HTTP/1.1 200 289 1
>>> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select HTTP/1.1 200 506 1
>>> 10.36.34.150 - - [19/Oct/2009:15:22:37 -0700] POST /worldip5/core01/select HTTP/1.1 200 3390 1
>>> 10.36.34.151 - - [19/Oct/2009:15:22:37 -0700] GET /worldip5/core00/select?shards=porsche:8181/worldip5/core00/,porsche:8181/worldip5/core01/&start=0&rows=20&q=hazard+gas+countrycode:JP HTTP/1.1 200 11873 9
>>>
>>> Any ideas on why there is a difference in the requests made? Is there a
>>> way I can tell Solr to prefer the documents in core00?
>>>
>>> Mark
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Core-shard-preference-tp25966791p25966791.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> Lance Norskog
>> goksron@gmail.com
>>
> 
> 

-- 
View this message in context: http://www.nabble.com/Core-shard-preference-tp25966791p26004203.html
Sent from the Solr - User mailing list archive at Nabble.com.
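
The duplicate handling Yonik describes in the quoted message (first shard to
respond wins) could in principle be made deterministic by ranking shards. The
following is only a standalone sketch of that idea, not Solr code: the class
ShardPreferenceMerge, the ShardDoc type, and the dedupe method are all
hypothetical names, and it assumes the whole candidate list is available at
once rather than being pushed into a priority queue as QueryComponent does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Standalone illustration only; every name here is hypothetical, not Solr API. */
public class ShardPreferenceMerge {

    /** One document as returned by one shard. */
    public static final class ShardDoc {
        public final String id;
        public final String shard;
        public final float score;

        public ShardDoc(String id, String shard, float score) {
            this.id = id;
            this.shard = shard;
            this.score = score;
        }
    }

    /** Position of a shard in the preference list; unknown shards rank last. */
    private static int rank(String shard, List<String> shardOrder) {
        int i = shardOrder.indexOf(shard);
        return i < 0 ? Integer.MAX_VALUE : i;
    }

    /**
     * Removes duplicate ids from documents listed in arrival order. When two
     * shards return the same id, the copy from the shard that appears earlier
     * in shardOrder is kept, regardless of which shard responded first.
     * The survivors are returned sorted by score, highest first.
     */
    public static List<ShardDoc> dedupe(List<ShardDoc> arrived, List<String> shardOrder) {
        Map<String, ShardDoc> winners = new LinkedHashMap<>();
        for (ShardDoc doc : arrived) {
            ShardDoc prev = winners.get(doc.id);
            if (prev == null || rank(doc.shard, shardOrder) < rank(prev.shard, shardOrder)) {
                winners.put(doc.id, doc); // first sighting, or a more preferred shard
            }
        }
        List<ShardDoc> merged = new ArrayList<>(winners.values());
        merged.sort((a, b) -> Float.compare(b.score, a.score));
        return merged;
    }
}
```

For the setup in the original question, shardOrder would be ("core00",
"core01"), so the delta core's copy survives whichever core answers first. As
Yonik notes, this kind of change would not correct facet counts, and because
each core scores relative to its own contents, the surviving copy's score is
still not comparable across cores.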

