lucene-solr-user mailing list archives

From prasenjit mukherjee <prasen....@gmail.com>
Subject Re: SolrReplication configuration with frequent deletes and updates
Date Thu, 02 Feb 2012 02:41:31 GMT
I appreciate your reply. I have some follow-up questions inline.

On Thu, Feb 2, 2012 at 12:35 AM, Emmanuel Espina
<espinaemmanuel@gmail.com> wrote:
>> 1. Adds : 20 docs/sec
>> 2. Searches : 100 searches/sec
>> 3. Deletes : (20*3600*24*7 ~ 12 mill ) docs/week ( basically a cron
>> job which deletes all documents more than 7 days old )
>>
>> I am thinking of having 6 shards ( with each having 2 million docs )
>> with 1 master and 2 slaves with SolrReplication. Have following
>> questions :
>>
>> 1. With  50 searches/sec per shard with 2 million doc, what would be
>> the tentative response-time  ?  I am thinking of keeping it under <100
>> ms
>
> That are quite a lot of searches per second considering that you will
> have to search in 6 shards (the coordination and network latency
> affects the results). Also the components you use and the complexity
> of the query (as well as the number of segments in each shard) affects
> the time. 100 ms is probably a low threshold for your requirements,
> you will probably need to add more replicas.

Adding slaves ( using SolrReplication ) is fine as long as it scales
linearly. I understand that shards may not scale linearly, mostly
because of merging/network overhead, but I think they will help in
reducing response time ( pls correct me if I am wrong ). I am more
worried about response time ( even on a lightly loaded slave ); the
main intention of sharding was to reduce it. Would a
2 shards x 6 slaves configuration be better than 6 shards x 2 slaves?
Considering my total number of docs is 12 million, will Solr be OK
with 6 million docs/shard?
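
To make the comparison concrete, here is a back-of-the-envelope sketch
of the two layouts. It assumes that every distributed query touches
exactly one slave of every shard and that queries are spread evenly
over the slaves of a shard; the names ( Layout, docs_per_shard,
qps_per_slave ) are made up for illustration and are not part of Solr.
It says nothing about actual response time, only about how much data
and load each slave would see.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical description of a cluster layout (not a Solr type)."""
    shards: int
    slaves_per_shard: int


def docs_per_shard(total_docs: int, layout: Layout) -> float:
    # Documents are assumed to be spread evenly across shards.
    return total_docs / layout.shards


def qps_per_slave(total_qps: float, layout: Layout) -> float:
    # Assumption: each query hits one slave of every shard, and a shard's
    # slaves share its queries evenly, so the shard count cancels out.
    return total_qps / layout.slaves_per_shard


# Figures from this thread: 20 adds/sec kept for 7 days, 100 searches/sec.
TOTAL_DOCS = 20 * 3600 * 24 * 7
TOTAL_QPS = 100

if __name__ == "__main__":
    for layout in (Layout(shards=6, slaves_per_shard=2),
                   Layout(shards=2, slaves_per_shard=6)):
        print(layout,
              f"docs/shard={docs_per_shard(TOTAL_DOCS, layout):,.0f}",
              f"qps/slave={qps_per_slave(TOTAL_QPS, layout):.1f}")
```

Both layouts use 12 slaves in total. Under these assumptions the
6x2 layout gives each slave a smaller index but more queries, and the
2x6 layout the reverse; which one has the lower latency would have to
be measured.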

>
>
>> 2. What would be a reasonable latency ( pollInterval ) on slave for
>> SolrReplication ( all slaves connected with a single backplane ). Is 1
>> minute pollInterval reasonable ?
>
> Yes, but it is not reasonable that each time you poll you get updates.
> That is, you shouldn't perform commits more than once every 10
> minutes. Otherwise we would be talking of near real time indexing,
> something that is in development in trunk
> http://wiki.apache.org/solr/NearRealtimeSearch

Hmm. 10 minutes of latency is definitely too high for me ( especially
as this is a streaming use case, i.e. show latest stuff first ). In
that case I can probably get rid of master-slave and update all the
replicated shards directly. But then I will have to do a lot of
leg-work ( what if one of the slaves is down, etc. ), which I was
trying to avoid. Just curious: how stable is NRT?
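
For reference, this is how I am estimating the delay, as a rough
sketch only. It assumes a document becomes searchable on a slave once
the master has committed and the slave's next poll has pulled the new
index; transfer and warm-up time are lumped into one optional
parameter. The function name is mine, not a Solr API.

```python
def worst_case_staleness_s(commit_interval_s: float,
                           poll_interval_s: float,
                           transfer_s: float = 0.0) -> float:
    """Upper bound on seconds between adding a doc and seeing it on a slave.

    Worst case: the doc arrives just after a commit (waits a full commit
    interval), and the slave polled just before that next commit (waits a
    full poll interval), plus any transfer/warm-up time.
    """
    return commit_interval_s + poll_interval_s + transfer_s


if __name__ == "__main__":
    # Commits every 10 minutes on the master, slaves polling every minute.
    print(worst_case_staleness_s(600, 60))
    # A more aggressive setup: commit every minute, poll every minute.
    print(worst_case_staleness_s(60, 60))
```

So with the suggested commit spacing the pollInterval hardly matters;
the commit interval dominates the delay.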

>
>
>> 3. Is NRT a better/viable option compared to SolrReplication ?
>
> That is something in development. AFAIK it works with shards (because
> nrt refers to indexing and with shards there isn't anything particular
> with the indexing) but with replication something different will be
> needed: SolrCloud I think covers these nrt aspects due to its
> different architecture (not master-slave that in replicas but all
> peers replicating)

So it seems SolrReplication is out ( if my pollInterval < 5 minutes ),
right? Let me look into SolrCloud. Any suggestions on which is more
stable, SolrCloud or NRT?

-Thanks,
Prasenjit
