lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: SolrReplication configuration with frequent deletes and updates
Date Thu, 02 Feb 2012 00:38:08 GMT
In addition to what Emmanuel mentioned, why not consider 7 shards? If
you used one shard per day, your delete problem becomes really easy:
just nuke the oldest shard.
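A minimal sketch of the one-shard-per-day rotation, assuming hypothetical shard names of the form `shard-YYYYMMDD` and a 7-day retention window. It only works out which shard to index into and which whole shard to drop at rollover; it does not call any Solr API.

```python
from collections import deque
from datetime import date, timedelta

RETENTION_DAYS = 7  # one shard per day, kept for a week


def shard_name(day: date) -> str:
    # Hypothetical naming scheme: one shard per calendar day.
    return "shard-" + day.strftime("%Y%m%d")


class DailyShardRing:
    """Tracks the live daily shards; expiry drops the oldest shard whole."""

    def __init__(self, today: date, retention_days: int = RETENTION_DAYS):
        first = today - timedelta(days=retention_days - 1)
        self.live = deque(shard_name(first + timedelta(days=i))
                          for i in range(retention_days))

    def index_target(self) -> str:
        # New documents always go to the newest (today's) shard.
        return self.live[-1]

    def roll_over(self, new_day: date) -> str:
        # Open the new day's shard and return the expired one to delete.
        self.live.append(shard_name(new_day))
        return self.live.popleft()


ring = DailyShardRing(date(2012, 2, 2))
print(ring.index_target())                # shard-20120202
print(ring.roll_over(date(2012, 2, 3)))   # shard-20120127
```

The expensive 12-million-document delete-by-query becomes "remove one shard from the list and throw its index away".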

Beware, though, that this solution may skew your TF/IDF calculations
on the new shard (i.e. the one you use for *today's* data) until it
holds enough documents, since each shard computes its term statistics
locally.
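A small worked example of that skew, using the classic Lucene idf formula, idf = 1 + ln(numDocs / (docFreq + 1)), as in Lucene 3.x's DefaultSimilarity. The document counts are made up for illustration; the point is only that a term with the same relative frequency gets a different weight on a nearly empty shard than on a full one.

```python
import math


def classic_idf(num_docs: int, doc_freq: int) -> float:
    # Same formula as Lucene 3.x DefaultSimilarity.idf:
    # 1 + ln(numDocs / (docFreq + 1)), from a single shard's statistics.
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical counts: a full daily shard (20 docs/sec * 86,400 s) in which
# a term appears in 1% of the documents...
full_shard = classic_idf(1_728_000, 17_280)
# ...and today's shard shortly after rollover, where the same term also
# appears in 1% of its 100 documents.
fresh_shard = classic_idf(100, 1)

print(round(full_shard, 3))   # 5.605
print(round(fresh_shard, 3))  # 4.912
```

Until today's shard fills up, its scores are not directly comparable with those from the full shards when results are merged.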

Best
Erick

On Wed, Feb 1, 2012 at 2:05 PM, Emmanuel Espina
<espinaemmanuel@gmail.com> wrote:
> 2012/2/1 prasenjit mukherjee <prasen.bea@gmail.com>:
>> I have the following requirements:
>>
>> 1. Adds: 20 docs/sec
>> 2. Searches: 100 searches/sec
>> 3. Deletes: (20*3600*24*7 ~ 12 million) docs/week (basically a cron
>> job which deletes all documents more than 7 days old)
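The cron delete in item 3 can be expressed as a single delete-by-query using Solr date math. A minimal sketch, assuming a hypothetical indexed date field named `timestamp` and a hypothetical update URL; the `<delete><query>` XML message and the `NOW/DAY-7DAYS` date-math syntax are standard Solr, but the field name and URL must match your own schema and deployment.

```python
import urllib.request
from xml.sax.saxutils import escape

# Hypothetical values: adjust to your schema and deployment.
DATE_FIELD = "timestamp"
UPDATE_URL = "http://localhost:8983/solr/update?commit=true"


def expiry_query(field: str = DATE_FIELD, days: int = 7) -> str:
    # Solr date math: everything older than `days` days, rounded to the day.
    return f"{field}:[* TO NOW/DAY-{days}DAYS]"


def delete_body(query: str) -> bytes:
    # Solr XML update message for delete-by-query.
    return f"<delete><query>{escape(query)}</query></delete>".encode("utf-8")


def run_expiry(url: str = UPDATE_URL) -> int:
    # POST the delete to the master; returns the HTTP status code.
    req = urllib.request.Request(
        url,
        data=delete_body(expiry_query()),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

`run_expiry()` is what the cron job would call; building the body separately makes it easy to log or test without touching the server.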
>>
>> I am thinking of having 6 shards (each with 2 million docs), with
>> 1 master and 2 slaves, using SolrReplication. I have the following
>> questions:
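A quick back-of-the-envelope check of this sizing, assuming steady state (exactly one week of documents retained) and that queries are spread evenly over the 2 slaves of each shard.

```python
import math

ADDS_PER_SEC = 20
SEARCHES_PER_SEC = 100
RETENTION_SECONDS = 7 * 24 * 3600
SHARDS = 6
SLAVES_PER_SHARD = 2

# At steady state the index holds one week of documents.
docs_in_index = ADDS_PER_SEC * RETENTION_SECONDS        # 12,096,000
docs_per_shard = math.ceil(docs_in_index / SHARDS)       # 2,016,000

# Every distributed search fans out to all shards, so each shard sees the
# full query rate, split across its slaves.
queries_per_slave = SEARCHES_PER_SEC / SLAVES_PER_SHARD  # 50.0

print(docs_in_index, docs_per_shard, queries_per_slave)
```

This is where the "50 searches/sec per shard" in the first question comes from: the full 100 searches/sec reaches every shard, halved by the two slaves.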
>>
>> 1. With 50 searches/sec per shard and 2 million docs per shard, what
>> would be the approximate response time? I am aiming to keep it under
>> 100 ms.
>
> That is quite a lot of searches per second, considering that every
> query will have to search all 6 shards (the coordination and network
> latency affect the results). The components you use, the complexity
> of the query and the number of segments in each shard also affect
> the time. 100 ms is probably a low threshold for your requirements;
> you will probably need to add more replicas.
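One way to see why fanning out to 6 shards hurts: a distributed query waits for its slowest shard, so if each shard independently answers within the budget with probability p, the whole query does so with probability roughly p^6 (ignoring the coordinator's own merge time). A minimal sketch, assuming independent shard latencies; the 99% figure and the per-replica capacity of 30 searches/sec below are hypothetical, not measurements.

```python
import math


def p_all_within_budget(p_shard: float, shards: int) -> float:
    # The query waits for every shard, so all of them must be fast.
    return p_shard ** shards


def replicas_needed(qps_per_shard: float, qps_per_replica: float) -> int:
    # Each shard sees the full query rate; spread it over enough replicas.
    return math.ceil(qps_per_shard / qps_per_replica)


print(round(p_all_within_budget(0.99, 6), 4))  # 0.9415
print(replicas_needed(100, 30))                # 4
```

So even shards that meet the 100 ms target 99% of the time give a distributed query that misses it about 6% of the time, and lowering per-replica load by adding replicas is the usual lever.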
>
>
>> 2. What would be a reasonable latency (pollInterval) on the slaves for
>> SolrReplication (all slaves connected with a single backplane)? Is a
>> 1 minute pollInterval reasonable?
>
> Yes, but it is not reasonable to expect new updates every time you
> poll. That is, you shouldn't commit more often than once every 10
> minutes. Otherwise we would be talking about near real time indexing,
> something that is in development in trunk:
> http://wiki.apache.org/solr/NearRealtimeSearch
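To make the polling-versus-commit trade-off concrete: the ReplicationHandler's `pollInterval` is written as `HH:mm:ss`, and a slave only copies a new index when the master has committed since its last poll. A minimal sketch, assuming commits on a fixed schedule and ignoring index copy time; `fetching_polls` and `worst_case_staleness` are hypothetical helpers, not Solr APIs.

```python
def poll_seconds(poll_interval: str) -> int:
    # Convert an "HH:mm:ss" pollInterval string to seconds.
    hours, minutes, seconds = (int(part) for part in poll_interval.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def fetching_polls(horizon_s: int, poll_s: int, commit_s: int) -> int:
    # Count polls in (0, horizon_s] that find a commit made since the previous
    # poll, assuming the master commits at commit_s, 2*commit_s, ...
    fetches = 0
    prev = 0
    for t in range(poll_s, horizon_s + 1, poll_s):
        if t // commit_s > prev // commit_s:
            fetches += 1
        prev = t
    return fetches


def worst_case_staleness(poll_s: int, commit_s: int) -> int:
    # A doc added just after a commit waits for the next commit,
    # then up to one more poll interval before a slave picks it up.
    return commit_s + poll_s


poll = poll_seconds("00:01:00")          # 60
print(fetching_polls(3600, poll, 600))   # 6 of 60 polls fetch a new index
print(worst_case_staleness(poll, 600))   # 660
```

With a 1 minute poll and 10 minute commits, most polls are cheap no-ops, and the commit interval, not the poll interval, dominates how stale the slaves can be.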
>
>
>> 3. Is NRT a better/viable option compared to SolrReplication?
>
> That is something still in development. AFAIK it works with shards
> (because NRT refers to indexing, and sharding doesn't change anything
> in particular about indexing), but replication will need something
> different: I think SolrCloud covers these NRT aspects thanks to its
> different architecture (not master-slave replication, but all peers
> replicating).
>
>>
>> -Thanks,
>> Prasenjit
