lucene-solr-user mailing list archives

From Jeff Wartes <jwar...@whitepages.com>
Subject Re: Indexing 700 docs per second
Date Tue, 19 Apr 2016 19:06:46 GMT

I have no numbers to back this up, but I’d expect Atomic Updates to be slightly slower than
a full update, since the atomic approach has to retrieve the fields you didn't specify before
it can write the new (updated) document.
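
To make the difference concrete, here is a minimal sketch of the two request bodies in Solr's JSON update format. The field names and values are made up for illustration (the thread only says 6 fields, 2 of which change), and nothing is actually sent to Solr. Note that atomic updates generally need the untouched fields to be stored (or have docValues) so that Solr can rebuild the document.

```python
import json


def full_update(doc_id, fields):
    """Full update: the client sends every field and Solr replaces the document."""
    return {"id": doc_id, **fields}


def atomic_update(doc_id, changed):
    """Atomic update: only the changed fields are sent, each wrapped in a "set" modifier."""
    return {"id": doc_id, **{name: {"set": value} for name, value in changed.items()}}


# Hypothetical document: "id" plus five other fields, two of which change often.
fields = {
    "name": "example",
    "category": "a",
    "region": "eu",
    "status": "active",
    "score": 42,
}

full = full_update("doc-1", fields)
atomic = atomic_update("doc-1", {"status": "active", "score": 42})

# Both bodies would be POSTed to the collection's /update handler as a JSON array.
full_body = json.dumps([full])
atomic_body = json.dumps([atomic])
print(full_body)
print(atomic_body)
```

The atomic body is smaller on the wire, but Solr still has to read the existing document and reindex all of it, which is why I would not expect it to be faster on the server side.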




On 4/19/16, 11:54 AM, "Tim Robertson" <timrobertson100@gmail.com> wrote:

>Hi Mark,
>
>We were putting in and updating docs of around 20-25 indexed fields (mainly
>INTs, but some Strings and multivalue fields) at >1000/sec on far lesser
>hardware and a total of 600 million docs (batch updates of course) while
>also serving live queries for a website which had about 30 concurrent users
>steady state (not all hitting SOLR though).
>
>It seems realistic with that kind of hardware in my experience, but you
>didn't mention what else was going on that might affect it (e.g. reads).
>
>HTH,
>Tim
>
>
>On Tue, Apr 19, 2016 at 7:12 PM, Erick Erickson <erickerickson@gmail.com>
>wrote:
>
>> Make very sure you batch updates though.
>> Here's a benchmark I ran:
>> https://lucidworks.com/blog/2015/10/05/really-batch-updates-solr-2/
>>
>> NOTE: it's not entirely clear that you want to
>> put 122M docs on a single shard. Depending on the queries
>> you'll run you may want 2 or more shards, but that depends
>> on the query pattern and your SLAs. Here's the long version
>> of "you really have to load test this":
>>
>> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>>
>> Best,
>> Erick
>>
>> On Tue, Apr 19, 2016 at 6:48 AM, Susheel Kumar <susheel2777@gmail.com>
>> wrote:
>> >  It sounds achievable with your machine configuration, and I would suggest
>> > trying atomic updates.  Use SolrJ with multi-threaded indexing for a
>> > higher indexing rate.
>> >
>> > Thanks,
>> > Susheel
>> >
>> >
>> >
>> > On Tue, Apr 19, 2016 at 9:27 AM, Tom Evans <tevans.uk@googlemail.com> wrote:
>> >
>> >> On Tue, Apr 19, 2016 at 10:25 AM, Mark Robinson <mark123learns@gmail.com> wrote:
>> >> > Hi,
>> >> >
>> >> > I have a requirement to index (mainly update) 700 docs per second.
>> >> > Suppose I have a 128GB RAM, 32 CPU machine, with each doc around 260
>> >> > bytes in size (6 fields, of which only 2 will be updated at the above
>> >> > rate). This collection has around 122 million docs, and that count is
>> >> > pretty much constant.
>> >> >
>> >> > 1. Can I manage this update rate with a non-sharded, i.e. single Solr
>> >> > instance, setup?
>> >> > 2. Also, is an atomic update or a full update (the whole doc) of the
>> >> > changed records the better approach in this case?
>> >> >
>> >> > Could someone please share their views/experience?
>> >>
>> >> Try it and see - everyone's data/schemas are different and can affect
>> >> indexing speed. It certainly sounds achievable enough - presumably you
>> >> can at least produce the documents at that rate?
>> >>
>> >> Cheers
>> >>
>> >> Tom
>> >>
>>
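
On the batching advice quoted above, a minimal sketch of grouping documents so that each update request carries many docs instead of one. The batch size of 1000 and the field name "score" are assumptions for illustration, not tuned or recommended values, and the sketch only builds the request bodies rather than sending them.

```python
import json
from itertools import islice


def batched(docs, batch_size):
    """Yield lists of up to batch_size docs from any iterable."""
    it = iter(docs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def request_bodies(docs, batch_size=1000):
    """Serialize each batch as one JSON array, i.e. one update request per batch."""
    for batch in batched(docs, batch_size):
        yield json.dumps(batch)


# Hypothetical stream of atomic updates.
docs = ({"id": f"doc-{i}", "score": {"set": i}} for i in range(2500))
bodies = list(request_bodies(docs, batch_size=1000))
sizes = [len(json.loads(body)) for body in bodies]
print(sizes)  # [1000, 1000, 500]
```

With multi-threaded indexing, each thread would send its own batches; the right batch size and thread count are things to load test rather than guess.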