lucene-solr-user mailing list archives

From Erick Erickson <>
Subject Re: How to Fast Bulk Inserting documents
Date Wed, 19 Aug 2015 18:41:21 GMT
If you're sitting on HDFS anyway, you could use MapReduceIndexerTool. I'm not
sure that'll hit your rate; it spends some time copying things around.
If you're not on HDFS, though, it's not an option.
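
For reference, a typical invocation looks roughly like the sketch below. The jar name, ZooKeeper address, collection name, and HDFS paths are all placeholders for your cluster; check the tool's own --help output for the authoritative option list:

```shell
# Hypothetical invocation sketch - adjust the jar path, ZooKeeper address,
# morphline config, and HDFS paths to match your environment.
hadoop jar search-mr-job.jar org.apache.solr.hadoop.MapReduceIndexerTool \
  --zk-host zk1:2181/solr \
  --collection collection1 \
  --morphline-file morphline.conf \
  --output-dir hdfs://namenode:8020/tmp/outdir \
  --go-live \
  hdfs://namenode:8020/data/input
```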


On Wed, Aug 19, 2015 at 11:36 AM, Upayavira <> wrote:
> On Wed, Aug 19, 2015, at 07:13 PM, Toke Eskildsen wrote:
>> Troy Edwards <> wrote:
>> > My average document size is 400 bytes
>> > Number of documents that need to be inserted 250000/second
>> > (for a total of about 3.6 Billion documents)
>> > Any ideas/suggestions on how that can be done? (use a client
>> > or uploadcsv or stream or data import handler)
>> Use more than one cloud. Make them fully independent. As I suggested when
>> you asked 4 days ago. That would also make it easy to scale: Just measure
>> how much a single setup can take and do the math.
> Yes - work out how much each node can handle, then you can work out how
> many nodes you need.
> You could consider using implicit routing rather than compositeId, which
> means that you take on responsibility for hashing your ID to push
> content to the right node. (Or, if you use compositeId, you could use
> the same algorithm, and be sure that you send docs directly to the
> correct shard.)
> At the moment, if you push five documents to a five-shard collection,
> the node you send them to could end up making four HTTP requests to the
> other nodes in the collection. This means you don't need to worry about
> where to post your content - it is just handled for you. However, there
> is a performance hit there. Push content directly to the correct node
> (either using implicit routing, or by replicating the compositeId hash
> calculation in your client) and you'd increase your indexing throughput
> significantly, I would theorise.
> Upayavira
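
The hash calculation Upayavira mentions can be sketched in plain Java. This is an illustration of the idea, not Solr's exact CompositeIdRouter code: it assumes the id is hashed with MurmurHash3 (x86, 32-bit, seed 0) over its UTF-8 bytes, and that the unsigned 32-bit hash ring is split into equal slices, one per shard. Verify against Solr's own hashing code before relying on it for routing.

```java
import java.nio.charset.StandardCharsets;

public class ShardRouter {

    // MurmurHash3, x86 32-bit variant (seed 0 is assumed here).
    static int murmur3(byte[] d, int seed) {
        final int c1 = 0xcc9e2d51, c2 = 0x1b873593;
        int h = seed, end = d.length & ~3;
        for (int i = 0; i < end; i += 4) {
            int k = (d[i] & 0xff) | ((d[i + 1] & 0xff) << 8)
                  | ((d[i + 2] & 0xff) << 16) | (d[i + 3] << 24);
            k *= c1; k = Integer.rotateLeft(k, 15); k *= c2;
            h ^= k; h = Integer.rotateLeft(h, 13); h = h * 5 + 0xe6546b64;
        }
        int k = 0;
        switch (d.length & 3) {          // mix in the 1-3 tail bytes, if any
            case 3: k  = (d[end + 2] & 0xff) << 16;
            case 2: k |= (d[end + 1] & 0xff) << 8;
            case 1: k |= (d[end] & 0xff);
                    k *= c1; k = Integer.rotateLeft(k, 15); k *= c2; h ^= k;
        }
        h ^= d.length;                   // finalization (avalanche) steps
        h ^= h >>> 16; h *= 0x85ebca6b;
        h ^= h >>> 13; h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Map a document id onto one of numShards equal slices of the
    // unsigned 32-bit hash ring - the same idea compositeId routing uses.
    static int shardFor(String id, int numShards) {
        long unsigned = murmur3(id.getBytes(StandardCharsets.UTF_8), 0) & 0xffffffffL;
        return (int) (unsigned * numShards >>> 32);
    }

    public static void main(String[] args) {
        System.out.println("doc42 -> shard " + shardFor("doc42", 5));
    }
}
```

A client that computes this per document can batch by shard and post each batch straight to that shard's leader, skipping the extra inter-node hops described above.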
