accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Ingest speed
Date Tue, 05 May 2015 15:49:34 GMT
On a single node, you can easily achieve tens of thousands of key-value
inserts per second. Depending on how many columns are in each row, 600
rows a second is rather slow :)

Your loop looks good. Using a single BatchWriter and letting it amortize 
sending data from your client to the servers will be the most efficient.

If the JSON parsing is the slowest part, you could have a single thread
read the file and hand each line to a thread pool, which parses the line
and adds the result to some concurrent data structure. A consumer on
that data structure could then read each parsed object and send it to
Accumulo.
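
As a rough sketch of that pipeline, using only the Java standard
library: the class name IngestPipeline, the run method and its
parser/sink parameters are made up for illustration and are not part of
the Accumulo API. The sink stands in for whatever sends a record to
Accumulo.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical helper, not part of Accumulo: one reader (the calling thread),
// a pool of parser threads, and a single consumer that feeds the sink.
final class IngestPipeline {
    // Marker object telling the consumer that no more records will arrive.
    private static final Object END = new Object();

    /** Parses every line in parallel and returns how many records reached the sink. */
    @SuppressWarnings("unchecked")
    static <T> int run(Iterable<String> lines, int parserThreads,
                       Function<String, T> parser, Consumer<T> sink) {
        // Bounded queue: parsers block when the consumer falls behind.
        BlockingQueue<Object> parsed = new ArrayBlockingQueue<>(1024);
        ExecutorService pool = Executors.newFixedThreadPool(parserThreads);
        int[] written = {0}; // touched only by the consumer, read after join()

        // Single consumer: the only thread that ever calls the sink.
        Thread consumer = new Thread(() -> {
            try {
                for (Object o = parsed.take(); o != END; o = parsed.take()) {
                    sink.accept((T) o);
                    written[0]++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            // Reader: hand each line to the pool. The parser must not return
            // null, and a real version would check the returned Futures so
            // that parse failures are not silently dropped.
            for (String line : lines) {
                pool.submit(() -> {
                    parsed.put(parser.apply(line));
                    return null;
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
            parsed.put(END); // all parsers are done, so END is the last item
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during ingest", e);
        }
        return written[0];
    }
}
```

In your case the parser would turn a JSON line into a Mutation and the
sink would call addMutation on your single BatchWriter (wrapping its
checked MutationsRejectedException). Records reach the sink in no
particular order, which should not matter here since Accumulo sorts by
key. Note that the pool's own task queue is unbounded in this sketch, so
for a very large file you would want to throttle the reader as well.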

Alternatively, this is where MapReduce is a clear win, as it's very good
at parallelizing these types of problems. You could use a
FileInputFormat (such as TextInputFormat for line-oriented input) and
the AccumuloOutputFormat to accomplish this task.

Andrea Leoni wrote:
> Thank you for your answer.
> Today I tried to create a big command file and push it to the shell (about
> 300k inserts per file). As you said, it is too slow for me (about 600 rows
> inserted per second).
>
> I have been using Accumulo for just one week. I'm a noob but I'm learning.
>
> My app has to store a large amount of data.
>
> The row is the timestamp and the family/qualifier are the columns. I get my
> data from a JSON file: my app scans it for new records, parses them, and
> for each record creates one mutation and pushes it to Accumulo with a
> BatchWriter.
>
> Maybe I am doing something wrong that I could fix to increase the speed of
> my inserts.
>
> Currently I:
>
> LOOP
> 1) read a json line
> 2) parse it
> 3) create a mutation
> 4) put in this mutation the line's information
> 5) use batchWriter to insert mutation in Accumulo
> END LOOP
>
> Is that all right? I know that points 1) and 2) are slow, but they are
> necessary and I use the fastest JSON parser I've found online.
>
> Thank you so much again!
> (and sorry again for my bad English!)
>
>
>
> -----
> Andrea Leoni
> Italy
> Computer Engineering
> --
> View this message in context: http://apache-accumulo.1065345.n5.nabble.com/Ingest-speed-tp14005p14013.html
> Sent from the Developers mailing list archive at Nabble.com.
