accumulo-user mailing list archives

From Josh Elser <>
Subject Re: Bulk import
Date Wed, 12 Oct 2016 03:14:53 GMT
Even 10G is a rather small amount of data. Setting up a bulk loading 
framework is a bit more complicated than it appears at first glance. 
Take your pick, of course, but I probably wouldn't consider bulk loading 
unless you were regularly processing 10-100x that amount of data :)

wrote:
> The bulk import seemed like a good option since the bson file generated about 10 GB of data.
> The problem with my code was that I wasn't releasing memory, which eventually became the bottleneck.
>> On Oct 11, 2016, at 9:39 PM, Josh Elser <> wrote:
>> For only 4GB of data, you don't need to do bulk ingest. That is serious overkill.
>> I don't know why the master would have died or become unresponsive. It is only minimally
>> involved in the write pipeline.
>> Can you share your current …? Have you followed the Accumulo user manual to change the
>> configuration to match the resources available on the 3 nodes where Accumulo is running?
>> Yamini Joshi wrote:
>>> Hello
>>> I am trying to import data from a bson file into a 3-node Accumulo cluster
>>> using pyaccumulo. The bson file is 4 GB and has a lot of records, all to
>>> be stored in one table. I tried a very naive approach and used the
>>> pyaccumulo batch writer to write to the table. After parsing some
>>> records, my master became unresponsive and shut down, with the tserver
>>> threads stuck on a low-memory error. I am assuming that the records are
>>> created faster than the proxy/master can handle. Is there any other
>>> way to go about it? I am thinking of using bulk ingest, but I am not sure
>>> how exactly.
>>> Best regards,
>>> Yamini Joshi
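Both replies point away from bulk import: Josh notes that a few GB does not need bulk ingest, and the follow-up says the real bottleneck was client memory that was never released. A minimal sketch of the streaming pattern, in Python since the thread uses pyaccumulo: documents are read lazily, turned into records one at a time, and handed off in fixed-size batches so the client never holds more than one batch. The record layout (row taken from `_id`, a `fields` column family) and the helper names `iter_records` and `write_in_batches` are hypothetical illustrations, and the sketch does not talk to Accumulo itself.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical record shape: (row, column_family, column_qualifier, value).
# Mapping BSON documents onto Accumulo keys is application-specific; this
# layout is only an assumption for illustration.
Record = Tuple[str, str, str, str]


def iter_records(docs: Iterable[dict]) -> Iterator[Record]:
    """Lazily turn documents into key/value records, one at a time.

    Nothing is accumulated here, so memory use does not grow with file size.
    In practice `docs` would be a streaming iterator over the file (for
    example pymongo's bson.decode_file_iter), not a fully loaded list.
    Assumes every document has an "_id" field.
    """
    for doc in docs:
        row = str(doc["_id"])
        for field, value in doc.items():
            if field != "_id":
                yield (row, "fields", field, str(value))


def write_in_batches(records: Iterable[Record],
                     send_batch: Callable[[List[Record]], None],
                     batch_size: int = 1000) -> int:
    """Send records in fixed-size batches; at most `batch_size` are buffered."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(records)
    total = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        send_batch(batch)  # hand the batch to the writer (hypothetical hook)
        total += len(batch)
        # `batch` is rebound on the next pass, so the previous list becomes
        # unreachable and its memory can be reclaimed.
    return total
```

To connect this to Accumulo, `send_batch` could build a pyaccumulo `Mutation` per record with `put(cf=..., cq=..., val=...)` and pass it to `add_mutation` on a writer obtained from `create_batch_writer`, closing the writer once `write_in_batches` returns. That wiring is an untested assumption; the batch writer's own buffering settings still bound how much the proxy holds on its side.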
