accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Bulk import
Date Wed, 12 Oct 2016 03:14:53 GMT
Even 10 GB is a rather small amount of data. Setting up a bulk loading 
framework is a bit more complicated than it appears at first glance. 
Take your pick of course, but I probably wouldn't consider bulk loading 
unless you were regularly processing 10-100x that amount of data :)
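
For concreteness, a minimal sketch of that simpler batch-writer path might
look like the following. It is not from this thread: the host/port,
credentials, table name, and the "_id" row key are illustrative
assumptions, and the writer options may differ across pyaccumulo versions.

    # Sketch only: stream a .bson file through the pyaccumulo BatchWriter
    # instead of bulk importing. Names and settings here are assumptions.
    import bson  # from pymongo; decode_file_iter yields one doc at a time
    from pyaccumulo import Accumulo, Mutation

    conn = Accumulo(host="localhost", port=42424, user="root",
                    password="secret")
    if not conn.table_exists("mytable"):
        conn.create_table("mytable")

    # A modest max_memory bound keeps the client-side buffer small; the
    # writer flushes to the tservers as the buffer fills.
    writer = conn.create_batch_writer("mytable", max_memory=10 * 1024 * 1024,
                                      latency_ms=30000, threads=5)

    with open("data.bson", "rb") as fh:
        # Streaming decode: the file is never held in memory all at once.
        for doc in bson.decode_file_iter(fh):
            m = Mutation(str(doc["_id"]))  # assumed row-id field
            for key, val in doc.items():
                if key != "_id":
                    m.put(cf="doc", cq=key, val=str(val))
            writer.add_mutation(m)  # buffered; flushed in batches

    writer.close()  # flush the remaining mutations and free the buffer
    conn.close()

Because the file is decoded one document at a time and the writer flushes
as its buffer fills, client-side memory stays bounded no matter how large
the input file is.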

yamini.1691@gmail.com wrote:
> The bulk import seemed to be a good option since the bson file generated about 10 GB of data.
> The problem with my code was that I wasn't releasing memory, which eventually became the bottleneck.
>
> Sent from my iPhone
>
>> On Oct 11, 2016, at 9:39 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>
>> For only 4 GB of data, you don't need to do bulk ingest. That is serious overkill.
>>
>> I don't know why the master would have died/become unresponsive. It is minimally
>> involved with the write-pipeline.
>>
>> Can you share your current accumulo-env.sh/accumulo-site.xml? Have you followed the
>> Accumulo user manual to change the configuration to match the resources available on
>> the 3 nodes where Accumulo is running?
>>
>> http://accumulo.apache.org/1.7/accumulo_user_manual.html#_pre_splitting_new_tables
>>
>> http://accumulo.apache.org/1.7/accumulo_user_manual.html#_native_map
>>
>> http://accumulo.apache.org/1.7/accumulo_user_manual.html#_troubleshooting
>>
>> Yamini Joshi wrote:
>>> Hello
>>>
>>> I am trying to import data from a bson file to a 3 node Accumulo cluster
>>> using pyaccumulo. The bson file is 4 GB and has a lot of records, all to
>>> be stored in one table. I tried a very naive approach and used the
>>> pyaccumulo batch writer to write to the table. After parsing some
>>> records, my master became unresponsive and shut down, with the tserver
>>> threads stuck on a low-memory error. I am assuming that the records are
>>> created faster than the proxy/master can handle. Is there any other way
>>> to go about it? I am thinking of using bulk ingest, but I am not sure
>>> how exactly.
>>>
>>> Best regards,
>>> Yamini Joshi
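
On the pre-splitting link above, a minimal sketch, assuming the proxy's
addSplits call and reaching through pyaccumulo's underlying Thrift client
(conn.client and conn.login are internals, and the split points are purely
illustrative):

    # Sketch only: pre-split a new table before a heavy load so writes
    # spread across tservers. Split points depend on your row-id scheme.
    from pyaccumulo import Accumulo

    conn = Accumulo(host="localhost", port=42424, user="root",
                    password="secret")
    if not conn.table_exists("mytable"):
        conn.create_table("mytable")

    # Reaching into pyaccumulo internals: addSplits is a raw proxy call.
    splits = {"2", "4", "6", "8", "a", "c", "e"}
    conn.client.addSplits(conn.login, "mytable", splits)
    conn.close()

With splits in place ahead of the load, writes fan out across all three
tablet servers instead of funneling into the single tablet a new table
starts with.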
