accumulo-user mailing list archives

From yamini.1...@gmail.com
Subject Re: Bulk import
Date Wed, 12 Oct 2016 02:57:00 GMT
Bulk import seemed like a good option since the bson file generated about 10 GB of data. The
problem with my code was that I wasn't releasing memory, which eventually became the bottleneck.
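
A minimal sketch of a memory-bounded version of that ingest (not the exact code from this thread): it streams the bson file one document at a time and lets the proxy batch writer's buffer be the only thing holding pending data. The pymongo bson module, the proxy host/credentials, the table name, the column family "f", and using each document's _id as the row key are all assumptions here.

import bson                      # ships with the pymongo distribution
from pyaccumulo import Accumulo, Mutation

conn = Accumulo(host="proxy-host", port=42424, user="root", password="secret")

table = "mytable"
if not conn.table_exists(table):
    conn.create_table(table)

writer = conn.create_batch_writer(table)
try:
    with open("dump.bson", "rb") as f:
        # decode_file_iter yields one document at a time instead of
        # loading the whole file into memory
        for doc in bson.decode_file_iter(f):
            m = Mutation(str(doc["_id"]))        # assumed row key
            for field, value in doc.items():
                if field == "_id":
                    continue
                m.put(cf="f", cq=field, val=str(value))
            writer.add_mutation(m)   # the writer flushes as its buffer fills
finally:
    writer.close()                   # releases the writer's buffered mutations
conn.close()

The key point is that nothing outside the writer's buffer accumulates between records, so memory stays flat regardless of file size.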

Sent from my iPhone

> On Oct 11, 2016, at 9:39 PM, Josh Elser <josh.elser@gmail.com> wrote:
> 
> For only 4GB of data, you don't need to do bulk ingest. That is serious overkill.
> 
> I don't know why the master would have died/become unresponsive. It is minimally involved with the write-pipeline.
> 
> Can you share your current accumulo-env.sh/accumulo-site.xml? Have you followed the Accumulo user manual to change the configuration to match the available resources you have on your 3 nodes where Accumulo is running?
> 
> http://accumulo.apache.org/1.7/accumulo_user_manual.html#_pre_splitting_new_tables
> 
> http://accumulo.apache.org/1.7/accumulo_user_manual.html#_native_map
> 
> http://accumulo.apache.org/1.7/accumulo_user_manual.html#_troubleshooting
> 
> Yamini Joshi wrote:
>> Hello
>> 
>> I am trying to import data from a bson file to a 3 node Accumulo cluster
>> using pyaccumulo. The bson file is 4G and has a lot of records, all to
>> be stored into one table. I tried a very naive approach and used the
>> pyaccumulo batch writer to write to the table. After parsing some
>> records, my master became unresponsive and shut down with the tserver
>> threads stuck on a low-memory error. I am assuming that the records are
>> created faster than the proxy/master can handle. Is there any other
>> way to go about it? I am thinking of using bulk ingest but I am not sure
>> how exactly.
>> 
>> Best regards,
>> Yamini Joshi

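For reference, the pre-splitting described in the first manual link above can be done before a heavy write so the load spreads across tablet servers. A minimal sketch, with the caveats that pyaccumulo (to my knowledge) does not wrap addSplits itself, that the connection object is assumed to expose the raw Thrift proxy client and login token as conn.client / conn.login, and that the split points and table name are placeholders:

from pyaccumulo import Accumulo

conn = Accumulo(host="proxy-host", port=42424, user="root", password="secret")

# Split points should match the distribution of your row keys; these are placeholders.
splits = {b"b", b"f", b"m", b"s"}

# addSplits is part of the Accumulo proxy Thrift API; conn.client and conn.login
# are assumed attributes of the pyaccumulo connection wrapping that API.
conn.client.addSplits(conn.login, "mytable", splits)
conn.close()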