accumulo-user mailing list archives

From Yamini Joshi <yamini.1...@gmail.com>
Subject Re: Bulk import
Date Wed, 12 Oct 2016 03:29:29 GMT
Alright. I'll keep that in mind. The next step for me will be to import
data from 90G Bson files. I think that'll be a good start for bulk import.

Best regards,
Yamini Joshi

On Tue, Oct 11, 2016 at 10:14 PM, Josh Elser <josh.elser@gmail.com> wrote:

> Even 10G is a rather small amount of data. Setting up a bulk loading
> framework is a bit more complicated than it appears at first glance. Take
> your pick of course, but I probably wouldn't consider bulk loading unless
> you were regularly processing 10-100x that amount of data :)
>
>
> yamini.1691@gmail.com wrote:
>
>> The bulk import seemed to be a good option since the bson file generated
>> about 10G of data. The problem with my code was that I wasn't releasing
>> memory, which eventually became the bottleneck.
>>
>> Sent from my iPhone
>>
>> On Oct 11, 2016, at 9:39 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>>
>>> For only 4GB of data, you don't need to do bulk ingest. That is serious
>>> overkill.
>>>
>>> I don't know why the master would have died/become unresponsive. It is
>>> minimally involved with the write-pipeline.
>>>
>>> Can you share your current accumulo-env.sh/accumulo-site.xml? Have you
>>> followed the Accumulo user manual to change the configuration to match the
>>> available resources you have on your 3 nodes where Accumulo is running?
>>>
>>> http://accumulo.apache.org/1.7/accumulo_user_manual.html#_pre_splitting_new_tables
>>>
>>> http://accumulo.apache.org/1.7/accumulo_user_manual.html#_native_map
>>>
>>> http://accumulo.apache.org/1.7/accumulo_user_manual.html#_troubleshooting
>>>
>>> Yamini Joshi wrote:
>>>
>>>> Hello
>>>>
>>>> I am trying to import data from a bson file to a 3 node Accumulo cluster
>>>> using pyaccumulo. The bson file is 4G and has a lot of records, all to
>>>> be stored into one table. I tried a very naive approach and used
>>>> pyaccumulo batch writer to write to the table. After parsing some
>>>> records, my master became unresponsive and shut down with the tserver
>>>> threads stuck on a low memory error. I am assuming that the records are
>>>> created faster than what the proxy/master can handle. Is there any other
>>>> way to go about it? I am thinking of using bulk ingest but I am not sure
>>>> how exactly.
>>>>
>>>> Best regards,
>>>> Yamini Joshi
>>>>
>>>
