accumulo-user mailing list archives

From Ariel Valentin <>
Subject Re: Stream fed accumulo
Date Thu, 10 Apr 2014 11:21:27 GMT
I agree that loading 10 GB files into memory during file uploads is inefficient, but I am not
sure that storing 10 GB files in an Accumulo cell is the best approach.

I would encourage you to store the file directly in HDFS and, if you need to, store
the metadata about that file in Accumulo (e.g. MIME type, file name, date created).
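
A minimal sketch of that split, under stated assumptions: the upload is copied through a small fixed buffer so the whole file is never held in memory, and only a handful of short metadata strings are left over for Accumulo. To keep it self-contained, the local filesystem stands in for HDFS and a plain Map stands in for the Accumulo Mutation; the class name, method names, and metadata keys are all hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: not part of the Accumulo or Hadoop APIs.
class StreamedUpload {

    /** Copies the stream through a fixed 64 KiB buffer and returns the byte count. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[64 * 1024];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    /**
     * Streams the upload to the target path (a local file here, standing in
     * for an HDFS path) and returns the small metadata record that would be
     * written to Accumulo instead of the file contents.
     */
    static Map<String, String> store(InputStream upload, Path target, String mimeType)
            throws IOException {
        long size;
        try (OutputStream out = Files.newOutputStream(target)) {
            size = copy(upload, out);
        }
        // Assumed metadata keys; each value is short enough for a normal cell.
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("file_name", target.getFileName().toString());
        meta.put("mime_type", mimeType);
        meta.put("date_created", Instant.now().toString());
        meta.put("size_bytes", Long.toString(size));
        meta.put("location", target.toUri().toString());
        return meta;
    }
}
```

With this shape, memory use on the web tier stays at the buffer size regardless of file size, and the metadata row can point back to the stored file through its location value.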

Sent from my mobile device. Please excuse any errors.

> On Apr 10, 2014, at 7:10 AM, pdread <> wrote:
> Hi
> This has been bothering me for some time, and I suspect it's a dumb question,
> but what the heck.
> The Accumulo client API only accepts byte[] or Text as its Mutation input.
> Would it be possible to use a Stream instead (developers?)? I'm processing
> streams and have to handle files to the tune of 10GB, which I would like to
> store in Accumulo but I have read I cannot. It would save memory footprint
> on my Tomcats if I could stream my data into Accumulo and not deal with
> bytes/text.
> Oh, and Accumulo developers, while you're at it adding this new feature, it
> would be nice if the bulk loads could append to the tables instead of just
> replacing them.
> Thanks
> Paul 