accumulo-dev mailing list archives

From Josh Elser
Subject Re: Column size limit
Date Mon, 18 Aug 2014 17:14:23 GMT
I think Billie's project is one of our "examples" --

On 8/18/14, 1:05 PM, Adam Fuchs wrote:
> Joe,
> I would say that a rule of thumb would be tens of megabytes for a single
> cell. There are two limits that affect this:
> 1) Amount of memory used: This includes ingesting into the batchwriter,
> buffering in the in-memory maps, scanning RFiles, and preparing query
> responses. At any given point, there could be a few copies of the cell
> hanging out in memory, so you don't want to pack things too tightly. If you
> have ridiculous amounts of memory then you can squeeze in some pretty large
> docs.
> 2) Message size for client/server communication: This is limited to 1G by
> default, but can be increased if needed. A single key/value pair will not
> be fragmented across these message frames.
> Whether to store bigger files in fragmented cells or as references to HDFS
> files typically has to do with security and lifecycle management. If you
> want cell-level security and encryption protection, you'll probably want to
> go with a fragmented key/value approach. If you want to keep all of your
> data in one spot for easier management you might also prefer to fragment
> the files in Accumulo. Otherwise sticking it in HDFS and storing a
> reference is a pretty simple and good solution.
> Billie did a project a while ago to fragment and store larger files in
> Accumulo. I'm not sure what happened with that, but it might be out there
> somewhere for you to use.
> Cheers,
> Adam
> On Mon, Aug 18, 2014 at 11:36 AM, Joe Stein wrote:
>> Hi, for Accumulo is there a recommended max for column value size? So if we
>> want to store files, at what point do we have to split the file into parts
>> or (rather) just store it in HDFS with a reference path to it?
>> /*******************************************
>>   Joe Stein
>>   Founder, Principal Consultant
>>   Big Data Open Source Security LLC
>>   Twitter: @allthingshadoop
>> ********************************************/
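
The fragmented key/value approach Adam describes could be sketched roughly as follows. This is only an illustration of the chunking logic, not Billie's project and not the Accumulo API: the class `CellFragmenter`, the `Fragment` type, and the zero-padded chunk-index qualifier are all hypothetical names chosen for the example. It assumes each fragment would be written as its own cell under the same row, with the qualifier padded so that lexicographic key order matches chunk order.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: splits a large value into fixed-size fragments and rejoins them. */
public class CellFragmenter {

    /** One fragment: an assumed column qualifier (chunk index) plus that chunk's bytes. */
    public static final class Fragment {
        public final String qualifier;
        public final byte[] value;

        Fragment(String qualifier, byte[] value) {
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Splits data into chunks of at most chunkSize bytes, in order. */
    public static List<Fragment> split(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Fragment> out = new ArrayList<>();
        int index = 0;
        for (int off = 0; off < data.length; off += chunkSize) {
            int end = Math.min(data.length, off + chunkSize);
            // Zero-pad the index so string sort order equals numeric chunk order.
            String qualifier = String.format("%08d", index++);
            out.add(new Fragment(qualifier, Arrays.copyOfRange(data, off, end)));
        }
        return out;
    }

    /** Concatenates fragments, assuming they are already in qualifier order. */
    public static byte[] join(List<Fragment> fragments) {
        int total = 0;
        for (Fragment f : fragments) {
            total += f.value.length;
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (Fragment f : fragments) {
            System.arraycopy(f.value, 0, out, pos, f.value.length);
            pos += f.value.length;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] doc = "0123456789".getBytes(StandardCharsets.US_ASCII);
        List<Fragment> parts = split(doc, 4);
        System.out.println(parts.size() + " fragments, round trip ok: "
                + Arrays.equals(doc, join(parts)));
    }
}
```

In practice the chunk size would be chosen well below the "tens of megabytes" rule of thumb above, since several copies of a cell may be held in memory at once.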