accumulo-dev mailing list archives

From Joe Stein <joe.st...@stealth.ly>
Subject Re: Column size limit
Date Mon, 18 Aug 2014 17:17:04 GMT
Thanks! && Thanks!


On Mon, Aug 18, 2014 at 1:14 PM, Josh Elser <josh.elser@gmail.com> wrote:

> I think Billie's project is one of our "examples" --
> http://accumulo.apache.org/1.6/examples/dirlist.html
>
>
> On 8/18/14, 1:05 PM, Adam Fuchs wrote:
>
>> Joe,
>>
>> As a rule of thumb, I'd say tens of megabytes for a single
>> cell. There are two limits that affect this:
>>
>> 1) Amount of memory used: This includes ingesting into the BatchWriter,
>> buffering in the in-memory maps, scanning RFiles, and preparing query
>> responses. At any given point there could be a few copies of the cell
>> hanging out in memory, so you don't want to pack things too tightly. If
>> you have ridiculous amounts of memory then you can squeeze in some
>> pretty large docs. (You can cap the client-side buffer; see the sketch
>> after this list.)
>> 2) Message size for client/server communication: This is limited to 1G by
>> default, but can be increased if needed. A single key/value pair will not
>> be fragmented across these message frames.
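>>
>> As a minimal sketch of capping the client-side ingest buffer (assuming
>> an existing Connector named connector and a table named "docs" -- both
>> made-up names):
>>
>>   import java.util.concurrent.TimeUnit;
>>   import org.apache.accumulo.core.client.BatchWriter;
>>   import org.apache.accumulo.core.client.BatchWriterConfig;
>>
>>   BatchWriterConfig cfg = new BatchWriterConfig();
>>   cfg.setMaxMemory(64 * 1024 * 1024);     // buffer at most ~64 MB client-side
>>   cfg.setMaxLatency(2, TimeUnit.MINUTES); // flush buffered mutations periodically
>>   BatchWriter writer = connector.createBatchWriter("docs", cfg);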
>>
>> Whether to store bigger files in fragmented cells or as references to
>> HDFS files typically comes down to security and lifecycle management. If
>> you want cell-level security and encryption protection, you'll probably
>> want to go with a fragmented key/value approach. If you want to keep all
>> of your data in one spot for easier management, you might also prefer to
>> fragment the files in Accumulo. Otherwise, sticking the file in HDFS and
>> storing a reference is a simple and solid solution.
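>>
>> For the reference approach, a single cell holding the HDFS path is
>> enough. A hypothetical sketch (the row, family, visibility label, and
>> path are all made up; writer is a BatchWriter as above):
>>
>>   import org.apache.accumulo.core.data.Mutation;
>>   import org.apache.accumulo.core.data.Value;
>>   import org.apache.accumulo.core.security.ColumnVisibility;
>>
>>   Mutation m = new Mutation("doc123");
>>   m.put("ref", "hdfs", new ColumnVisibility("secret"),
>>       new Value("hdfs://namenode/data/doc123.bin".getBytes()));
>>   writer.addMutation(m); // throws MutationsRejectedException on failure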
>>
>> Billie did a project a while ago to fragment and store larger files in
>> Accumulo. I'm not sure what happened with that, but it might be out there
>> somewhere for you to use.
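>>
>> If you do fragment in Accumulo, the basic idea (not Billie's actual
>> code, just a rough sketch with made-up names) is one cell per chunk,
>> with a zero-padded sequence number in the qualifier so the chunks scan
>> back in order:
>>
>>   import java.nio.file.Files;
>>   import java.nio.file.Paths;
>>   import java.util.Arrays;
>>   import org.apache.accumulo.core.data.Mutation;
>>   import org.apache.accumulo.core.data.Value;
>>
>>   byte[] data = Files.readAllBytes(Paths.get("/local/doc123.bin"));
>>   int chunkSize = 1 << 20; // 1 MB chunks, well under the tens-of-MB rule of thumb
>>   for (int off = 0, seq = 0; off < data.length; off += chunkSize, seq++) {
>>     byte[] chunk = Arrays.copyOfRange(data, off,
>>         Math.min(off + chunkSize, data.length));
>>     Mutation m = new Mutation("doc123"); // one small mutation per chunk
>>     m.put("chunk", String.format("%08d", seq), new Value(chunk));
>>     writer.addMutation(m);
>>   }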
>>
>> Cheers,
>> Adam
>>
>>
>>
>> On Mon, Aug 18, 2014 at 11:36 AM, Joe Stein <joe.stein@stealth.ly> wrote:
>>
>>> Hi, for Accumulo is there a recommended max for column value size? If we
>>> want to store files, at what point do we have to split the file into
>>> parts, or (rather) just store it in HDFS with a reference path to it?
>>>
>>> /*******************************************
>>>   Joe Stein
>>>   Founder, Principal Consultant
>>>   Big Data Open Source Security LLC
>>>   http://www.stealth.ly
>>>   Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
>>> ********************************************/
>>>
>>>
>>
