accumulo-dev mailing list archives

From Adam Fuchs <scubafu...@gmail.com>
Subject Re: Column size limit
Date Mon, 18 Aug 2014 17:05:44 GMT
Joe,

As a rule of thumb, I would keep a single cell to tens of megabytes at most.
Two limits affect this:

1) Amount of memory used: This includes ingesting into the batchwriter,
buffering in the in-memory maps, scanning RFiles, and preparing query
responses. At any given point, there could be a few copies of the cell
hanging out in memory, so you don't want to pack things too tightly. If you
have ridiculous amounts of memory then you can squeeze in some pretty large
docs.
2) Message size for client/server communication: This is limited to 1G by
default, but can be increased if needed. A single key/value pair will not
be fragmented across these message frames.

Whether to store bigger files in fragmented cells or as references to HDFS
files typically has to do with security and lifecycle management. If you
want cell-level security and encryption protection, you'll probably want to
go with a fragmented key/value approach. If you want to keep all of your
data in one spot for easier management you might also prefer to fragment
the files in Accumulo. Otherwise, storing the file in HDFS and keeping a
reference to it in Accumulo is a simple and good solution.
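
To make the fragmented key/value approach concrete, here is a minimal sketch in
plain Java, not the implementation of any existing project. It only shows the
splitting and reassembly logic and does not call the Accumulo API. The class
name FileChunker, the Chunk type, and the zero-padded qualifier format are all
hypothetical choices made for illustration; the assumption is that each chunk
would be written as one cell under the file's row, with the qualifier keeping
chunks in order when scanned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class FileChunker {

    /** One fragment of a file: a sortable qualifier plus the chunk bytes. */
    public static final class Chunk {
        public final String qualifier;
        public final byte[] value;

        Chunk(String qualifier, byte[] value) {
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Splits data into chunks of at most chunkSize bytes each. */
    public static List<Chunk> split(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Chunk> chunks = new ArrayList<>();
        int index = 0;
        for (int offset = 0; offset < data.length; offset += chunkSize) {
            int end = Math.min(offset + chunkSize, data.length);
            // Zero-padded index so lexicographic order matches chunk order.
            String qualifier = String.format(Locale.ROOT, "%08d", index++);
            chunks.add(new Chunk(qualifier, Arrays.copyOfRange(data, offset, end)));
        }
        return chunks;
    }

    /** Reassembles chunks, assuming they arrive sorted by qualifier. */
    public static byte[] join(List<Chunk> chunks) {
        int total = 0;
        for (Chunk c : chunks) {
            total += c.value.length;
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (Chunk c : chunks) {
            System.arraycopy(c.value, 0, out, pos, c.value.length);
            pos += c.value.length;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] file = new byte[100]; // stand-in for a large file
        List<Chunk> chunks = split(file, 32);
        System.out.println(chunks.size() + " chunks, round trip ok: "
                + Arrays.equals(file, join(chunks)));
    }
}
```

In practice the chunk size would be chosen well below the tens-of-megabytes
rule of thumb above, since several copies of a cell can be in memory at once.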

Billie did a project a while ago to fragment and store larger files in
Accumulo. I'm not sure what happened with that, but it might be out there
somewhere for you to use.

Cheers,
Adam



On Mon, Aug 18, 2014 at 11:36 AM, Joe Stein <joe.stein@stealth.ly> wrote:

> Hi, for Accumulo is there a recommended max for column value size? So if
> we want to store files, at what point do we have to split the file into parts
> or (rather) just store it in HDFS with a reference path to it?
>
> /*******************************************
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> ********************************************/
>
