accumulo-user mailing list archives

From Frank Smith <francis.h.sm...@outlook.com>
Subject RE: Best practices in sizing values?
Date Mon, 10 Jun 2013 00:21:29 GMT
So, what are your thoughts on storing a bunch of small files in HDFS? SequenceFiles? Avro?
I should note that these are essentially write-once, read-heavy chunks of text.

> Date: Sun, 9 Jun 2013 17:08:42 -0400
> Subject: Re: Best practices in sizing values?
> From: ctubbsii@apache.org
> To: user@accumulo.apache.org
> 
> At the very least, I would keep it under the size of your compressed
> data blocks in your RFiles (this may mean you should increase the value of
> table.file.compress.blocksize to be larger than the default of 100K).
> 
> You could also tweak this according to your application. Say, for
> example, you wanted to incur the extra work of resolving the pointer
> and retrieving from HDFS only 5% of the time: you could sample your
> data and choose a cutoff value that keeps 95% of your data in the
> Accumulo table.
> 
> Personally, I like to keep things under 1MB in the value, and under 1K
> in the key, as a crude rule of thumb, but it very much depends on the
> application.
> 
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
> 
> 
> On Sun, Jun 9, 2013 at 4:37 PM, Frank Smith <francis.h.smith@outlook.com> wrote:
> > I have an application where I have a block of unstructured text.  Normally
> > that text is relatively small (<500K), but under some conditions it can
> > be up to several GB of text.
> >
> > I was considering using a threshold above which I would stop storing
> > the text in the value of my mutation and instead just add a reference to
> > the HDFS location, but I wanted some advice: where should (best practice)
> > or must (system limitation) that threshold be?
> >
> > Also, can I stream data into a value, rather than passing a byte array?  Similar to
> > how CLOBs and BLOBs are handled in an RDBMS.
> >
> > Thanks,
> >
> > Frank
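The sizing advice in the quoted reply (sample your value sizes, keep roughly 95% of values inline, stay under the RFile compressed block size, and treat ~1K keys and ~1MB values as a crude ceiling) can be sketched as a small placement decision. This is a minimal, hypothetical sketch in plain Java with no Accumulo dependency: the class `ValueSizing`, its methods, the nearest-rank percentile, and the assumption that "K" means 1024 bytes are all illustrative choices, not part of the Accumulo API. It only decides where a value should go; it does not write mutations or HDFS files.

```java
import java.util.Arrays;

// Hypothetical sizing helper illustrating the heuristics in this thread.
// Nothing here is part of the Accumulo API.
final class ValueSizing {

    // Crude rule-of-thumb ceilings from the thread (assuming K = 1024 bytes).
    static final long MAX_KEY_BYTES = 1024;            // ~1K per key
    static final long MAX_VALUE_BYTES = 1024 * 1024;   // ~1MB per value
    // Default table.file.compress.blocksize of 100K, assuming K = 1024 bytes.
    static final long DEFAULT_BLOCK_BYTES = 100 * 1024;

    enum Placement { INLINE, HDFS_REFERENCE }

    // Nearest-rank percentile: the smallest sampled size such that at least
    // `percent`% of the samples are <= it. Integer math avoids rounding surprises.
    static long cutoffForPercent(long[] sampleSizes, int percent) {
        if (sampleSizes.length == 0) {
            throw new IllegalArgumentException("empty sample");
        }
        if (percent < 1 || percent > 100) {
            throw new IllegalArgumentException("percent must be in [1, 100]");
        }
        long[] sorted = sampleSizes.clone();
        Arrays.sort(sorted);
        long rank = ((long) percent * sorted.length + 99) / 100; // ceil(percent * n / 100)
        return sorted[(int) rank - 1];
    }

    // Largest value size kept inline: the sampled cutoff, capped by the
    // RFile block size and by the ~1MB rule of thumb.
    static long inlineLimit(long cutoff, long blockSizeBytes) {
        return Math.min(cutoff, Math.min(blockSizeBytes, MAX_VALUE_BYTES));
    }

    // Inline values go in the Accumulo Value; larger ones would be written to
    // HDFS with only a reference stored in the table (not implemented here).
    static Placement place(long valueBytes, long limit) {
        return valueBytes <= limit ? Placement.INLINE : Placement.HDFS_REFERENCE;
    }

    static boolean keyWithinRuleOfThumb(long keyBytes) {
        return keyBytes <= MAX_KEY_BYTES;
    }

    public static void main(String[] args) {
        // Hypothetical sample of value sizes, in bytes.
        long[] sample = {12_000, 48_000, 75_000, 90_000, 300_000, 2_000_000_000L};
        long limit = inlineLimit(cutoffForPercent(sample, 95), DEFAULT_BLOCK_BYTES);
        for (long size : new long[] {40_000, 500_000, 3_000_000_000L}) {
            System.out.println(size + " bytes -> " + place(size, limit));
        }
    }
}
```

Because the limit is the minimum of the sampled cutoff, the block size, and 1MB, raising `table.file.compress.blocksize` only moves the threshold up to the 1MB ceiling in this sketch; anything routed to `HDFS_REFERENCE` would be stored as a file in HDFS with only its location kept in the table, as Frank proposes.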