accumulo-user mailing list archives

From Arshak Navruzyan <arsh...@gmail.com>
Subject Re: deployment architecture
Date Tue, 07 Jan 2014 18:28:17 GMT
BTW, I also found a good description of a "typical cluster" in the upcoming
O'Reilly Accumulo book: http://shop.oreilly.com/product/0636920032304.do




On Sun, Jan 5, 2014 at 10:08 AM, Josh Elser <josh.elser@gmail.com> wrote:

> On 1/5/14, 12:44 PM, Arshak Navruzyan wrote:
>
>> Is there a document that describes best practices for Accumulo
>> deployments?
>>
>
> I'm guessing the Accumulo user-manual[1] covers some of this, but I'm not
> positive.
>
>
>> In particular:
>>
>> 1.  Should you run Accumulo on HD data nodes and name nodes? (Is
>> enabling HDFS short-circuit local reads a good idea?)
>>
>
> Datanodes and tasktrackers/nodemanagers, yes. I wouldn't run it on the
> Namenode though.
>
>
>> 2.  If so, do you disable map/reduce for nodes that run Accumulo tservers?
>>
>
> With conscious awareness of your resource allocation (make sure there are
> still physical resources for Accumulo) this should be fine, but be careful
> if you're running a heavy M/R load.
>
>
>> 3.  Is auto-splitting (by size) done in the real world or do most real
>> apps have pre-set split points?
>>
>
> Adding some split points is probably always a good idea. Making sure each
> tabletserver has at least a few tablets for your table is good; after that,
> you can increase the size of the split threshold (default is 1GB) for that
> table so you get a good distribution of tablets/tservers for the amount of
> data you're storing (100-200 tablets is a good target). The splits
> themselves obviously depend on your data, though.
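
To make the threshold arithmetic above concrete, here is a minimal sketch, not an official Accumulo utility. The class and method names are hypothetical, and it assumes "1GB" means 2^30 bytes and that you already know roughly how much data the table will hold. It only computes a number; applying it would be done through the table's `table.split.threshold` property.

```java
// Hypothetical helper (not part of Accumulo): estimates a split threshold
// that yields roughly the desired number of tablets for a table.
public class SplitPlanner {
    // Assumption: the default threshold of "1GB" is taken as 2^30 bytes.
    static final long DEFAULT_THRESHOLD_BYTES = 1L << 30;

    // Returns the larger of the default threshold and totalBytes / targetTablets
    // (rounded up), so small tables keep the default.
    static long suggestSplitThreshold(long totalBytes, int targetTablets) {
        if (totalBytes < 0 || targetTablets <= 0) {
            throw new IllegalArgumentException("totalBytes must be >= 0 and targetTablets > 0");
        }
        long perTablet = (totalBytes + targetTablets - 1) / targetTablets; // ceiling division
        return Math.max(perTablet, DEFAULT_THRESHOLD_BYTES);
    }
}
```

For example, about 1000 GiB of data with a target of 200 tablets suggests a threshold of 5 GiB, while a 10 GiB table stays at the default.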
>
>
>> 4.  Do you let Accumulo decide when to flush and compact or do people
>> write these into their apps (based on their knowledge of app behavior)
>>
>
> Unless you have retention policies that strictly require data to be
> physically removed from disk (as opposed to not visible through Accumulo's
> API), I'm not coming up with a reason that you would have to automate
> flush/compact. If you're doing data age-off (e.g. keeping N months of data,
> and rolling off the oldest day of data each day), it's probably not a bad
> idea to just do a range compaction on that old day to clean it up before
> your users are hitting your system full swing.
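
As a rough illustration of picking the range for that daily compaction, the sketch below computes the row bounds for the day that rolls off. It assumes, purely for illustration, that row IDs start with a `yyyyMMdd` date prefix; the class and method names are hypothetical and nothing here calls Accumulo itself. The resulting pair could then be handed to a range compaction (for instance via the shell's `compact` command or the client API), after checking how that call treats the start and end rows.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not part of Accumulo): finds the row range covering
// the oldest day once monthsKept months of data are retained.
public class AgeOffRange {
    // Assumption: row IDs begin with the day formatted as yyyyMMdd.
    static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE;

    // Returns {startRow, endRow}: the prefix of the expired day and the
    // prefix of the day after it.
    static String[] oldestDayRange(LocalDate today, int monthsKept) {
        if (monthsKept < 0) {
            throw new IllegalArgumentException("monthsKept must be >= 0");
        }
        // The day just outside the retention window.
        LocalDate expired = today.minusMonths(monthsKept).minusDays(1);
        return new String[] { DAY.format(expired), DAY.format(expired.plusDays(1)) };
    }
}
```

With a six-month window and a run date of 2014-01-07, this gives the range from `20130706` to `20130707`.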
>
>
>> I know the generic answer is "it all depends on your app/workload" but
>> if anyone wants to still describe their environment it would be helpful.
>>
>> Thanks.
>>
>
> [1] http://accumulo.apache.org/1.5/accumulo_user_manual.html
>
