accumulo-user mailing list archives

From Arshak Navruzyan <arsh...@gmail.com>
Subject Re: deployment architecture
Date Tue, 07 Jan 2014 06:30:21 GMT
Josh,

Thanks.  This is helpful.  One additional question.  Are these drops that I
am seeing in the number of ingest entries/s inevitable as compaction kicks
in?  Or is this a side effect of my tiny 2 node test environment (in
other words, if I had hundreds of tservers, the compaction activity of a
handful of nodes wouldn't impact the overall ingest rate so severely)?

This btw is the result of the batchwriter creating 50 byte random entries.

Arshak

[image: Inline image 1]


 On 1/5/14, 12:44 PM, Arshak Navruzyan wrote:

> Is there a document that describes best practices for Accumulo deployments?
>

I'm guessing the Accumulo user-manual[1] covers some of this, but I'm not
positive.

> In particular:
>
> 1.  Should you run Accumulo on HD data nodes and name nodes? (Is
> enabling HDFS short-circuit local reads a good idea?)
>

Datanodes and tasktrackers/nodemanagers, yes. I wouldn't run it on the
Namenode though.

> 2.  If so do you disable map/reduce for nodes that run Accumulo tservers?
>

With conscious awareness of your resource allocation (make sure there are
still physical resources for Accumulo) this should be fine, but be careful
if you're running a heavy M/R load.

> 3.  Is auto-splitting (by size) done in the real world or do most real
> apps have pre-set split points?
>

Adding some split points is probably always a good idea. Making sure each
tabletserver has at least a few tablets for your table is good. After that,
you can increase the split threshold (default is 1GB) for that table so you
get a good distribution of tablets/tservers for the amount of data you're
storing (100-200 tablets is a good target). The splits themselves obviously
depend on your data, though.
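
As a rough illustration of the arithmetic behind that advice, here is a minimal sketch in plain Java (no Accumulo dependency). It assumes a hypothetical data size and assumes row keys begin with a two-digit hex prefix; `SplitPlanner`, `thresholdBytes` and `hexSplits` are made-up names for this example, not part of Accumulo. The resulting values would then be applied through the table's `table.split.threshold` property and by adding the split points to the table.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitPlanner {

    // Split threshold (in bytes) that would give roughly targetTablets
    // tablets for the given amount of data.
    static long thresholdBytes(long totalBytes, int targetTablets) {
        if (targetTablets <= 0) {
            throw new IllegalArgumentException("targetTablets must be positive");
        }
        return Math.max(1L, totalBytes / targetTablets);
    }

    // Evenly spaced split points, ASSUMING rows start with a two-digit
    // hex prefix (00..ff). Real split points depend on your own data.
    static List<String> hexSplits(int tablets) {
        if (tablets < 1 || tablets > 256) {
            throw new IllegalArgumentException("tablets must be in 1..256");
        }
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < tablets; i++) {
            splits.add(String.format("%02x", i * 256 / tablets));
        }
        return splits;
    }

    public static void main(String[] args) {
        long total = 300L * 1024 * 1024 * 1024; // hypothetical 300 GB table
        System.out.println(thresholdBytes(total, 150)); // bytes per tablet
        System.out.println(hexSplits(8));               // 7 split points
    }
}
```

With only a handful of pre-splits the default threshold still governs later growth, so the two numbers are worth revisiting together as the table fills up.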

> 4.  Do you let Accumulo decide when to flush and compact or do people
> write these into their apps (based on their knowledge of app behavior)
>

Unless you have stringent retention policies that require data to be
physically removed from disk (as opposed to merely not visible through
Accumulo's API), I can't think of a reason that you would have to automate
flush/compact. If you're doing data age-off (e.g. keeping N months of data,
and rolling off the oldest day of data each day), it's probably not a bad
idea to just do a range compaction on that old day to clean it up before
your users are hitting your system full swing.
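
To make the age-off case concrete, here is a minimal sketch in plain Java of working out which row range covers the day that just fell out of the retention window. It assumes, purely for illustration, that row keys start with the date as yyyyMMdd; `AgeOffRange` and `rowRange` are hypothetical names. The two strings it returns are what you would hand to a range compaction as begin and end rows, after checking the inclusive/exclusive semantics of whichever call you use.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class AgeOffRange {

    // yyyyMMdd, e.g. 20131006. ASSUMES rows are prefixed with this date.
    static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE;

    // Returns {firstRowPrefix, nextDayPrefix} for the single day that has
    // just aged out when keeping retainMonths months of data. Rows of that
    // day sort at or after the first value and before the second.
    static String[] rowRange(LocalDate today, int retainMonths) {
        if (retainMonths < 0) {
            throw new IllegalArgumentException("retainMonths must be >= 0");
        }
        LocalDate expired = today.minusMonths(retainMonths).minusDays(1);
        return new String[] {
            expired.format(DAY),
            expired.plusDays(1).format(DAY)
        };
    }

    public static void main(String[] args) {
        String[] range = rowRange(LocalDate.of(2014, 1, 7), 3);
        System.out.println(range[0] + " .. " + range[1]);
    }
}
```

Running something like this from a nightly job, ahead of peak usage, matches the "clean it up before your users are hitting your system" suggestion above.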

> I know the generic answer is "it all depends on your app/workload" but
> if anyone wants to still describe their environment it would be helpful.
>
> Thanks.
>

[1] http://accumulo.apache.org/1.5/accumulo_user_manual.html
