accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: virtualize accumulo?
Date Tue, 05 Nov 2013 20:31:30 GMT
Hi Kesten,

As you likely know (given the arguments against that you mention),
virtualizing a Hadoop stack can have some unintended consequences. Hadoop
relies on a lot of heartbeats between processes to determine system
"aliveness". If your infrastructure is overloaded, Hadoop can really
suffer from spikes in latency.

Accumulo is much the same way, arguably a bit more so. Accumulo's
processes depend heavily on maintaining a lock in ZooKeeper (with a
30-second timeout by default) rather than on RPC heartbeats like those
between DataNodes and the NameNode. A node failure in Accumulo tends to
be much more expensive than one in HDFS, because Accumulo wants to make
sure every tablet is available without significant downtime. HDFS keeps
multiple replicas of each file, so it can be a bit lazier about noticing
failure and re-replicating. What I've typically heard is that running
Accumulo in a virtualized environment makes administration and use a bit
more difficult.
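
To make the timeout point concrete, here is a toy model in Python. It is
only a sketch and does not use the ZooKeeper client API: the function
name, the sample timestamps and the constant are made up for
illustration, with the constant assumed to mirror the 30-second default
mentioned above. It shows how a single long stall (say, on an overloaded
hypervisor) is enough to lose the session that holds the lock, even if
heartbeats resume afterwards.

```python
# Toy model only: this is NOT the ZooKeeper client API.
# SESSION_TIMEOUT_S is an assumption mirroring the 30-second default.
SESSION_TIMEOUT_S = 30.0


def first_expiry(heartbeat_times, timeout=SESSION_TIMEOUT_S):
    """Return the time at which the session would expire, or None.

    heartbeat_times: sorted timestamps (in seconds) at which the server
    received a heartbeat; the session starts at the first timestamp.
    """
    for prev, cur in zip(heartbeat_times, heartbeat_times[1:]):
        if cur - prev > timeout:
            # The gap exceeded the timeout, so the session (and the lock
            # tied to it) is gone `timeout` seconds after the last
            # heartbeat that arrived in time.
            return prev + timeout
    return None


healthy = [0, 10, 20, 30, 40, 50]  # steady heartbeats every 10 seconds
stalled = [0, 10, 20, 65, 75]      # one 45-second stall

print(first_expiry(healthy))  # None
print(first_expiry(stalled))  # 50.0
```

In the stalled case the later heartbeats at 65 and 75 do not help: the
lock was already lost at 50, which is why latency spikes hurt so much.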

If you're considering running HDFS on baremetal, I would encourage you
to do the same with Accumulo, or to investigate something like YARN
(really, HOYA https://github.com/hortonworks/hoya/) for dynamic
provisioning. Accumulo scales happily across many nodes, so you
shouldn't have to worry about problems with large installations (in
other words: one Accumulo instance should be sufficient for a cluster).
YARN/HOYA gives you dynamic allocation on top of your cluster, which
makes it easy to spin Accumulo clusters up and down as you want/need
them.

On 11/5/13, 3:21 PM, Kesten Broughton wrote:
> I've seen arguments both for and against virtualizing hadoop/hdfs.
> (the arguments for were from vmware :)
>
> We are considering hdfs on baremetal, with accumulo being virtualized.
> This would serve a fairly constant amount of data but widely varying compute demands.
> Has anyone tried this?  Can anyone share their experience with baremetal/virtualization
> with accumulo?
>
> thanks
>
> kesten
> (first post)
>
