accumulo-dev mailing list archives

From David Medinets
Subject Is Data Locality Helpful? (or why run tserver and datanode on the same box?)
Date Thu, 19 Jun 2014 14:07:28 GMT
At the Accumulo Summit and on a recent client site, there have been
conversations about Data Locality and Accumulo.

I ran an experiment to see whether Accumulo can scan tables when the
tserver process runs on a server without a datanode process. I
followed these steps:

1. Start a three-node cluster.
2. Load data.
3. Kill the datanode on slave1.
4. Wait until Hadoop notices the dead node.
5. Kill the tserver on slave2.
6. Wait until Accumulo notices the dead node.
7. Run the accumulo shell on master and slave1 to verify entries can be scanned.

Accumulo handled this situation just fine, as I expected.

How important (or not) is it to run tserver and datanode on the same server?
Does the Data Locality implied by running them together exist?
Can the benefit be quantified?
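
One rough way to put a number on locality, as a sketch only: for each
tablet, count how many HDFS blocks of its files have at least one replica
on the host running that tablet's tserver. The function name and the three
input maps below are hypothetical; nothing here calls a real Accumulo or
Hadoop API, and the inputs would have to be gathered separately (for
example from the metadata table and from HDFS block reports). It measures
where the data sits, not the scan-time benefit itself.

```python
from typing import Dict, List


def locality_fraction(
    tablet_host: Dict[str, str],
    tablet_files: Dict[str, List[str]],
    block_hosts: Dict[str, List[List[str]]],
) -> float:
    """Fraction of blocks that have a replica on the tablet's tserver host.

    tablet_host:  tablet id -> host running the tserver for that tablet
    tablet_files: tablet id -> HDFS paths of the files backing the tablet
    block_hosts:  HDFS path -> one list of replica hosts per block
    All three maps are assumed inputs, collected by some other means.
    """
    local = 0
    total = 0
    for tablet, host in tablet_host.items():
        for path in tablet_files.get(tablet, []):
            for replicas in block_hosts.get(path, []):
                total += 1
                if host in replicas:
                    local += 1
    # With no blocks at all there is nothing to be local to.
    return local / total if total else 0.0


# Made-up layout: t1 is served from slave1, whose datanode holds none of its
# blocks; t2 is served from slave2, which holds a replica of both its blocks.
tablet_host = {"t1": "slave1", "t2": "slave2"}
tablet_files = {"t1": ["/f1.rf"], "t2": ["/f2.rf"]}
block_hosts = {
    "/f1.rf": [["slave2", "master"]],
    "/f2.rf": [["slave2", "slave1"], ["slave2", "master"]],
}

# Two of the three blocks are local to the serving tserver.
fraction = locality_fraction(tablet_host, tablet_files, block_hosts)
```

Comparing scan times at different values of this fraction would be one way
to see whether the locality actually buys anything.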
