accumulo-dev mailing list archives

From Keith Turner <>
Subject [DISCUSS] HDFS operation to support Accumulo locality
Date Tue, 30 Jun 2015 15:00:03 GMT
There was a discussion on IRC about balancing and locality yesterday. I was
thinking about the locality problem, and started thinking about the
possibility of having an HDFS operation that would force a file to have
local replicas. I think this approach has the following pros over forcing a
compaction:

  * Only one replica is copied across the network.
  * Avoids decompressing, deserializing, serializing, and compressing data.

The tricky part about this approach is that Accumulo needs to decide when
to ask HDFS to make a file local. This decision could be based on a
function of the file size and the number of recent accesses.
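As a rough sketch of such a decision function (all names here are illustrative, not an existing Accumulo or HDFS API), a file might be localized only when it is small enough that copying one replica is cheap and hot enough that locality would pay off:

```java
// Hypothetical heuristic: request HDFS-level localization only for files
// that are both below a size cap and above a recent-access threshold.
public class LocalityHeuristic {
    private final long maxFileSize;   // bytes; above this, copying a replica costs too much
    private final int minRecentReads; // accesses in the window that mark a file as "hot"

    public LocalityHeuristic(long maxFileSize, int minRecentReads) {
        this.maxFileSize = maxFileSize;
        this.minRecentReads = minRecentReads;
    }

    /** Decide whether to ask HDFS to place a local replica of this file. */
    public boolean shouldLocalize(long fileSize, int recentReads) {
        return fileSize <= maxFileSize && recentReads >= minRecentReads;
    }

    public static void main(String[] args) {
        LocalityHeuristic h = new LocalityHeuristic(1L << 30, 10); // 1 GiB cap, 10 reads
        System.out.println(h.shouldLocalize(512L << 20, 25)); // hot 512 MiB file
        System.out.println(h.shouldLocalize(4L << 30, 25));   // 4 GiB file, too large
    }
}
```

In practice the thresholds would presumably be tunable properties, and the access count would come from whatever per-file read metrics the tserver already tracks.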

We could avoid decompressing, deserializing, etc. today by just copying (not
compacting) frequently accessed files. However, this would write three
replicas where an HDFS operation would only write one.

Note: for the assertion that only one replica would need to be copied, I was
thinking of the following three initial conditions. I am assuming we want to
avoid having all three replicas on the same rack.

 * Zero replicas on the rack: can copy a replica to the node and drop a
replica on another rack.
 * One replica on the rack: can copy a replica to the node and drop any
other replica.
 * Two replicas on the rack: can copy a replica to the node and drop another
replica on the same rack.
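The three cases above can be sketched as a small decision function (again illustrative only, not an HDFS API): given the racks of the existing replicas and the target node's rack, pick which replica to drop after the new local replica is written, so that no rack ever ends up holding all three replicas.

```java
import java.util.List;

// Sketch of the replica-drop decision for the three initial conditions.
// replicaRacks holds the rack of each existing replica; targetRack is the
// rack of the node that will receive the new local replica.
public class ReplicaMove {

    /** Returns the rack from which a replica should be dropped. */
    public static String replicaToDrop(List<String> replicaRacks, String targetRack) {
        long onTarget = replicaRacks.stream().filter(targetRack::equals).count();
        if (onTarget >= 2) {
            // Two replicas on the target rack: drop one of them, keeping the
            // off-rack replica so all three never share a rack.
            return targetRack;
        }
        // Zero or one replica on the target rack: any existing replica can be
        // dropped without putting all three replicas on one rack.
        return replicaRacks.get(0);
    }

    public static void main(String[] args) {
        System.out.println(replicaToDrop(List.of("r1", "r1", "r2"), "r1")); // two-on-rack case
        System.out.println(replicaToDrop(List.of("r2", "r2", "r3"), "r1")); // zero-on-rack case
    }
}
```

In each case only the single copy to the target node crosses the network; the drop is just a replica deletion, which is where the one-replica claim comes from.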
