accumulo-dev mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: [DISCUSS] HDFS operation to support Accumulo locality
Date Tue, 30 Jun 2015 15:07:34 GMT
I just thought of one potential issue with this.  The same file can be
shared by multiple tablets on different tservers.   If there are more than
3 tablets sharing a file, it could cause problems if all of them request a
local replica.  So if HDFS had this operation, Accumulo would have to be
careful about which files it requested local blocks for.
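
One conservative way to be careful is to skip any file shared by more tservers than there are replicas. The sketch below only illustrates that rule. The function name, the mapping structure, and the replication factor of 3 are assumptions made for the example, not Accumulo or HDFS APIs.

```python
# Hypothetical sketch: none of these names are Accumulo or HDFS APIs.
REPLICATION = 3  # assumed HDFS replication factor


def files_to_localize(file_to_tservers, tserver, replication=REPLICATION):
    """Return the files for which this tserver may request a local replica.

    file_to_tservers maps a file name to the set of tservers that host
    tablets referencing that file.
    """
    eligible = []
    for name, tservers in sorted(file_to_tservers.items()):
        # Skip files the tserver does not use, and files shared by more
        # tservers than there are replicas to hand out.
        if tserver in tservers and len(tservers) <= replication:
            eligible.append(name)
    return eligible
```

With this rule, a file referenced from four tservers is never localized, while a file used by a single tserver always qualifies.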

On Tue, Jun 30, 2015 at 11:00 AM, Keith Turner <keith@deenlo.com> wrote:

> There was a discussion on IRC about balancing and locality yesterday. I
> was thinking about the locality problem, and started thinking about the
> possibility of having an HDFS operation that would force a file to have
> local replicas. I think this approach has the following pros over forcing a
> compaction.
>
>   * Only one replica is copied across the network.
>   * Avoids decompressing, deserializing, serializing, and compressing data.
>
> The tricky part about this approach is that Accumulo needs to decide when
> to ask HDFS to make a file local. This decision could be based on a
> function of the file size and number of recent accesses.
>
> We could avoid decompressing, deserializing, etc. today by just copying
> (not compacting) frequently accessed files. However, this would write 3
> replicas where an HDFS operation would only write one.
>
> Note that for the assertion that only one replica would need to be copied,
> I was thinking of the following 3 initial conditions.  I am assuming we want
> to avoid having all three replicas on the same rack.
>
>  * Zero replicas on rack : can copy replica to node and drop replica on
> another rack.
>  * One replica on rack : can copy replica to node and drop any other
> replica.
>  * Two replicas on rack : can copy replica to node and drop another
> replica on same rack.
>
>
>
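
The three initial conditions quoted above can be sketched as a small function that picks which existing replica to drop once one replica has been copied to the local node. This is only an illustration. It assumes a replication factor of 3, that no replica is on the local node yet, and a hypothetical (node, rack) tuple per replica. It is not an HDFS API.

```python
# Hypothetical sketch: replicas are (node, rack) tuples, not HDFS objects.
def replica_to_drop(replicas, local_rack):
    """Return the node whose replica is dropped after copying one replica
    to the local node, which lives on local_rack."""
    if len(replicas) != 3:
        raise ValueError("sketch assumes exactly 3 existing replicas")
    on_rack = [node for node, rack in replicas if rack == local_rack]
    off_rack = [node for node, rack in replicas if rack != local_rack]
    if len(on_rack) == 0:
        # Zero replicas on rack: drop a replica on another rack.
        return off_rack[0]
    if len(on_rack) == 1:
        # One replica on rack: any existing replica would do; this
        # sketch picks an off-rack one.
        return off_rack[0]
    if len(on_rack) == 2:
        # Two replicas on rack: drop one on the same rack so that the
        # rack never holds all three replicas.
        return on_rack[0]
    raise ValueError("all three replicas are already on the local rack")
```

In each case exactly one replica crosses the network, and the resulting three replicas are never all on the same rack.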
