Subject: Re: [DISCUSS] HDFS operation to support Accumulo locality
From: Keith Turner
To: Accumulo Dev List <dev@accumulo.apache.org>
Date: Tue, 30 Jun 2015 12:10:22 -0400

On Tue, Jun 30, 2015 at 11:39 AM, Josh Elser wrote:

> Sorry in advance if I derail this, but I'm not sure what it would take
> to actually implement such an operation. The initial pushback might just
> be "use the block locations and assign the tablet yourself", since
> that's essentially what HBase does (not suggesting there isn't something
> better to do, just a hunch).
>
> IMO, we don't have a lot of information on locality presently. I was
> thinking it would be nice to create a tool to help us understand
> locality at all.
>
> My guess is that after this, our next big gain would be choosing a
> better candidate for where to move a tablet in the case of rebalancing,
> splits, and previous-server failure (pretty much all of the times that
> we aren't/can't put the tablet back to its previous loc). I'm not sure
> how far this would get us combined with the favored nodes API, e.g. a
> tablet has some favored datanodes which we include in the HDFS calls,
> and we can try to put the tablet on one of those nodes and assume that
> HDFS will have the blocks there.
>
> tl;dr I'd want to have examples of how the current API is insufficient
> before lobbying for new HDFS APIs.

Having some examples of how the status quo is insufficient is a good idea.
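On Josh's point about a tool to understand locality, I think most of it
falls out of the existing block location API. Here is a rough, untested
sketch of the idea; the tablet-to-files mapping and the tserver hostname
are assumed inputs, and all of the class and method names are made up.

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalityReport {

  // Fraction of a tablet's file blocks that have a replica on the
  // tserver's host. 1.0 means fully local, 0.0 means every read goes
  // over the network.
  static double localBlockFraction(FileSystem fs, List<Path> tabletFiles,
      String tserverHost) throws IOException {
    long totalBlocks = 0;
    long localBlocks = 0;
    for (Path file : tabletFiles) {
      FileStatus status = fs.getFileStatus(file);
      for (BlockLocation loc : fs.getFileBlockLocations(status, 0,
          status.getLen())) {
        totalBlocks++;
        for (String host : loc.getHosts()) {
          if (host.equals(tserverHost)) {
            localBlocks++;
            break;
          }
        }
      }
    }
    return totalBlocks == 0 ? 0.0 : (double) localBlocks / totalBlocks;
  }
}

Run per tablet and rolled up per table, that would at least tell us how
bad things are today.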
I was trying to think of situations where there are no suitable nodes
that have *all* of a tablet's file blocks local. In these situations the
best we can hope for is a node that has the largest subset of a tablet's
file blocks (see the sketch after the list below). I think the following
scenarios can leave us with no node that has all of a tablet's file
blocks.

 * Added X new tablet servers. Tablets were moved in order to spread
   them evenly.
 * A lot of tablets in a table just split. In order to spread tablets
   evenly across the cluster, some need to be moved.
 * Decommissioned X tablet servers. Tablets were moved in order to
   spread them evenly.
 * A tablet has been hosted on multiple tablet servers over time, and as
   a result there is no single datanode that has all of its file blocks.
 * Tablet servers run on a subset of the datanodes. As the ratio of
   tservers to datanodes goes down, the chance of finding a datanode
   with many of a tablet's file blocks goes down.
 * Decommissioning or adding datanodes could also throw off a tablet's
   locality.

Are there other cases I am missing?
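To make the "largest subset" idea concrete, a balancer could score
candidate hosts by how many of a tablet's blocks they already hold and
pick the best one. Another rough, untested sketch with made-up names:

import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AssignmentHint {

  // Count the tablet's blocks per datanode host, then return the host
  // holding the largest subset as the assignment candidate.
  static String bestHost(FileSystem fs, List<Path> tabletFiles)
      throws IOException {
    Map<String,Integer> blocksPerHost = new HashMap<String,Integer>();
    for (Path file : tabletFiles) {
      FileStatus status = fs.getFileStatus(file);
      for (BlockLocation loc : fs.getFileBlockLocations(status, 0,
          status.getLen())) {
        for (String host : loc.getHosts()) {
          Integer count = blocksPerHost.get(host);
          blocksPerHost.put(host, count == null ? 1 : count + 1);
        }
      }
    }
    String best = null;
    int bestCount = -1;
    for (Map.Entry<String,Integer> e : blocksPerHost.entrySet()) {
      if (e.getValue() > bestCount) {
        best = e.getKey();
        bestCount = e.getValue();
      }
    }
    return best;
  }
}

In practice the balancer would have to intersect that with the set of
live tservers and the usual load constraints, but it shows how far the
current API can take us.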
>
> Keith Turner wrote:
>
>> I just thought of one potential issue with this. The same file can be
>> shared by multiple tablets on different tservers. If there are more
>> than 3 tablets sharing a file, it could cause problems if all of them
>> request a local replica. So if HDFS had this operation, Accumulo would
>> have to be careful about which files it requested local blocks for.
>>
>> On Tue, Jun 30, 2015 at 11:00 AM, Keith Turner wrote:
>>
>>> There was a discussion on IRC about balancing and locality yesterday.
>>> I was thinking about the locality problem, and started thinking about
>>> the possibility of an HDFS operation that would force a file to have
>>> local replicas. I think this approach has the following pros over
>>> forcing a compaction.
>>>
>>> * Only one replica is copied across the network.
>>> * Avoids decompressing, deserializing, serializing, and compressing
>>>   data.
>>>
>>> The tricky part about this approach is that Accumulo needs to decide
>>> when to ask HDFS to make a file local. This decision could be based
>>> on a function of the file size and number of recent accesses.
>>>
>>> We could avoid decompressing, deserializing, etc. today by just
>>> copying (not compacting) frequently accessed files. However this
>>> would write 3 replicas where an HDFS operation would only write one.
>>>
>>> Note for the assertion that only one replica would need to be copied,
>>> I was thinking of the following 3 initial conditions. I am assuming
>>> we want to avoid having all three replicas on the same rack.
>>>
>>> * Zero replicas on rack: can copy a replica to the node and drop a
>>>   replica on another rack.
>>> * One replica on rack: can copy a replica to the node and drop any
>>>   other replica.
>>> * Two replicas on rack: can copy a replica to the node and drop the
>>>   other replica on the same rack.
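P.S. To make the "when to ask HDFS to make a file local" decision from
my first message concrete, one possible heuristic (entirely made up,
would need tuning) is to localize once the network cost of remote reads
exceeds the one-time cost of copying a single replica:

public class LocalizeHeuristic {

  // Hypothetical heuristic: ask HDFS to localize a file once the bytes
  // read over the network exceed the one-time network cost of copying
  // one replica, which is roughly the file size.
  static boolean shouldLocalize(long fileSizeBytes, long readsSinceWritten,
      double localBlockFraction) {
    // Approximate network bytes read so far, assuming whole-file scans.
    double remoteBytesRead =
        readsSinceWritten * fileSizeBytes * (1.0 - localBlockFraction);
    return remoteBytesRead > fileSizeBytes;
  }
}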