Date: Sun, 18 Mar 2018 01:37:00 +0000 (UTC)
From: "Saad Mufti (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-20218) Proposed Performance Enhancements For TableSnapshotInputFomat

     [ https://issues.apache.org/jira/browse/HBASE-20218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saad Mufti updated HBASE-20218:
-------------------------------
    Summary: Proposed Performance Enhancements For TableSnapshotInputFomat (was: Proposed Perfromace Enhancements For TableSnapshotInputFomat)

> Proposed Performance Enhancements For TableSnapshotInputFomat
> -------------------------------------------------------------
>
>                 Key: HBASE-20218
>                 URL: https://issues.apache.org/jira/browse/HBASE-20218
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 1.4.0
>         Environment: HBase 1.4.0 running in AWS EMR 5.12.0 with the HBase rootdir set to a folder in S3
>            Reporter: Saad Mufti
>            Priority: Minor
>
> I have been testing a few Spark jobs we have at my company which use TableSnapshotInputFormat to read directly from filesystem snapshots created on another EMR/HBase cluster and stored in S3. During performance testing I found various small changes which would greatly enhance performance. Right now we are running our jobs linked with a patched version of HBase 1.4.0 in which I made these changes, and I am hoping to submit my patch for review and eventual acceptance into the main codebase.
>
> The list of changes is:
>
> 1. A flag to control whether the snapshot restore uses a UUID-based random temp dir in the specified restore directory. We use the flag to turn this off so that we can benefit from an AWS S3-specific bucket partitioning scheme we have provisioned. The way S3 partitioning works, you have to give a fixed path prefix and a pattern of files after that, and AWS can then partition on the paths after the fixed prefix into different resources to get more parallelization. We were advised by AWS that we could only get this good partitioning behavior if we didn't have that random directory in there.
>
> 2. A flag to turn off the code that tries to compute locality information for the splits. This is useless when dealing with S3: the files are not on the cluster, so there is no use in computing locality. Worse yet, it uses a single thread in the driver to iterate over all the files in the restored snapshot. For a very large table this was taking hours and hours iterating through S3 objects just to list them (about 360,000 of them for our specific table).
>
> 3. A flag to override the column family schema setting to prefetch regions on open. Prefetching caused the main executor thread on which a Spark task was running, which was trying to read through HFiles for its scan, to compete with prefetch threads reading the same file for a lock on the underlying EMRFS stream object. Most tasks in the Spark stage would finish, but the last few would linger half an hour or more, alternating with the prefetch threads for that lock. This is the only change that had to be made outside the mapreduce package, as it directly affects the prefetch behavior in CacheConfig.java.
>
> 4. A flag to turn off maintenance of Scan metrics. This was also causing a major slowdown; getting rid of it sped things up 4-5 times. What I observed in the thread dumps was that every call to update scan metrics was trying to get some HBase counter object and, deep underneath, was trying to access some Java resource bundle and throwing an exception that it wasn't found. The exception was never visible at the application level and was swallowed underneath, but whatever it was doing was causing a major slowdown. So we use this flag to avoid collecting those metrics, because we never used them.
>
> I am polishing my patch a bit more and hopefully will attach it tomorrow. One caveat: I tried but struggled with how to write any useful unit/component tests for these, as these are invisible behaviors that do not affect the final result at all. And I am not that familiar with the HBase testing standards, so for now I am looking for guidance on what to test.
>
> Would appreciate any feedback plus guidance on writing tests, provided of course there is interest in incorporating my patch into the main codebase.
>
> Cheers.
>
> ----Saad
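
For readers following the issue, here is a minimal sketch of how four opt-out flags like the ones described above might be read from job configuration. It is not the patch itself. Every key name, the class name, and the example S3 path are hypothetical, and `java.util.Properties` stands in for Hadoop's `Configuration` so that the sketch runs without HBase on the classpath. Each flag defaults to `true` on the assumption that a patch of this kind would keep the existing behavior unless a job opts out.

```java
import java.util.Properties;
import java.util.UUID;

public class SnapshotFlagsSketch {
    // Hypothetical configuration keys; none of these names come from HBase.
    public static final String USE_RANDOM_RESTORE_SUBDIR = "sketch.snapshot.restore.random.subdir";
    public static final String COMPUTE_LOCALITY = "sketch.snapshot.locality.enabled";
    public static final String PREFETCH_ON_OPEN = "sketch.snapshot.prefetch.on.open";
    public static final String SCAN_METRICS = "sketch.snapshot.scan.metrics.enabled";

    // Read a boolean flag, falling back to the given default when the key is unset.
    public static boolean flag(Properties conf, String key, boolean defaultValue) {
        return Boolean.parseBoolean(conf.getProperty(key, Boolean.toString(defaultValue)));
    }

    // Item 1: append a UUID-named subdirectory only when the flag is on,
    // so a fixed path prefix can be kept for S3 partitioning when it is off.
    public static String resolveRestoreDir(Properties conf, String restoreDir) {
        if (flag(conf, USE_RANDOM_RESTORE_SUBDIR, true)) {
            return restoreDir + "/" + UUID.randomUUID();
        }
        return restoreDir;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // A job reading snapshots from S3 might opt out of all four behaviors.
        conf.setProperty(USE_RANDOM_RESTORE_SUBDIR, "false");
        conf.setProperty(COMPUTE_LOCALITY, "false");
        conf.setProperty(PREFETCH_ON_OPEN, "false");
        conf.setProperty(SCAN_METRICS, "false");

        // Hypothetical bucket and path, for illustration only.
        System.out.println("restore dir:      " + resolveRestoreDir(conf, "s3://example-bucket/restore"));
        System.out.println("compute locality: " + flag(conf, COMPUTE_LOCALITY, true));
        System.out.println("prefetch on open: " + flag(conf, PREFETCH_ON_OPEN, true));
        System.out.println("scan metrics:     " + flag(conf, SCAN_METRICS, true));
    }
}
```

Because the flags only switch off side work (the temp directory name, locality lookup, prefetch, metrics), one place a test could start is at the level of helpers like `resolveRestoreDir`, checking the path shape with the flag on and off, rather than at the level of scan results.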