Subject: Re: How to prevent major compaction when doing bulk load provisioning?
From: Amit Sela
To: user@hbase.apache.org
Date: Thu, 21 Mar 2013 18:47:33 +0200

Did you try pre-splitting your table before bulk loading?

On Thu, Mar 21, 2013 at 3:29 PM, Nicolas Seyvet wrote:
> Hi,
>
> We are using code similar to
> https://github.com/jrkinley/hbase-bulk-import-example/ to
> benchmark our HBase cluster. We are running a CDH4 installation, and HBase
> is version 0.92.1-cdh4.1.1. The cluster is composed of 12 slaves, 1 master,
> and 1 secondary master.
>
> During the bulk load insert, roughly within 3 hours of the start (~200 GB),
> we notice a large drop in the insert rate. At the same time, there is a
> spike in IO and CPU usage. Connecting to a Region Server (RS), the
> Monitored Tasks section shows that a compaction has started.
>
> I have set hbase.hregion.max.filesize to 107374182400 (100 GB) and disabled
> automatic major compaction (hbase.hregion.majorcompaction is set to 0).
>
> What we are doing is that we have 1000 files of synthetic data (CSV), where
> each row in a file is one row to insert into HBase; each file contains 600K
> rows (or 600K events). Our loader works in the following way:
> 1. Look for a file.
> 2. When a file is found, prepare a job for that file.
> 3. Launch the job.
> 4. Wait for completion.
> 5. Compute the insert rate (number of rows / time).
> 6. Repeat from 1 until there are no more files.
>
> What I understand of the bulk load M/R job is that it produces one HFile
> for each Region.
>
> Questions:
> - How is HStoreFileSize calculated?
> - What do HStoreFileSize, storeFileSize, and hbase.hregion.max.filesize
> have in common?
> - Can the number of HFiles trigger a major compaction?
>
> Thanks for the help. I hope my questions make sense.
>
> /Nicolas
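On the pre-splitting suggestion: splits are passed at table creation time (e.g. HBaseAdmin.createTable(descriptor, splitKeys)). A minimal sketch of computing evenly spaced split keys over a one-byte row-key prefix, assuming uniformly distributed keys; the choice of 12 regions (one per slave) is an illustration, and real split points should follow your actual row-key distribution:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: compute numRegions - 1 evenly spaced split keys over the
// one-byte prefix space 0x00..0xFF. These byte[] keys could be passed
// as the splitKeys argument when creating the table, so bulk-loaded
// HFiles land in pre-existing regions instead of forcing splits.
public class SplitKeys {
    static List<byte[]> evenSplits(int numRegions) {
        List<byte[]> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            // Boundary at fraction i/numRegions of the 0..255 prefix space.
            int boundary = (i * 256) / numRegions;
            splits.add(new byte[] { (byte) boundary });
        }
        return splits;
    }

    public static void main(String[] args) {
        // Print the split boundaries in hex, one per line.
        for (byte[] k : evenSplits(12)) {
            System.out.printf("%02x%n", k[0]);
        }
    }
}
```

With keys that are not uniformly distributed (e.g. timestamps), evenly spaced prefixes would leave most regions empty, so sampling the input data for split points is the safer route.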
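For reference, the two settings described in the message would look like this in hbase-site.xml (values taken from the message; hbase.hregion.majorcompaction is a time interval in milliseconds, and 0 disables time-based major compactions):

```xml
<!-- hbase-site.xml: settings as described in the message above -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>107374182400</value> <!-- 100 GB: a region splits only past this size -->
</property>
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value> <!-- disable time-triggered major compactions -->
</property>
```

Note that this only disables the periodic trigger: a minor compaction that ends up selecting all of a store's files is promoted to a major compaction, so accumulating many bulk-loaded HFiles per region can still trigger one.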