Subject: Re: Block size of HBase files
From: Anoop John
To: user@hbase.apache.org
Date: Mon, 13 May 2013 17:52:53 +0530

I mean, when you created the table (using the client, I guess), did you specify anything like splitKeys or [start, end, no#regions]?

-Anoop-

On Mon, May 13, 2013 at 5:49 PM, Praveen Bysani wrote:

> We insert data using the Java HBase client (org.apache.hadoop.hbase.client.*).
> However, we are not providing any details in the configuration object
> except for the ZooKeeper quorum and port number. Should we specify explicitly
> at this stage?
>
> On 13 May 2013 19:54, Anoop John wrote:
>
> > >now have 731 regions (each about ~350 MB !!). I checked the
> > >configuration in CM, and the value for hbase.hregion.max.filesize is 1 GB
> > >too !!!
> >
> > You mentioned the splits at the time of table creation? How did you create the
> > table?
> >
> > -Anoop-
> >
> > On Mon, May 13, 2013 at 5:18 PM, Praveen Bysani wrote:
> >
> > > Hi,
> > >
> > > Thanks for the details. No, I haven't run any compaction, and I have no idea
> > > if there is one going on in the background. I executed a major_compact on that
> > > table and I now have 731 regions (each about ~350 MB !!). I checked the
> > > configuration in CM, and the value for hbase.hregion.max.filesize is 1 GB
> > > too !!!
> > >
> > > I am not trying to access HFiles in my MR job; in fact, I am just using a Pig
> > > script which handles this.
> > > This number (731) is close to my number of map
> > > tasks, which makes sense. But how can I decrease this? Shouldn't the size
> > > of each region be 1 GB with that configuration value?
> > >
> > > On 13 May 2013 18:36, Ted Yu wrote:
> > >
> > > > You can change HFile size through the hbase.hregion.max.filesize parameter.
> > > >
> > > > On May 13, 2013, at 2:45 AM, Praveen Bysani wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I wanted to minimize the number of MapReduce tasks generated while
> > > > > processing a job, hence configured it to a larger value.
> > > > >
> > > > > I don't think I have configured HFile size in the cluster. I use Cloudera
> > > > > Manager to manage my cluster, and the only configuration I can relate
> > > > > to is hfile.block.cache.size,
> > > > > which is set to 0.25. How do I change the HFile size?
> > > > >
> > > > > On 13 May 2013 15:03, Amandeep Khurana wrote:
> > > > >
> > > > >> On Sun, May 12, 2013 at 11:40 PM, Praveen Bysani <praveen.iiith@gmail.com> wrote:
> > > > >>
> > > > >>> Hi,
> > > > >>>
> > > > >>> I have the dfs.block.size value set to 1 GB in my cluster configuration.
> > > > >>
> > > > >> Just out of curiosity - why do you have it set at 1GB?
> > > > >>
> > > > >>> I
> > > > >>> have around 250 GB of data stored in HBase over this cluster. But when I
> > > > >>> check the number of blocks, it doesn't correspond to the block size value I
> > > > >>> set. From what I understand I should only have ~250 blocks. But instead,
> > > > >>> when I did a fsck on /hbase/, I got the following:
> > > > >>>
> > > > >>> Status: HEALTHY
> > > > >>> Total size: 265727504820 B
> > > > >>> Total dirs: 1682
> > > > >>> Total files: 1459
> > > > >>> Total blocks (validated): 1459 (avg. block size 182129886 B)
> > > > >>> Minimally replicated blocks: 1459 (100.0 %)
> > > > >>> Over-replicated blocks: 0 (0.0 %)
> > > > >>> Under-replicated blocks: 0 (0.0 %)
> > > > >>> Mis-replicated blocks: 0 (0.0 %)
> > > > >>> Default replication factor: 3
> > > > >>> Average block replication: 3.0
> > > > >>> Corrupt blocks: 0
> > > > >>> Missing replicas: 0 (0.0 %)
> > > > >>> Number of data-nodes: 5
> > > > >>> Number of racks: 1
> > > > >>>
> > > > >>> Are there any other configuration parameters that need to be set?
> > > > >>
> > > > >> What is your HFile size set to? The HFiles that get persisted would be
> > > > >> bound by that number. Thereafter each HFile would be split into blocks, the
> > > > >> size of which you configure using the dfs.block.size configuration
> > > > >> parameter.
> > > > >>
> > > > >>> --
> > > > >>> Regards,
> > > > >>> Praveen Bysani
> > > > >>> http://www.praveenbysani.com
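
The fsck figures quoted in the thread are consistent with one HDFS block per file: blocks are allocated per file, so a file smaller than dfs.block.size still takes one (partially filled) block. A minimal sketch of that arithmetic follows, using only the numbers from the fsck output. The helper name min_blocks is hypothetical, "1 GB" is assumed to mean 2**30 bytes, and all files are assumed to be non-empty.

```python
# Figures copied from the fsck output quoted in the thread.
TOTAL_SIZE_B = 265727504820
TOTAL_FILES = 1459
TOTAL_BLOCKS = 1459

# Assumption: "1 GB" for dfs.block.size means 2**30 bytes.
BLOCK_SIZE_B = 2**30


def ceil_div(a, b):
    """Integer ceiling division."""
    return (a + b - 1) // b


def min_blocks(total_size_b, num_files, block_size_b):
    """Lower bound on the HDFS block count (hypothetical helper).

    Blocks are never shared between files, so every non-empty file
    needs at least one block, however small the file is.
    Assumes none of the files is empty.
    """
    return max(ceil_div(total_size_b, block_size_b), num_files)


# Block count if the data were one contiguous stream of full blocks.
packed_blocks = ceil_div(TOTAL_SIZE_B, BLOCK_SIZE_B)
# Average block size as fsck reports it (rounded down).
avg_block_b = TOTAL_SIZE_B // TOTAL_BLOCKS

print(packed_blocks)                                        # 248
print(min_blocks(TOTAL_SIZE_B, TOTAL_FILES, BLOCK_SIZE_B))  # 1459
print(avg_block_b)                                          # 182129886
```

Since fsck reports 1459 files and 1459 blocks, every file already fits in a single block, so under these assumptions the block count follows the number of files rather than dfs.block.size. It would only drop toward ~250 if the same data were stored in fewer, larger files.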