hbase-user mailing list archives

From Praveen Bysani <praveen.ii...@gmail.com>
Subject Re: Block size of HBase files
Date Tue, 14 May 2013 02:23:54 GMT
Hi Anoop,

No, we didn't specify any such thing while creating or writing into the table.

On 13 May 2013 20:22, Anoop John <anoop.hbase@gmail.com> wrote:

> I mean when u created the table (using the client, I guess), have u specified
> anything like splitKeys or [start, end, no#regions]?
>
> -Anoop-
>
> On Mon, May 13, 2013 at 5:49 PM, Praveen Bysani <praveen.iiith@gmail.com> wrote:
>
> > We insert data using the Java HBase client (org.apache.hadoop.hbase.client.*).
> > However, we are not providing any details in the configuration object
> > except for the ZooKeeper quorum and port number. Should we specify something
> > explicitly at this stage?
> >
> > On 13 May 2013 19:54, Anoop John <anoop.hbase@gmail.com> wrote:
> >
> > > > now have 731 regions (each about ~350 MB !!). I checked the
> > > > configuration in CM, and the value for hbase.hregion.max.filesize is 1 GB
> > > > too !!!
> > >
> > > Did you specify the splits at the time of table creation? How did you create
> > > the table?
> > >
> > > -Anoop-
> > >
> > > On Mon, May 13, 2013 at 5:18 PM, Praveen Bysani <praveen.iiith@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Thanks for the details. No, I haven't run any compaction, and I have no idea
> > > > if there is one going on in the background. I executed a major_compact on that
> > > > table and I now have 731 regions (each about ~350 MB !!). I checked the
> > > > configuration in CM, and the value for hbase.hregion.max.filesize is 1 GB
> > > > too !!!
> > > >
> > > > I am not trying to access HFiles in my MR job; in fact, I am just using a Pig
> > > > script which handles this. This number (731) is close to my number of map
> > > > tasks, which makes sense. But how can I decrease this? Shouldn't the size
> > > > of each region be 1 GB with that configuration value?
> > > >
> > > >
> > > > On 13 May 2013 18:36, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >
> > > > > You can change HFile size through the hbase.hregion.max.filesize parameter.
> > > > >
> > > > > On May 13, 2013, at 2:45 AM, Praveen Bysani <praveen.iiith@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I wanted to minimize the number of MapReduce tasks generated while
> > > > > > processing a job, hence configured it to a larger value.
> > > > > >
> > > > > > I don't think I have configured the HFile size in the cluster. I use Cloudera
> > > > > > Manager to manage my cluster, and the only configuration I can relate
> > > > > > to is hfile.block.cache.size,
> > > > > > which is set to 0.25. How do I change the HFile size?
> > > > > >
> > > > > > On 13 May 2013 15:03, Amandeep Khurana <amansk@gmail.com> wrote:
> > > > > >
> > > > > > >> On Sun, May 12, 2013 at 11:40 PM, Praveen Bysani <praveen.iiith@gmail.com> wrote:
> > > > > >>
> > > > > >>> Hi,
> > > > > >>>
> > > > > > >>> I have the dfs.block.size value set to 1 GB in my cluster configuration.
> > > > > >>
> > > > > >>
> > > > > >> Just out of curiosity - why do you have it set at 1GB?
> > > > > >>
> > > > > >>
> > > > > > >>> I have around 250 GB of data stored in HBase over this cluster. But when I
> > > > > > >>> check the number of blocks, it doesn't correspond to the block size value I
> > > > > > >>> set. From what I understand, I should only have ~250 blocks. But instead,
> > > > > > >>> when I did a fsck on /hbase/<table-name>, I got the following:
> > > > > >>>
> > > > > >>> Status: HEALTHY
> > > > > >>> Total size:    265727504820 B
> > > > > >>> Total dirs:    1682
> > > > > >>> Total files:   1459
> > > > > > >>> Total blocks (validated):      1459 (avg. block size 182129886 B)
> > > > > >>> Minimally replicated blocks:   1459 (100.0 %)
> > > > > >>> Over-replicated blocks:        0 (0.0 %)
> > > > > >>> Under-replicated blocks:       0 (0.0 %)
> > > > > >>> Mis-replicated blocks:         0 (0.0 %)
> > > > > >>> Default replication factor:    3
> > > > > >>> Average block replication:     3.0
> > > > > >>> Corrupt blocks:                0
> > > > > >>> Missing replicas:              0 (0.0 %)
> > > > > >>> Number of data-nodes:          5
> > > > > >>> Number of racks:               1
> > > > > >>>
> > > > > > >>> Are there any other configuration parameters that need to be set?
> > > > > >>
> > > > > >>
> > > > > > >> What is your HFile size set to? The HFiles that get persisted would be
> > > > > > >> bound by that number. Thereafter each HFile would be split into blocks, the
> > > > > > >> size of which you configure using the dfs.block.size configuration
> > > > > > >> parameter.
> > > > > >>
> > > > > >>
> > > > > >>>
> > > > > >>> --
> > > > > >>> Regards,
> > > > > >>> Praveen Bysani
> > > > > >>> http://www.praveenbysani.com
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Regards,
> > > > > > Praveen Bysani
> > > > > > http://www.praveenbysani.com
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Regards,
> > > > Praveen Bysani
> > > > http://www.praveenbysani.com
> > > >
> > >
> >
> >
> >
> > --
> > Regards,
> > Praveen Bysani
> > http://www.praveenbysani.com
> >
>



-- 
Regards,
Praveen Bysani
http://www.praveenbysani.com
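
The block counts discussed in the quoted fsck report can be checked with a little arithmetic. The sketch below is only an illustration, not anything posted in the thread. It assumes that "1 GB" for dfs.block.size means 1 GiB, and it relies on the fact that an HDFS block never spans two files, so every non-empty file occupies at least one block. The helper names (blocks_for_file, blocks_for_files) and the "every file has the average size" layout are hypothetical; the totals are copied from the report.

```python
GIB = 1024 ** 3


def blocks_for_file(file_size, block_size):
    """Blocks needed for one file of file_size bytes (ceiling division)."""
    return -(-file_size // block_size)


def blocks_for_files(file_sizes, block_size):
    """Total blocks for several files; each file gets its own blocks."""
    return sum(blocks_for_file(size, block_size) for size in file_sizes)


# Figures copied from the fsck report quoted in the thread.
TOTAL_SIZE = 265_727_504_820   # bytes
TOTAL_FILES = 1459
TOTAL_BLOCKS = 1459
BLOCK_SIZE = 1 * GIB           # dfs.block.size, read as 1 GiB (assumption)

# Average block size the way fsck reports it (integer division).
avg_block = TOTAL_SIZE // TOTAL_BLOCKS

# Smallest possible block count: all data packed into a single file.
packed_blocks = blocks_for_file(TOTAL_SIZE, BLOCK_SIZE)

# Hypothetical layout: 1459 files, each of the average size.
even_files = [avg_block] * TOTAL_FILES
even_blocks = blocks_for_files(even_files, BLOCK_SIZE)

if __name__ == "__main__":
    print("average block size (B):", avg_block)
    print("blocks if packed into one file:", packed_blocks)
    print("blocks for 1459 average-sized files:", even_blocks)
```

In the report the number of files equals the number of blocks (1459 each), which is consistent with every file being smaller than one block. Under that reading, a larger dfs.block.size cannot lower the block count on its own; only fewer, larger files would.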
