hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: number of tablets and hbase table size
Date Wed, 02 Sep 2009 10:28:23 GMT
Same drill.

J-D

On Wed, Sep 2, 2009 at 5:51 AM, Xine Jar<xinejar22@googlemail.com> wrote:
> Hallo,
> The theoretical concept of the table is clear for me. I am aware that the
> writes are kept in memory in a buffer called memtable and whenever this
> buffer reaches a threshold, the memtable is automatically flushed to the
> disk.
>
> Now I have tried to flush the table by executing the following:
>
> *hbase(main):001:0> flush 'myTable'
> 0 row(s) in 0.2019 seconds
>
> hbase(main):002:0> describe 'myTable'
> {NAME => 'myTable', FAMILIES => [{NAME => 'cf', COMPRESSION => 'NONE',
> VERSIONS => '3', LENGTH => '2147483647'
> , TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}
> *
> Q1-the expression "0 row(s) in 0.2019" means that it did not flush
> anything?!!

Nah, it's just the way we count the rows shown in the shell. In this
case no counter was incremented, so it shows "0 row(s)"; it's a UI
quirk, not a sign that nothing was flushed. By the way, describing your
table won't tell you how many rows you have or how many are still kept
in the memtable.

>
> Q2- IN_MEMORY=FALSE means that the table is not in memory? So is it on the
> disk?!! If it is so, I still cannot see it in the DFS when executing
> "bin/hadoop dfs -ls".

This is a family-scope property that tells HBase to keep that family
always in RAM (but also on disk, it's not ephemeral). Since yours is
set to 'false', HBase does nothing special for that family.

Are you sure you are doing an ls at the right place in the filesystem?
Do you see the .META. and -ROOT- folders? Is there any data in your
table? You can run a "count" in the shell to make sure.
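Concretely, assuming the default hbase.rootdir of /hbase (check your
hbase-site.xml if you changed it) and the 'myTable' name from earlier in
the thread, the checks would look like this:

```shell
# Assumes hbase.rootdir is the default /hbase; adjust if your config differs.
bin/hadoop dfs -ls /hbase              # should show the -ROOT- and .META. folders plus your tables
bin/hadoop dfs -du /hbase/myTable      # disk usage of the table's folder
echo "count 'myTable'" | bin/hbase shell   # rows actually stored in the table
```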

>
>
> Thank you for taking a look at that.
>
> Regards,
> CJ
>
> On Tue, Sep 1, 2009 at 7:13 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>
>> Inline.
>>
>> J-D
>>
>> On Tue, Sep 1, 2009 at 1:05 PM, Xine Jar<xinejar22@googlemail.com> wrote:
>> > Thank you,
>> >
>> > While the answers to Q3 and Q4 were clear enough, I still have some
>> > problems with the first two questions.
>>
>> Good
>>
>> >
>> > -which entry in the hbase-default.xml allows me to check the size of a
>> > tablet?
>>
>> Those are configuration parameters, not commands. A region will split
>> when a family gets that size. See
>> http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#hregion for more
>> info on splitting.
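For reference, this threshold is a plain configuration property; a
hypothetical override in hbase-site.xml (268435456 bytes, i.e. 256 MB,
was the default of that era) would look like:

```xml
<!-- A family in a region reaching this size (in bytes) triggers a split. -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>268435456</value>
</property>
```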
>>
>> >
>> > -In hadoop, I used to copy a file to the DFS by doing "bin/hadoop dfs
>> > -copyFromLocal filesource fileDFS".
>> >  Having this file in the DFS, I could list it with "bin/hadoop dfs -ls"
>> >  and check its size by doing "bin/hadoop dfs -du fileDFS".
>> >  But when I create an hbase table, this table does not appear in the DFS.
>> >  Therefore the latter command gives an error that it cannot find the
>> >  table!! So how can I point to the folder of the table?
>>
>> Just make sure the table is flushed to disk; the writes are kept in
>> memory as described in the link I pasted for the previous question.
>> You can force that by going into the shell and issuing "flush 'table'",
>> with 'table' replaced by the name of your table.
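As a sketch, with 'myTable' standing in for your table and the default
/hbase root directory assumed:

```shell
# Force the in-memory writes out to HDFS, then measure the table's folder.
echo "flush 'myTable'" | bin/hbase shell
bin/hadoop dfs -du /hbase/myTable
```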
>>
>> >
>> > Regards,
>> > CJ
>> >
>> >
>> > On Tue, Sep 1, 2009 at 5:00 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>> >
>> >> Answers inline.
>> >>
>> >> J-D
>> >>
>> >> On Tue, Sep 1, 2009 at 10:53 AM, Xine Jar<xinejar22@googlemail.com> wrote:
>> >> > Hallo,
>> >> > I have a cluster of 6 nodes running hadoop 0.19.3 and hbase 0.19.1.
>> >> > I have managed to write small programs to test the settings and
>> >> > everything seems to be fine.
>> >> >
>> >> > I wrote a mapreduce program reading a small hbase table (100 rows,
>> >> > one column family, 6 columns) and summing some values. In my opinion
>> >> > the job is slow; it is taking 19 sec. I would like to look closer at
>> >> > what is going on, and whether the table is split into tablets or not.
>> >> > Therefore I would appreciate it if someone could answer my following
>> >> > questions:
>> >>
>> >> With that size, that's expected. You would be better off scanning your
>> >> table directly instead; MapReduce has a fixed startup cost, and 19
>> >> seconds isn't that much.
>> >>
>> >> >
>> >> >
>> >> > *Q1 -Does  the value of "hbase.hregion.max.filesize" in the
>> >> > hbase-default.xml indicate the maximum size of a tablet in bytes?
>> >>
>> >> It's the maximum size of a family (in a region) in bytes.
>> >>
>> >> >
>> >> > Q2- How can I know the size of the hbase table I have created? (I
>> >> > guess the "describe" command from the shell does not provide it.)
>> >>
>> >> Size as in disk space? You could use the hadoop dfs -du command on
>> >> your table's folder.
>> >>
>> >> >
>> >> > Q3- Is there a way to know the real number of tablets constituting
>> >> > my table?
>> >>
>> >> In the Master's web UI, click on the name of your table. If you want
>> >> to do that programmatically, you can indirectly do it by calling
>> >> HTable.getEndKeys() and the size of that array is the number of
>> >> regions.
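A minimal sketch of that programmatic route, written against the
0.19-era Java client (the table name is illustrative and a running
cluster is assumed):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class CountRegions {
  public static void main(String[] args) throws Exception {
    // There is one end key per region, so the array length is the region count.
    HTable table = new HTable(new HBaseConfiguration(), "myTable");
    byte[][] endKeys = table.getEndKeys();
    System.out.println("regions: " + endKeys.length);
  }
}
```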
>> >>
>> >> >
>> >> > Q4- Is there a way to get more information on the tablets handled
>> >> > by each regionserver? (their number, the rows constituting each
>> >> > tablet) *
>> >>
>> >> In the Master's web UI, click on the region server you want info for.
>> >> Getting the number of rows inside a region can't, for the moment, be
>> >> done directly (it requires doing a scan between the start and end keys
>> >> of the region and counting the number of rows you see).
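A sketch of that scan-and-count approach for the first region, again
against the 0.19-era client (Scanner, RowResult); the exact getScanner
overloads vary between versions, so treat the signatures here as an
assumption to check against your release:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scanner;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.hbase.util.Bytes;

public class CountRegionRows {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "myTable");
    byte[][] startKeys = table.getStartKeys();
    byte[][] endKeys = table.getEndKeys();
    // Count the rows in the first region only: [startKeys[0], endKeys[0])
    Scanner scanner = table.getScanner(
        new byte[][] { Bytes.toBytes("cf:") },   // every column of family "cf"
        startKeys[0], endKeys[0]);
    int rows = 0;
    for (RowResult row : scanner) {
      rows++;
    }
    scanner.close();
    System.out.println("rows in first region: " + rows);
  }
}
```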
>> >>
>> >> >
>> >> > Thank you for your help,
>> >> > CJ
>> >> >
>> >>
>> >
>>
>
