Mailing-List: hbase-user@hadoop.apache.org (run by ezmlm)
Date: Wed, 2 Sep 2009 14:30:31 +0200
Subject: Re: number of tablets and hbase table size
From: Xine Jar <xinejar22@googlemail.com>
To: hbase-user@hadoop.apache.org

:)
Since I am seeing neither the ROOT nor the METADATA table, I am obviously
looking in the wrong place. I thought they should be visible in the DFS,
where a mapreduce program takes its input file from and stores its output
file, and the default for me is:

pc150:~/Desktop/hbase-0.19.3 # /root/Desktop/hadoop-0.19.1/bin/hadoop dfs -ls
Found 2 items
drwxr-xr-x - root supergroup 0 2009-08-31 22:21 /user/root/input
drwxr-xr-x - root supergroup 0 2009-09-02 16:02 /user/root/output

If there is another path, could you please tell me where it is configured,
so that I can check it?

Thank you

On Wed, Sep 2, 2009 at 12:28 PM, Jean-Daniel Cryans wrote:
> Same drill.
>
> J-D
>
> On Wed, Sep 2, 2009 at 5:51 AM, Xine Jar wrote:
> > Hallo,
> > The theoretical concept of the table is clear to me.
> > I am aware that the writes are kept in memory in a buffer called the
> > memtable, and whenever this buffer reaches a threshold, the memtable is
> > automatically flushed to disk.
> >
> > Now I have tried to flush the table by executing the following:
> >
> > hbase(main):001:0> flush 'myTable'
> > 0 row(s) in 0.2019 seconds
> >
> > hbase(main):002:0> describe 'myTable'
> > {NAME => 'myTable', FAMILIES => [{NAME => 'cf', COMPRESSION => 'NONE',
> > VERSIONS => '3', LENGTH => '2147483647'
> > , TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}
> >
> > Q1- Does the expression "0 row(s) in 0.2019" mean that it did not flush
> > anything?
>
> Nah, it's just the way we count the rows we show in the shell. In this
> case we did not increment some counter, so it shows "0 row(s)"; it's a
> UI bug. BTW, describing your table won't tell you how many rows you
> have or how many are still kept in the memtable.
>
> > Q2- Does IN_MEMORY => 'false' mean that the table is not in memory? So
> > is it on disk? If so, I still cannot see it in the DFS when executing
> > "bin/hadoop dfs -ls".
>
> This is a family-scope property that tells HBase to keep the family
> always in RAM (but also on disk; it's not ephemeral). In your case, that
> means HBase shouldn't do anything in particular for that family.
>
> Are you sure you are doing an ls at the right place in the filesystem?
> Do you see the META and ROOT folders? Is there any data in your table?
> You can do a "count" in the shell to make sure.
>
> > Thank you for taking a look at that.
> >
> > Regards,
> > CJ
> >
> > On Tue, Sep 1, 2009 at 7:13 PM, Jean-Daniel Cryans wrote:
> >
> >> Inline.
> >>
> >> J-D
> >>
> >> On Tue, Sep 1, 2009 at 1:05 PM, Xine Jar wrote:
> >> > Thank you,
> >> >
> >> > while the answers to Q3 and Q4 were clear enough, I still have some
> >> > problems with the first two questions.
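[Archive note: the memtable behaviour discussed in the exchange above — writes accumulate in an in-memory buffer and are flushed to an on-disk file once a size threshold is crossed — can be sketched in a few lines. This is a toy illustration only: the class and field names are invented here, the real HBase memtable is far more involved, and the actual flush threshold is a configuration setting, not a constructor argument.]

```python
class MemtableSketch:
    """Toy model of HBase's in-memory write buffer (the memtable).

    Writes are buffered in memory; once the buffered size crosses a
    threshold, the buffer is flushed as one immutable, key-sorted file.
    All names and sizes here are illustrative, not HBase internals.
    """

    def __init__(self, flush_threshold_bytes=64):
        self.flush_threshold = flush_threshold_bytes
        self.buffer = {}          # row key -> value, held in memory
        self.buffered_bytes = 0
        self.flushed_files = []   # each flush produces one "store file"

    def put(self, row, value):
        self.buffer[row] = value
        self.buffered_bytes += len(row) + len(value)
        # Automatic flush once the threshold is reached, as described
        # in the thread; "flush 'myTable'" in the shell forces the same.
        if self.buffered_bytes >= self.flush_threshold:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # Store files are written sorted by row key.
        self.flushed_files.append(sorted(self.buffer.items()))
        self.buffer = {}
        self.buffered_bytes = 0
```

Until a flush happens, nothing for those writes appears on the filesystem — which is why a freshly written table can be invisible to "hadoop dfs -ls" until "flush 'table'" is issued.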
> >>
> >> Good
> >>
> >> > -which entry in the hbase-default.xml allows me to check the size
> >> > of a tablet?
> >>
> >> Those are configuration parameters, not commands. A region will split
> >> when a family gets to that size. See
> >> http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#hregion for more
> >> info on splitting.
> >>
> >> > -In hadoop, I used to copy a file to the DFS by doing "bin/hadoop dfs
> >> > -copyFromLocal filesource fileDFS".
> >> > Having this file in the DFS, I could list it with "bin/hadoop dfs -ls"
> >> > and check its size by doing "bin/hadoop dfs -du fileDFS".
> >> > But when I create an hbase table, this table does not appear in the
> >> > DFS. Therefore the latter command gives an error that it cannot find
> >> > the table! So how can I point to the folder of the table?
> >>
> >> Just make sure the table is flushed to disk; the writes are kept in
> >> memory as described in the link I pasted for the previous question.
> >> You can force that by going into the shell and issuing "flush 'table'",
> >> with 'table' replaced by the name of your table.
> >>
> >> > Regards,
> >> > CJ
> >> >
> >> > On Tue, Sep 1, 2009 at 5:00 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> >> >
> >> >> Answers inline.
> >> >>
> >> >> J-D
> >> >>
> >> >> On Tue, Sep 1, 2009 at 10:53 AM, Xine Jar wrote:
> >> >> > Hallo,
> >> >> > I have a cluster of 6 nodes running hadoop 0.19.3 and hbase 0.19.1.
> >> >> > I have managed to write small programs to test the settings and
> >> >> > everything seems to be fine.
> >> >> >
> >> >> > I wrote a mapreduce program reading a small hbase table (100 rows,
> >> >> > one column family, 6 columns) and summing some values. In my
> >> >> > opinion the job is slow; it is taking 19 sec.
> >> >> > I would like to look closer at what is going on, and whether the
> >> >> > table is split into tablets or not. Therefore I would appreciate it
> >> >> > if someone could answer my following questions:
> >> >>
> >> >> With that size, that's expected. You would be better off scanning
> >> >> your table directly instead; MapReduce has a startup cost, and 19
> >> >> seconds isn't that much.
> >> >>
> >> >> > Q1- Does the value of "hbase.hregion.max.filesize" in the
> >> >> > hbase-default.xml indicate the maximum size of a tablet in bytes?
> >> >>
> >> >> It's the maximum size of a family (in a region) in bytes.
> >> >>
> >> >> > Q2- How can I know the size of the hbase table I have created? (I
> >> >> > guess the "describe" command from the shell does not provide it)
> >> >>
> >> >> Size as in disk space? You could use the hadoop dfs -du command on
> >> >> your table's folder.
> >> >>
> >> >> > Q3- Is there a way to know the real number of tablets constituting
> >> >> > my table?
> >> >>
> >> >> In the Master's web UI, click on the name of your table. If you want
> >> >> to do that programmatically, you can do it indirectly by calling
> >> >> HTable.getEndKeys(); the size of that array is the number of
> >> >> regions.
> >> >>
> >> >> > Q4- Is there a way to get more information on the tablets handled
> >> >> > by each regionserver? (their number, the rows constituting each
> >> >> > tablet)
> >> >>
> >> >> In the Master's web UI, click on the region server you want info
> >> >> for. Getting the number of rows inside a region, for the moment,
> >> >> can't be done directly (it requires doing a scan between the start
> >> >> and end keys of a region and counting the rows you see).
> >> >>
> >> >> > Thank you for your help,
> >> >> > CJ
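[Archive note: the bookkeeping behind J-D's answers to Q3 and Q4 — one end key per region, with the last region's end key empty, and row lookup by "start key <= row < end key" — can be illustrated with a small sketch. This is not the HBase API (HTable.getEndKeys() returns byte arrays in Java); the Python below, with invented function names and string keys, only models the relationship.]

```python
def region_count(end_keys):
    """Number of regions in a table, modelled after the observation that
    the array returned by something like HTable.getEndKeys() has one
    entry per region (the last region's end key being empty)."""
    return len(end_keys)


def region_for_row(start_keys, end_keys, row):
    """Index of the region holding `row`: the region whose key range
    satisfies start_key <= row < end_key, where an empty end key means
    'up to the end of the table'. Raises KeyError if no region matches."""
    for i, (start, end) in enumerate(zip(start_keys, end_keys)):
        if row >= start and (end == "" or row < end):
            return i
    raise KeyError(row)
```

Counting the rows of one region then amounts to scanning between that region's start and end keys and counting what the scan returns, exactly as described above.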