Mailing-List: hbase-user@hadoop.apache.org (run by ezmlm)
Date: Wed, 2 Sep 2009 14:30:31 +0200
Subject: Re: number of tablets and hbase table size
From: Xine Jar <xinejar22@googlemail.com>
To: hbase-user@hadoop.apache.org

:)
Since I am seeing neither the ROOT nor the METADATA table, I am obviously
looking in the wrong place. I thought they should be visible in the DFS,
where a mapreduce program takes its input file from and stores its output
file, and the default for me is:

pc150:~/Desktop/hbase-0.19.3 # /root/Desktop/hadoop-0.19.1/bin/hadoop dfs -ls
Found 2 items
drwxr-xr-x - root supergroup 0 2009-08-31 22:21 /user/root/input
drwxr-xr-x - root supergroup 0 2009-09-02 16:02 /user/root/output

If there is another path, could you please tell me where it is configured,
so that I can check it?

Thank you

On Wed, Sep 2, 2009 at 12:28 PM, Jean-Daniel Cryans wrote:
> Same drill.
>
> J-D
>
> On Wed, Sep 2, 2009 at 5:51 AM, Xine Jar wrote:
> > Hallo,
> > The theoretical concept of the table is clear to me.
> > I am aware that the writes are kept in memory in a buffer called the
> > memtable, and whenever this buffer reaches a threshold, the memtable is
> > automatically flushed to disk.
> >
> > Now I have tried to flush the table by executing the following:
> >
> > hbase(main):001:0> flush 'myTable'
> > 0 row(s) in 0.2019 seconds
> >
> > hbase(main):002:0> describe 'myTable'
> > {NAME => 'myTable', FAMILIES => [{NAME => 'cf', COMPRESSION => 'NONE',
> > VERSIONS => '3', LENGTH => '2147483647'
> > , TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}
> >
> > Q1- Does the expression "0 row(s) in 0.2019" mean that it did not flush
> > anything?
>
> Nah, it's just the way we count the rows we show in the shell. In this
> case we did not increment some counter, so it shows "0 row(s)"; it's a
> UI bug. BTW, describing your table won't tell you how many rows you
> have or how many are still kept in the memtable.
>
> > Q2- Does IN_MEMORY => 'false' mean that the table is not in memory? So
> > is it on disk? If so, I still cannot see it in the DFS when executing
> > "bin/hadoop dfs -ls".
>
> This is a family-scope property that tells HBase to keep the family
> always in RAM (but also on disk; it's not ephemeral). In your case, that
> means HBase shouldn't do anything in particular for that family.
>
> Are you sure you are doing an ls at the right place in the filesystem?
> Do you see the META and ROOT folders? Is there any data in your table?
> You can do a "count" in the shell to make sure.
>
> > Thank you for taking a look at that.
> >
> > Regards,
> > CJ
> >
> > On Tue, Sep 1, 2009 at 7:13 PM, Jean-Daniel Cryans wrote:
> >
> >> Inline.
> >>
> >> J-D
> >>
> >> On Tue, Sep 1, 2009 at 1:05 PM, Xine Jar wrote:
> >> > Thank you,
> >> >
> >> > while the answers to Q3 and Q4 were clear enough, I still have some
> >> > problems with the first two questions.
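[Archive note: the memtable behaviour discussed in the exchange above — writes accumulate in an in-memory buffer and are flushed to an on-disk file once a size threshold is crossed — can be sketched in a few lines. This is a toy illustration only: the class and field names are invented here, the real HBase memtable is far more involved, and the actual flush threshold is a configuration setting, not a constructor argument.]

```python
class MemtableSketch:
    """Toy model of HBase's in-memory write buffer (the memtable).

    Writes are buffered in memory; once the buffered size crosses a
    threshold, the buffer is flushed as one immutable, key-sorted file.
    All names and sizes here are illustrative, not HBase internals.
    """

    def __init__(self, flush_threshold_bytes=64):
        self.flush_threshold = flush_threshold_bytes
        self.buffer = {}          # row key -> value, held in memory
        self.buffered_bytes = 0
        self.flushed_files = []   # each flush produces one "store file"

    def put(self, row, value):
        self.buffer[row] = value
        self.buffered_bytes += len(row) + len(value)
        # Automatic flush once the threshold is reached, as described
        # in the thread; "flush 'myTable'" in the shell forces the same.
        if self.buffered_bytes >= self.flush_threshold:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        # Store files are written sorted by row key.
        self.flushed_files.append(sorted(self.buffer.items()))
        self.buffer = {}
        self.buffered_bytes = 0
```

Until a flush happens, nothing for those writes appears on the filesystem — which is why a freshly written table can be invisible to "hadoop dfs -ls" until "flush 'table'" is issued.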
> >>
> >> Good
> >>
> >> > -which entry in the hbase-default.xml allows me to check the size
> >> > of a tablet?
> >>
> >> Those are configuration parameters, not commands. A region will split
> >> when a family gets to that size. See
> >> http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#hregion for more
> >> info on splitting.
> >>
> >> > -In hadoop, I used to copy a file to the DFS by doing "bin/hadoop dfs
> >> > -copyFromLocal filesource fileDFS".
> >> > Having this file in the DFS, I could list it with "bin/hadoop dfs -ls"
> >> > and check its size by doing "bin/hadoop dfs -du fileDFS".
> >> > But when I create an hbase table, this table does not appear in the
> >> > DFS. Therefore the latter command gives an error that it cannot find
> >> > the table! So how can I point to the folder of the table?
> >>
> >> Just make sure the table is flushed to disk; the writes are kept in
> >> memory as described in the link I pasted for the previous question.
> >> You can force that by going into the shell and issuing "flush 'table'",
> >> with 'table' replaced by the name of your table.
> >>
> >> > Regards,
> >> > CJ
> >> >
> >> > On Tue, Sep 1, 2009 at 5:00 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> >> >
> >> >> Answers inline.
> >> >>
> >> >> J-D
> >> >>
> >> >> On Tue, Sep 1, 2009 at 10:53 AM, Xine Jar wrote:
> >> >> > Hallo,
> >> >> > I have a cluster of 6 nodes running hadoop 0.19.3 and hbase 0.19.1.
> >> >> > I have managed to write small programs to test the settings and
> >> >> > everything seems to be fine.
> >> >> >
> >> >> > I wrote a mapreduce program reading a small hbase table (100 rows,
> >> >> > one column family, 6 columns) and summing some values. In my
> >> >> > opinion the job is slow; it is taking 19 sec.
> >> >> > I would like to look closer at what is going on, and whether the
> >> >> > table is split into tablets or not. Therefore I would appreciate it
> >> >> > if someone could answer my following questions:
> >> >>
> >> >> With that size, that's expected. You would be better off scanning
> >> >> your table directly instead; MapReduce has a startup cost, and 19
> >> >> seconds isn't that much.
> >> >>
> >> >> > Q1- Does the value of "hbase.hregion.max.filesize" in the
> >> >> > hbase-default.xml indicate the maximum size of a tablet in bytes?
> >> >>
> >> >> It's the maximum size of a family (in a region) in bytes.
> >> >>
> >> >> > Q2- How can I know the size of the hbase table I have created? (I
> >> >> > guess the "describe" command from the shell does not provide it)
> >> >>
> >> >> Size as in disk space? You could use the hadoop dfs -du command on
> >> >> your table's folder.
> >> >>
> >> >> > Q3- Is there a way to know the real number of tablets constituting
> >> >> > my table?
> >> >>
> >> >> In the Master's web UI, click on the name of your table. If you want
> >> >> to do that programmatically, you can do it indirectly by calling
> >> >> HTable.getEndKeys(); the size of that array is the number of
> >> >> regions.
> >> >>
> >> >> > Q4- Is there a way to get more information on the tablets handled
> >> >> > by each regionserver? (their number, the rows constituting each
> >> >> > tablet)
> >> >>
> >> >> In the Master's web UI, click on the region server you want info
> >> >> for. Getting the number of rows inside a region, for the moment,
> >> >> can't be done directly (it requires doing a scan between the start
> >> >> and end keys of a region and counting the rows you see).
> >> >>
> >> >> > Thank you for your help,
> >> >> > CJ
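[Archive note: the bookkeeping behind J-D's answers to Q3 and Q4 — one end key per region, with the last region's end key empty, and row lookup by "start key <= row < end key" — can be illustrated with a small sketch. This is not the HBase API (HTable.getEndKeys() returns byte arrays in Java); the Python below, with invented function names and string keys, only models the relationship.]

```python
def region_count(end_keys):
    """Number of regions in a table, modelled after the observation that
    the array returned by something like HTable.getEndKeys() has one
    entry per region (the last region's end key being empty)."""
    return len(end_keys)


def region_for_row(start_keys, end_keys, row):
    """Index of the region holding `row`: the region whose key range
    satisfies start_key <= row < end_key, where an empty end key means
    'up to the end of the table'. Raises KeyError if no region matches."""
    for i, (start, end) in enumerate(zip(start_keys, end_keys)):
        if row >= start and (end == "" or row < end):
            return i
    raise KeyError(row)
```

Counting the rows of one region then amounts to scanning between that region's start and end keys and counting what the scan returns, exactly as described above.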