Subject: Re: Row-key design (was: GZ better than LZO?)
From: Chris Tarnas
Date: Mon, 1 Aug 2011 09:36:53 -0700
To: user@hbase.apache.org

Glad to be able to help. You don't need fixed-width fields to do range
scans, you just need delimiters that are lexicographically less than any
valid character in your fields. If your fields are printable
non-whitespace characters, then tab, ASCII 0x09, works very well. That
will guarantee correct overall sorting. For example, if vehicle_id is
your first field and device_id is your second field:

1\t1
1\t2
10\t1
10\t2
2\t1
2\t2
.
.

You can then do prefix scans for a particular vehicle_id; just be sure to
append the delimiter to the vehicle_id and use that as your prefix.

I have also used the null character (ASCII 0x00) as a delimiter, as well
as combined null delimiters with fixed-width binary fields for more
complex index-type keys.

-chris
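As a concrete illustration of that prefix scan, here is a minimal sketch
against the 0.90-era Java client (the "readings" table name and the
literal vehicle_id are made up, not from this thread):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class VehiclePrefixScan {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(HBaseConfiguration.create(), "readings");

        String vehicleId = "1";
        // Start at "<vehicle_id><TAB>" and stop just before "<vehicle_id><0x0A>",
        // the first key that can no longer belong to this vehicle.
        byte[] startRow = Bytes.toBytes(vehicleId + "\t");
        byte[] stopRow  = Bytes.toBytes(vehicleId + "\n");

        ResultScanner scanner = table.getScanner(new Scan(startRow, stopRow));
        try {
            for (Result r : scanner) {
                System.out.println(Bytes.toString(r.getRow()));
            }
        } finally {
            scanner.close();
            table.close();
        }
    }
}

Because the stop row is exclusive, this returns exactly the rows whose
key starts with "1" plus the tab delimiter, and nothing from vehicle "10".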
On Jul 31, 2011, at 11:48 PM, Steinmaurer Thomas wrote:

> Hello Chris,
>
> thanks a lot for your insights. Much appreciated!
>
> In our test there were 1000 different vehicle_ids, inserted via our
> multi-threaded client, so while they are sequential integers (basically
> the iterator value of the for loop starting the threads), they were not
> strictly inserted in that order.
>
> Regarding padding: I thought that we need some sort of fixed width
> per "element" (what's the correct term here?) in the row key to enable
> the possibility to do range scans. Our growing factor in the system is
> a growing number of vehicles which needs to be supported. While we have
> the vehicle_id at the beginning of the rowkey, you mean that moving the
> vehicle_id value to the left will give better distribution? We still
> need to have range scans though.
>
> In real life, the vehicle in the master data might not be uniquely
> identified by an integer, but by an alphanumeric serial number, so I
> guess this will make a difference then and should be included in our
> tests compared to sequential integers as part of the row key.
>
> Still, for range scans I thought we need some sort of fixed-width row
> keys, thus padding the row key data with "0".
>
> Thanks!
>
> Thomas
>
> -----Original Message-----
> From: Christopher Tarnas [mailto:cft@tarnas.org] On Behalf Of Chris Tarnas
> Sent: Freitag, 29. Juli 2011 18:49
> To: user@hbase.apache.org
> Subject: Re: GZ better than LZO?
>
> Your region distribution across the nodes is not great; for both cases
> most of your data is going to one server. Spreading the regions out
> across multiple servers would be best.
>
> How many different vehicle_ids are being used, and are they all
> sequential integers in your tests? HBase performs better when not doing
> sequential inserts. You could try reversing the vehicle ids to get
> around that (see the many discussions on the list about using reversed
> timestamps as a rowkey).
>
> Looking at your key construction I would suggest, unless your app
> requires it, not left-padding your ids with zeros but rather using a
> delimiter between the key components. That will lead to smaller keys; if
> you use a tab as your delimiter, that character falls before all other
> alphanumeric and punctuation characters (other than LF, CR, etc. -
> characters that should not be in your IDs), so the keys will sort the
> same as left-padded ones.
>
> I've had good luck with converting sequential numeric IDs to base 64
> and then reversing them - that leads to very good key distribution
> across regions and shorter keys for any given number. Another option -
> if you don't care whether your rowkeys are plaintext - is to convert
> the IDs to binary numbers and then reverse the bytes; that would be the
> most compact. If you do that you would go back to not using delimiters
> and just have fixed offsets for each component.
>
> Once you have a rowkey design you can then go ahead and create your
> tables pre-split with multiple empty regions. That should perform much
> better overall for inserts, especially when the DB is new and empty to
> start.
>
> How did the load with 4 million records perform?
>
> -chris
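A small self-contained sketch of the "convert to base 64 and then
reverse" idea mentioned above (the alphabet, class name and method names
are made up for illustration; they are not from this thread):

public class ReversedId {
    // 64 printable characters listed in ascending byte order.
    private static final char[] ALPHABET =
        "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"
            .toCharArray();

    // Encode a non-negative id in base 64, most significant digit first.
    static String encodeBase64(long id) {
        if (id == 0) {
            return String.valueOf(ALPHABET[0]);
        }
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            sb.append(ALPHABET[(int) (id & 63)]);
            id >>>= 6;
        }
        return sb.reverse().toString();
    }

    // Reverse the digits so consecutive ids spread across the keyspace.
    static String reversedKey(long id) {
        return new StringBuilder(encodeBase64(id)).reverse().toString();
    }

    public static void main(String[] args) {
        for (long id = 1000; id < 1005; id++) {
            System.out.println(id + " -> " + reversedKey(id));
        }
    }
}

Consecutive ids 1000..1004 come out as cE, dE, eE, fE and gE, so inserts
no longer hammer a single region, and the leading character also gives
natural split points when pre-creating regions.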
> On Jul 29, 2011, at 12:36 AM, Steinmaurer Thomas wrote:
>
>> Hi Chris!
>>
>> Your questions are somewhat hard to answer for me, because I'm not
>> really in charge of the test cluster from an administration/setup POV.
>>
>> Basically, when running:
>> http://xxx:60010/master.jsp
>>
>> I see 7 region servers, each with a "maxHeap" value of 995.
>>
>> When clicking on the different tables depending on the compression
>> type, I get the following information:
>>
>> GZ compressed table: 3 regions hosted by one region server.
>> LZO compressed table: 8 regions hosted by two region servers, where
>> the start region is hosted by one region server and all other 7
>> regions are hosted on the second region server.
>>
>> Regarding the insert pattern etc., please have a look at my reply to
>> Chiku, where I describe the test data generator and the table layout
>> a bit.
>>
>> Thanks,
>> Thomas
>>
>> -----Original Message-----
>> From: Christopher Tarnas [mailto:cft@tarnas.org] On Behalf Of Chris Tarnas
>> Sent: Donnerstag, 28. Juli 2011 19:43
>> To: user@hbase.apache.org
>> Subject: Re: GZ better than LZO?
>>
>> During the load did you add enough data to trigger a flush or
>> compaction? In our cluster that amount of data inserted would not
>> necessarily be enough to actually flush store files. Performance
>> really depends on how the table's regions are laid out, the insert
>> pattern, the number of regionservers and the amount of RAM allocated
>> to each regionserver.
>>
>> If you don't see any flushes or compactions in the log, try repeating
>> that test, then flushing the table and doing a compaction (or add more
>> data so it happens automatically) and timing everything. It would be
>> interesting to see if the GZ benefit holds up.
>>
>> -chris
>>
>> On Jul 28, 2011, at 6:31 AM, Steinmaurer Thomas wrote:
>>
>>> Hello,
>>>
>>> we ran a test client generating data into a GZ and an LZO compressed
>>> table. Equal data sets (number of rows: 1008000 and the same table
>>> schema), ~7.78 GB disk space uncompressed in HDFS. LZO is ~887 MB
>>> whereas GZ is ~444 MB, so basically half of LZO.
>>>
>>> Execution time of the data generating client was 1373 seconds into
>>> the uncompressed table, 3374 sec. into LZO and 2198 sec. into GZ. The
>>> data generation client is based on HTablePool and uses batch
>>> operations.
>>>
>>> So in our (simple) test, GZ beats LZO in both disk usage and
>>> execution time of the client. We haven't tried reads yet.
>>>
>>> Is this an expected result? I thought LZO is the recommended
>>> compression algorithm? Or does LZO outperform GZ with a growing
>>> amount of data or in read scenarios?
>>>
>>> Regards,
>>>
>>> Thomas
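For reference, a rough sketch of how the two test tables would be
created with per-family compression using the Java admin API of that
time (table and family names are made up; the Compression class moved
between packages in later HBase versions, and LZO needs the codec
installed on every region server):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression;

public class CreateCompressedTables {
    public static void main(String[] args) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());

        // GZ-compressed table.
        HTableDescriptor gz = new HTableDescriptor("vehicle_data_gz");
        HColumnDescriptor gzFam = new HColumnDescriptor("d");
        gzFam.setCompressionType(Compression.Algorithm.GZ);
        gz.addFamily(gzFam);
        admin.createTable(gz);

        // LZO-compressed table.
        HTableDescriptor lzo = new HTableDescriptor("vehicle_data_lzo");
        HColumnDescriptor lzoFam = new HColumnDescriptor("d");
        lzoFam.setCompressionType(Compression.Algorithm.LZO);
        lzo.addFamily(lzoFam);
        admin.createTable(lzo);

        // As suggested above, flush and major-compact before timing reads so
        // the store files are actually written in the compressed format.
        admin.flush("vehicle_data_gz");
        admin.majorCompact("vehicle_data_gz");
    }
}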