Return-Path: X-Original-To: apmail-hbase-user-archive@www.apache.org Delivered-To: apmail-hbase-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id C60D6D7CC for ; Thu, 7 Mar 2013 08:43:04 +0000 (UTC) Received: (qmail 37272 invoked by uid 500); 7 Mar 2013 08:43:02 -0000 Delivered-To: apmail-hbase-user-archive@hbase.apache.org Received: (qmail 36954 invoked by uid 500); 7 Mar 2013 08:43:02 -0000 Mailing-List: contact user-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hbase.apache.org Delivered-To: mailing list user@hbase.apache.org Received: (qmail 36928 invoked by uid 99); 7 Mar 2013 08:43:01 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 07 Mar 2013 08:43:01 +0000 X-ASF-Spam-Status: No, hits=-2.3 required=5.0 tests=RCVD_IN_DNSWL_MED,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of jtaylor@salesforce.com designates 64.18.3.38 as permitted sender) Received: from [64.18.3.38] (HELO exprod8og119.obsmtp.com) (64.18.3.38) by apache.org (qpsmtpd/0.29) with SMTP; Thu, 07 Mar 2013 08:42:56 +0000 Received: from exsfm-hub5.internal.salesforce.com ([204.14.239.233]) by exprod8ob119.postini.com ([64.18.7.12]) with SMTP ID DSNKUThS/NGnrj6TOUmZcCXjIIK03/zkOqFU@postini.com; Thu, 07 Mar 2013 00:42:36 PST Received: from [10.0.54.31] (10.0.54.31) by exsfm-hub5.internal.salesforce.com (10.1.127.5) with Microsoft SMTP Server (TLS) id 8.3.279.5; Thu, 7 Mar 2013 00:42:36 -0800 Message-ID: <513852FB.607@salesforce.com> Date: Thu, 7 Mar 2013 00:42:35 -0800 From: James Taylor User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130221 Thunderbird/17.0.3 MIME-Version: 1.0 To: "user@hbase.apache.org" Subject: Re: Rowkey design and presplit table References: In-Reply-To: Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 8bit X-Virus-Checked: Checked by ClamAV on apache.org Another possible solution for you: use Phoenix: https://github.com/forcedotcom/phoenix Phoenix would allow you to model your scenario using SQL through JDBC, like this: Connection conn = DriverManager.connect("jdbc:phoenix:"); Statement stmt = conn.createStatement( "CREATE TABLE article_discussion (" + " category_id BIGINT not null," + " article_id BIGINT not null," + " comment_id BIGINT not null " CONSTRAINT pk PRIMARY KEY (category_id, article_id, comment_id))"); stmt.execute(); You could, of course, add other non primary key columns to the above. You could also pre-split the table and/or pass through any HBase configuration parameters necessary. Then, you populate your table like this: PrepareStatement stmt = conn.prepareStatement( "UPSERT INTO article_discussion VALUES(?,?,?)"); stmt.setLong("category_id", categoryId); stmt.setLong("article_id", articleId); stmt.setLong("comment_id", commentId); And to query for all comments in a given category, you'd do this: PrepareStatement stmt1 = conn.prepareStatement( "SELECT * FROM article_discussion WHERE category_id = ?"); stmt1.setLong("category_id", categoryId); ResultSet rs = stmt1.executeQuery(); while (rs.next()) { System.out.println( "category_id=" + rs.getLong("category_id") + ",article_id=" + rs.getLong("article_id") + ",comment_id=" + rs.getLong("comment_id")); } and for all comments for a given category and article: PrepareStatement stmt1 = conn.prepareStatement( "SELECT * FROM article_discussion WHERE category_id = ? AND article_id = ?"); stmt1.setLong("category_id", categoryId); stmt1.setLong("article_id", articleId); you could use arbitrary where clauses like this: SELECT * FROM article_discussion WHERE (article_name LIKE '% baseball %' OR article_name LIKE '% food %') AND category_id IN (?, ?, ?) AND published_date >= to_date('2013-01-01 00:00:00') AND published_date <= to_date('2013-01-31 00:00:00') or do aggregation like this: SELECT category_id, count(article_id), count(comment_id) FROM article_discussion WHERE published_date > ? GROUP BY category_id HAVING count(comment_id) > 100 Regards, James On 03/06/2013 11:42 PM, Asaf Mesika wrote: > I would convert each id to long and then use Bytes.toBytes to convert this > long to a byte array. If it is an int then even better. > Now, write all 3 longs one after another to one array which will be your > rowkey. > This gives you: > * fixed size > * small row key - 3*8 bytes if you use long and 3*4 for int. > > Why do you need to use prefix split policy? > > On Monday, March 4, 2013, Lukáš Drbal wrote: > >> Hi, >> >> i have one question about rowkey design and presplit table. >> >> My usecase: >> I need store a lot of comments where each comment are for one article and >> this article has one category. >> >> What i need: >> 1) read one comment by id (where i know commentId, articleId and >> categoryId) >> 2) read all coments for article (i know categoryId and articleId) >> 3) read all comments for category (i know categoryId) >> >> From this read pattern i see one good rowkey: >> __ >> >> But here i don't have fixed size of rowkey, so i don't know how to define >> split pattern. How can be this solved? >> This id's come from external system and grow very fast, so add some like >> "padding" for each part are hard. >> >> Maybe i can use hash function for each part >> md5(_md5()_md5(), but this rowkey is very >> long (3*32+2 bytes), i don't have experience with this long rowkeys. >> >> Can someone give me a suggestions please? >> >> Regards >> >> Lukas Drbal >>