From: Mohammad Tariq
Date: Mon, 18 Feb 2013 05:00:30 +0530
Subject: Re: Row Key Design in time based application
To: user@hbase.apache.org

Hello Mehmet,

If the ProjectIds are sequential, then it is definitely not a feasible approach. The division is just to make sure that all the regions are evenly loaded. You can create pre-split tables to avoid region hotspotting. Alternatively, hash your rowkeys so that all the regionservers receive an equal load.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com

On Mon, Feb 18, 2013 at 4:48 AM, Michael Segel wrote:

> I'm not sure how a SQL interface above HBase will solve some of the issues
> with regional hot spotting when using time as the key, or the problem with
> always adding data to the right of the last row.
>
> The same would apply to the project id, assuming that it too is a number
> that grows incrementally with each project.
>
> On Feb 17, 2013, at 4:50 PM, James Taylor wrote:
>
> > Hello,
> > Have you considered using Phoenix
> > (https://github.com/forcedotcom/phoenix) for this use case? Phoenix is a
> > SQL layer on top of HBase.
> > For this use case, you'd connect to your cluster like this:
> >
> > Class.forName("com.salesforce.phoenix.jdbc.PhoenixDriver"); // register driver
> > Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost"); // connect to local HBase
> >
> > Create a table like this (adding any additional columns that you want to
> > measure, like txn_count below):
> >
> > conn.createStatement().execute(
> >     "CREATE TABLE event_log (\n" +
> >     "    project_id INTEGER NOT NULL,\n" +
> >     "    time DATE NOT NULL,\n" +
> >     "    txn_count LONG,\n" +
> >     "    CONSTRAINT pk PRIMARY KEY (project_id, time))");
> >
> > Then to insert data you'd do this:
> >
> > PreparedStatement preparedStmt = conn.prepareStatement(
> >     "UPSERT INTO event_log VALUES(?, ?, 0)");
> >
> > and you'd bind the values in JDBC like this:
> >
> > preparedStmt.setInt(1, projectId);
> > preparedStmt.setDate(2, time);
> > preparedStmt.execute();
> >
> > conn.commit(); // If upserting many values, commit after every 1000-10000 rows or so
> >
> > Then at query time, assuming you want to report on this data by grouping
> > it into different "time buckets", you could do as shown below.
> > Phoenix stores your date values at millisecond granularity, and you can
> > decide at query time how you'd like to roll them up:
> >
> > // Query with a time bucket at hour granularity
> > conn.createStatement().execute(
> >     "SELECT\n" +
> >     "    project_id, TRUNC(time,'HOUR') as time_bucket,\n" +
> >     "    MIN(txn_count), MAX(txn_count), AVG(txn_count) FROM event_log\n" +
> >     "GROUP BY project_id, TRUNC(time,'HOUR')");
> >
> > // Query with a time bucket at day granularity
> > conn.createStatement().execute(
> >     "SELECT\n" +
> >     "    project_id, TRUNC(time,'DAY') as time_bucket,\n" +
> >     "    MIN(txn_count), MAX(txn_count), AVG(txn_count) FROM event_log\n" +
> >     "GROUP BY project_id, TRUNC(time,'DAY')");
> >
> > You could, of course, include a WHERE clause in the query to filter on a
> > range of dates, particular project ids, etc., like this:
> >
> > PreparedStatement preparedStmt = conn.prepareStatement(
> >     "SELECT\n" +
> >     "    project_id, TRUNC(time,'DAY') as time_bucket,\n" +
> >     "    MIN(txn_count), MAX(txn_count), AVG(txn_count) FROM event_log\n" +
> >     "WHERE project_id IN (?, ?, ?) AND time >= ? AND time < ?\n" +
> >     "GROUP BY project_id, TRUNC(time,'DAY')");
> > preparedStmt.setInt(1, projectId1);
> > preparedStmt.setInt(2, projectId2);
> > preparedStmt.setInt(3, projectId3);
> > preparedStmt.setDate(4, beginDate);
> > preparedStmt.setDate(5, endDate);
> > preparedStmt.execute();
> >
> > HTH.
> >
> > Regards,
> > James
> >
> > On 02/17/2013 11:33 AM, Mehmet Simsek wrote:
> >
> >> Hi,
> >>
> >> I want to hold event log data in HBase, but I couldn't decide on a row
> >> key. I must hold the project id and the time; I will search using the
> >> project id and time combination.
> >>
> >> The row key could be:
> >>
> >> ProjectId+timeInMs
> >>
> >> In a similar application (the open source TSDB), the time is divided by
> >> 1000 to round it. I could use this strategy, but I don't know how to
> >> decide what the divisor should be: 1000 or 10000?
> >>
> >> Why is the time divided by 1000 in that application? Why isn't it
> >> stored without division?
> >> Can you explain this strategy?
>
> Michael Segel | (m) 312.755.9623
> Segel and Associates
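The rowkey hashing Tariq suggests, combined with the TSDB-style divide-by-1000 time bucketing Mehmet asks about, can be sketched as below. This is a minimal illustration, not code from the thread: the salt-bucket count, the 13-byte layout, and the second-granularity bucket are all illustrative assumptions.

```java
import java.nio.ByteBuffer;

public class SaltedRowKey {
    // Illustrative: pick roughly one salt bucket per expected region
    static final int SALT_BUCKETS = 16;

    // rowkey layout (assumed): salt (1 byte) + projectId (4 bytes) + timeBucket (8 bytes).
    // A leading salt derived from the rest of the key spreads sequential
    // projectId/time writes across regionservers instead of hammering one.
    static byte[] rowKey(int projectId, long timeMs) {
        long timeBucket = timeMs / 1000; // TSDB-style: second granularity
        // floorMod keeps the salt non-negative even if the hash is negative
        int salt = Math.floorMod(31 * projectId + Long.hashCode(timeBucket), SALT_BUCKETS);
        return ByteBuffer.allocate(13)
                .put((byte) salt)
                .putInt(projectId)
                .putLong(timeBucket)
                .array();
    }

    public static void main(String[] args) {
        byte[] key = rowKey(42, 1361143870207L);
        System.out.println(key.length);    // 13: 1 salt + 4 id + 8 bucket
        System.out.println(key[0] & 0xff); // deterministic salt in [0, SALT_BUCKETS)
    }
}
```

The trade-off is the usual one: a salted key spreads writes, but a scan over one project's time range must then fan out to all SALT_BUCKETS prefixes and merge the results.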