Date: Mon, 28 Jan 2013 18:24:02 +0200
Subject: Re: how to model data based on "time bucket"
From: Oleg Ruchovets
To: user@hbase.apache.org

Yes, I agree that using only a timestamp as the key will cause a hotspot. I can pre-split the regions. I watched the TSDB video and presentation and looked at its data model, but I don't think it suits my case. I searched Google quite a bit and, to my surprise, found no posts about such a classic problem. It is very strange.

I am not trying to group the time series the way most solutions do -- into fixed intervals of 1 hour, 1 day, or 5 minutes; that part is simple. I need to group each event relative to itself by time: given {event1: 10:05}, I want to group it with every event that occurred within time X after 10:05. For X = 7 minutes, all events between 10:05 and 10:12 would be in the group.

Naively this is a join of each row with every other row, but the performance would be very bad: I currently have 50 million events, so that would be 50 million^2 comparisons. That is why I don't want to use pure map/reduce. I want to use HBase as the output of a map/reduce job and model the data as I described above.

So, is there a way to model data in this type of time bucket? Please advise.

Thanks
Oleg.

On Mon, Jan 28, 2013 at 5:54 PM, Michel Segel wrote:

> Tough one in that if your events are keyed on time alone, you will hit a
> hot spot on write. Reads, not so much...
>
> TSDB would be a good start...
>
> You may not need 'buckets' but just a time stamp and set up start and
> stop key values.
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Jan 28, 2013, at 7:06 AM, Oleg Ruchovets wrote:
>
> > Hi,
> >
> > I have the following row data structure:
> >
> > event_id | time
> > ================
> > event1   | 10:07
> > event2   | 10:10
> > event3   | 10:12
> >
> > event4   | 10:20
> > event5   | 10:23
> > event6   | 10:25
> >
> > The number of records is 50-100 million.
> >
> > Question:
> >
> > Starting from a given eventX, I need to find the group of events that
> > fall within a time window bucket T after it.
> >
> > For example, if T = 7 minutes:
> >
> > Starting from event1, {event1, event2, event3} were detected during 7 minutes.
> > Starting from event2, {event2, event3} were detected during 7 minutes.
> > Starting from event4, {event4, event5, event6} were detected during 7 minutes.
> >
> > Is there a way to model the data in HBase to get this?
> >
> > Thanks
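[Archive note] The per-event grouping described in the thread need not be an n^2 self-join: once events are sorted by timestamp, a single two-pointer sweep produces every group in O(n). A minimal Java sketch under assumed conventions -- the class and method names, and timestamps expressed as minutes since midnight, are illustrative and not from the thread:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TimeBucketGrouping {

    // For each event i, collect all event ids whose timestamp falls in
    // [times[i], times[i] + windowMinutes]. Input arrays must be sorted
    // by time. The right pointer `end` only ever moves forward, so the
    // whole sweep is O(n) instead of the n^2 all-pairs join.
    static List<List<String>> group(long[] times, String[] ids, long windowMinutes) {
        List<List<String>> groups = new ArrayList<>();
        int end = 0;
        for (int start = 0; start < times.length; start++) {
            if (end < start) end = start;
            while (end < times.length && times[end] <= times[start] + windowMinutes) {
                end++;
            }
            groups.add(new ArrayList<>(Arrays.asList(ids).subList(start, end)));
        }
        return groups;
    }

    public static void main(String[] args) {
        // The six events from the original question, as minutes since midnight.
        long[] times = {607, 610, 612, 620, 623, 625};
        String[] ids = {"event1", "event2", "event3", "event4", "event5", "event6"};
        for (List<String> g : group(times, ids, 7)) {
            System.out.println(g);
        }
    }
}
```

With T = 7 this reproduces the groups from the question: {event1, event2, event3}, {event2, event3}, and {event4, event5, event6}, among the per-event groups. A map/reduce job can emit rows already sorted by time so each reducer only needs this linear sweep.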
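[Archive note] Michel's suggestion of a timestamp key with start/stop scan boundaries, combined with Oleg's pre-splitting idea, can be sketched as a salted row-key layout. Everything concrete below is an assumption for illustration -- the 16-bucket salt count, the `[salt][timestamp][event id]` layout, and the class/method names do not come from the thread:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class SaltedTimeKey {

    // Assumed number of pre-split regions; one salt byte per region.
    static final int SALT_BUCKETS = 16;

    // Row key = [1-byte salt][8-byte big-endian epoch millis][event id].
    // Salting spreads time-sequential writes across SALT_BUCKETS regions,
    // avoiding the write hotspot a pure timestamp key would create.
    static byte[] rowKey(long epochMillis, String eventId) {
        byte salt = (byte) Math.floorMod(eventId.hashCode(), SALT_BUCKETS);
        byte[] id = eventId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + 8 + id.length)
                .put(salt)
                .putLong(epochMillis)
                .put(id)
                .array();
    }

    // Scan boundary for one salt bucket: [salt][timestamp]. To read the
    // window [t, t + X), issue one HBase Scan per bucket with start key
    // boundary(salt, t) and stop key boundary(salt, t + X), then merge.
    static byte[] boundary(int salt, long epochMillis) {
        return ByteBuffer.allocate(1 + 8)
                .put((byte) salt)
                .putLong(epochMillis)
                .array();
    }
}
```

Because the epoch millis are encoded big-endian, keys within one salt bucket sort lexicographically in time order, so a plain start/stop-key scan returns exactly the events in the bucket -- matching Michel's point that reads do not suffer the way writes do. The cost of salting is fanning each window read out to SALT_BUCKETS scans.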