Subject: timeseries merge/join question
From: Calvin
To: mapreduce-user@hadoop.apache.org
Date: Wed, 4 Nov 2009 11:53:57 -0500

Hey all,

I am trying to figure out the best way to approach some joining/merging computation in a MapReduce / HBase framework. I have the following large timeseries datasets (key/value pairs keyed and sorted by time):

Events1:
t1, event1_value1
t3, event1_value2
...

Events2:
t2, event2_value1
t3, event2_value2
t4, event2_value3
...

Currently, I am just storing these as flat files in HDFS, but I have no problem loading them into HBase tables. I am trying to do an operation like the following: for every event in Events2, find the event in Events1 that immediately precedes it (the latest Events1 event with timestamp <= the Events2 timestamp) and join the two.
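
To pin down what "immediately precedes (timestamp <=)" means for a single Events2 timestamp, here is a minimal Python sketch of the lookup. It assumes Events1 fits in memory as two parallel sorted lists, which is not true of the real data; the function name `preceding_event` and the integers standing in for t1 and t3 are made up for illustration.

```python
from bisect import bisect_right

def preceding_event(times, values, t):
    """Return the value of the latest event with timestamp <= t, or None."""
    # bisect_right gives the index of the first timestamp strictly greater
    # than t, so the entry just before it is the latest one with time <= t.
    i = bisect_right(times, t)
    return values[i - 1] if i > 0 else None

# Hypothetical stand-ins for the timestamps: t1 = 1, t3 = 3
events1_times = [1, 3]
events1_values = ["event1_value1", "event1_value2"]

# Lookup for the Events2 event at t2 = 2
print(preceding_event(events1_times, events1_values, 2))
```

An Events2 timestamp earlier than every Events1 timestamp has no match, and the sketch returns None in that case.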

This operation would result in something like:

JoinedEvents:
t2, event2_value1, event1_value1
t3, event2_value2, event1_value2
t4, event2_value3, event1_value2

etc.
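
To make the expected output concrete, here is a rough single-process sketch in Python of the join as one pass over two time-sorted streams. It is only a reference for the semantics, not a MapReduce job, and it says nothing about how to partition the data across tasks. The name `asof_join` and the integer timestamps (1 to 4 standing in for t1 to t4) are assumptions for illustration.

```python
def asof_join(events1, events2):
    """Yield (t, value2, value1) for each Events2 row.

    value1 comes from the latest Events1 row with timestamp <= t,
    or is None if no such row exists. Both inputs must be iterables
    of (timestamp, value) pairs sorted by timestamp.
    """
    it1 = iter(events1)
    pending = next(it1, None)  # next unread Events1 row
    last_value1 = None         # value of the latest Events1 row seen so far
    for t, value2 in events2:
        # Advance through Events1 while its rows are not later than t.
        while pending is not None and pending[0] <= t:
            last_value1 = pending[1]
            pending = next(it1, None)
        yield (t, value2, last_value1)

events1 = [(1, "event1_value1"), (3, "event1_value2")]
events2 = [(2, "event2_value1"), (3, "event2_value2"), (4, "event2_value3")]

for row in asof_join(events1, events2):
    print(row)
```

Because both inputs are already sorted by time, each stream is read once and only the most recent Events1 value has to be kept in memory.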

What is the best way to go about this in Hadoop?

Thanks in advance for the help.