From: Pavel Lysov
To: hbase-user@hadoop.apache.org
Subject: to mapreduce or not to mapreduce?
Date: Tue, 29 Jul 2008 19:51:58 +0300

Hey all!

I'd like to ask you to take a look at what I have and advise whether a MapReduce approach is the right direction to take.

There's a MESSAGES table; each message has a sender and a recipient. It works nicely so far, and next I want to get the following info:

Total of messages user X has sent
Total of messages user X has received
Total of messages in the system

It would be a USERS table, with USER_ID as the row key and a 'messages' column family:

messages:total_sent 345
messages:total_received 543

Similar to the above, I'd create a SYSTEM table with a 'messages:total' column that holds the total count of messages.

Next, I think I should implement a MapReduce job that updates 'messages:total_sent/total_received' for every user: the map adds a one to the output collector for the given user id. Then, in reduce, I'll sum them up and update the user's row.

Is it a good idea to do it like that? Could it cause any problems if two or more concurrent reduce jobs try to update the same row?
A similar question applies to the SYSTEM table: what if a bunch of reduce jobs try to update the messages:total column at the same time? I think table locks would help there, but it seems I am missing some basic understanding of how this is all supposed to work.

Could you please advise? I appreciate your help!

Pavel Lysov
pavlikus@gmail.com
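
To make the plan above concrete, here is a minimal sketch of the counting logic in plain Python. It does not use the Hadoop or HBase APIs; the sample rows, the `SYSTEM_KEY` name and the function names are all hypothetical, and the grouping step only imitates what a MapReduce framework does between map and reduce.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the MESSAGES table.
MESSAGES = [
    {"sender": "alice", "recipient": "bob"},
    {"sender": "alice", "recipient": "carol"},
    {"sender": "bob", "recipient": "alice"},
]

# Assumed single key under which the system-wide total is collected.
SYSTEM_KEY = "__system__"


def map_message(message):
    """Emit one (key, 1) pair for each counter this message affects."""
    yield (message["sender"], "total_sent"), 1
    yield (message["recipient"], "total_received"), 1
    yield (SYSTEM_KEY, "total"), 1


def shuffle(pairs):
    """Group values by key, imitating the step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Sum all the ones emitted for a single key."""
    return key, sum(values)


def run_job(messages):
    """Run map, shuffle and reduce in memory and return the final counts."""
    pairs = (pair for m in messages for pair in map_message(m))
    grouped = shuffle(pairs)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


counts = run_job(MESSAGES)
```

Because values are grouped by key before reduce runs, all the ones for a given user id (or for the system total) arrive in a single reduce call in this sketch, so each sum is produced in one place. Whether the real job still needs row locks when writing to USERS or SYSTEM depends on how the output is written, which this sketch does not model.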