Date: Sat, 27 Mar 2010 18:54:44 +0100
Message-ID: <32120a6a1003271054h79777b3bsf0cb575b6e21f161@mail.gmail.com>
Subject: Re: Questions about data distribution in HBase
From: Tim Robertson
To: hbase-user@hadoop.apache.org
Reply-To: hbase-user@hadoop.apache.org
Content-Type: text/plain; charset=ISO-8859-1

I would consider option 3) if it were me (I am not an expert). It is
common to use HBase tables as the input format for MapReduce jobs. I
don't think it is as simple as assuming that 3 videos will be spread
over 3 machines when they are stored, but as the volume grows the data
will certainly distribute, and with MR the processing will try to run as
close to the data as possible.

Cheers,
Tim

On Sat, Mar 27, 2010 at 6:06 PM, William Kang wrote:
> Hi,
> I am quite confused about the distribution of data in an HBase system.
> For instance, if I store 10 videos in the cells of 10 HTable rows, I assume
> that these 10 videos will be stored on different data nodes (regionservers)
> in HBase. Now, if I write a program that processes these 10 videos in
> parallel, what's going to happen?
> Since I only deployed the program in a jar to the master server in HBase,
> will all videos in the HBase system have to be transferred to the master
> server to get processed?
> 1. Or do I have another option to assign where the computing should happen,
> so I do not have to transfer the data over the network and can use the
> region server's CPU for the processing?
> 2. Or should I deploy the program jar to each region server so the region
> server can use its local CPU on the local data? Will the HBase system do
> that automatically?
> 3. Or do I need to plug M/R into HBase in order to use the local data and
> parallelize the processing?
> Many thanks.
>
>
> William
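
To illustrate the point in the reply about distribution and locality, here is a toy model in plain Python. It is not the HBase API, only a sketch under stated assumptions: the table starts as a single region (it is not pre-split), rows are assigned to regions by sorted row key, and one map task per region is scheduled on the server hosting that region. The server names (rs1, rs2, rs3), the split keys and both function names are hypothetical.

```python
import bisect
from collections import defaultdict


def region_for(row_key, split_keys):
    """Return the index of the region holding row_key.

    split_keys is a sorted list of region boundaries; region i covers
    the keys from split_keys[i-1] (inclusive) to split_keys[i] (exclusive).
    """
    return bisect.bisect_right(split_keys, row_key)


def plan_map_tasks(row_keys, split_keys, region_hosts):
    """Group rows by region and place one task per region on its host."""
    rows_by_region = defaultdict(list)
    for key in row_keys:
        rows_by_region[region_for(key, split_keys)].append(key)
    return {
        region: (region_hosts[region], rows)
        for region, rows in sorted(rows_by_region.items())
    }


# Ten rows, one per video (hypothetical row keys).
rows = [f"video{i:02d}" for i in range(10)]

# Small table: no splits yet, so every row sits in one region on one server.
small = plan_map_tasks(rows, [], ["rs1"])

# Larger table: two split points give three regions on three servers,
# and each task reads only the rows held by its own server.
grown = plan_map_tasks(rows, ["video03", "video07"], ["rs1", "rs2", "rs3"])

for region, (host, keys) in grown.items():
    print(f"region {region} on {host}: {keys[0]} .. {keys[-1]}")
```

In the small case all ten rows land on rs1, which is why storing 10 videos does not by itself spread them over 10 machines. Once the table has split, each task is placed next to its own slice of the rows and nothing has to be shipped to the master.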