From: Wellington Chevreuil
To: user@hadoop.apache.org
Date: Tue, 19 Feb 2013 21:27:38 +0000
Subject: Re: Newbie: HBase good for Tree like structure?

Hi José,

I think your structure is fine as a basis for HBase row keys. The main
issue you'll then face is how to build these keys so that you can
properly access your tree nodes.

Regarding your scalability concerns, don't worry about starting with a
small Hadoop/HBase cluster (even standalone) for development or
proof-of-concept purposes, but you will definitely need a more robust
environment if you reach a billion rows later. You'll have to start
thinking about read/write load patterns so that you can take the best
advantage of HBase as the solution to your problem.

Regards,
Wellington.

2013/2/19 José Feiteirinha <j@feiteira.org>

> Dear all,
>
> I hope this is the right place for this question.
>
> I'm currently in the early stages of developing software that may
> 'explode' in terms of users and data. I'm considering a very basic
> tree-like data structure and would like to know your thoughts regarding
> HBase/Hadoop.
>
> My reason is that I would like to be prepared from the get-go for large
> data.
>
> My structure is planned as such:
>
> - The data will be nodes of a huge multidimensional tree.
> - I'm planning on having each row contain the full node path, e.g.
>   "root.grandparentX.parentY.babyZ" (or ? "babyZ.parentY.grandparentX.root")
> - However, in terms of data per node, it should be pretty much static.
>
> While this is a very simple structure, it does seem to be beneficial to
> use HBase / Hadoop just for the scalability alone. I also understood that
> if I get to billions of rows, only an HBase-like approach can sustain me?
>
> My idea is to start with a simple standalone server and then expand the
> cluster as the load & data grow.
>
> If I may, I would like your thoughts, mostly regarding whether I'm using
> a hammer to kill ants, my proposed data structure, or any other advice
> you may have.
>
> Kind regards,
> José
>
> --
> José Feiteirinha
>
> www.feiteira.org

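To illustrate the point about building row keys: HBase keeps rows sorted lexicographically by row key, so the order of the path components decides which nodes end up next to each other. The sketch below is plain Python with no HBase client involved. A sorted list stands in for the table, and `row_key`, `prefix_scan`, and the sample nodes other than the path quoted in the thread are hypothetical, chosen only for illustration. It assumes ASCII node names that never contain the "." separator.

```python
from bisect import bisect_left

SEP = "."  # separator used in the thread's example path


def row_key(path, reverse=False):
    """Join node names (given root-first) into a single row key."""
    parts = list(reversed(path)) if reverse else list(path)
    return SEP.join(parts)


def prefix_scan(sorted_keys, prefix):
    """Return all keys that start with `prefix` from a sorted list.

    This mimics a range scan over lexicographically sorted row keys:
    jump to the first key >= prefix, then read until the prefix stops
    matching.
    """
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches


# Hypothetical sample tree, each node given as its root-first path.
paths = [
    ["root"],
    ["root", "grandparentX"],
    ["root", "grandparentX", "parentY"],
    ["root", "grandparentX", "parentY", "babyZ"],
    ["root", "grandparentX", "parentW"],
    ["root", "grandparentQ", "parentY"],
]

# Root-first keys: a whole subtree is one contiguous key range.
forward = sorted(row_key(p) for p in paths)
subtree = prefix_scan(forward, "root.grandparentX" + SEP)

# Leaf-first keys: nodes sharing the same own name are contiguous instead.
backward = sorted(row_key(p, reverse=True) for p in paths)
same_name = prefix_scan(backward, "parentY" + SEP)

print(subtree)
print(same_name)
```

With root-first keys, everything below a node can be read with a single prefix scan, which suits "give me this subtree" access. With leaf-first keys, the scan instead groups nodes by their own name across different branches. Which ordering is preferable depends on the read patterns mentioned above, so this is a way to reason about the choice rather than a recommendation.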