Subject: Re: Nosqls schema design
From: Michael Segel
Date: Thu, 8 Nov 2012 08:55:23 -0600
To: user@hbase.apache.org

Ok...

First, if you're estimating that the raw data would be 10TB, you will find that you need a bit more than that to handle the data in terms of indexing and denormalized structures.

The short answer to your question is yes, you can do it.

Longer answer...

You can bake a solution in either a relational database or HBase/NoSQL; however, with an RDBMS you will be close to hitting the ceiling, and you will be spending a fortune on licensing and hardware.

If you want to do this in terms of HBase, you can.

Most of the queries are straightforward; however, you will be duplicating data.

The interesting query:

> - All users that have commented a page W and liked a page P.

This will require a map/reduce job to produce an answer. Well, maybe not if you're using secondary indexing techniques. Then it would be an intersection of two result sets to give you the final set of users.

HTH

On Nov 8, 2012, at 3:00 AM, Nick maillard wrote:

> Hi everyone
>
> I'm currently testing HBase/Hadoop in terms of performance but also in terms of
> applicability. After some tries and reads, I'm wondering if HBase is well fitted
> for the current need I'm testing.
>
> Say I had logs on websites listing users going to a webpage, reading an article,
> liking a piece of data, commenting or even bookmarking.
> I would store these logs over a long period and for a lot of different websites,
> and I would like to use the data to answer these questions:
> - All users that have been to the webpage X in the last N days
> - All users that have liked and then bookmarked a page in a range of Y days.
> - All the pages that are commented X times in the last N days.
> - All users that have commented a page W and liked a page P.
> - All pages seen, liked or commented by a given user.
>
> As you see, this might be a very SQL way of thinking. The way I understand it, the
> questions being different in nature, I would have different tables to answer them.
> Am I correct? How could this be represented, and would SQL be a better fit?
> The data would be large, around 10 TB.
>
> regards
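
For what it's worth, the "intersection of two result sets" idea in the reply (users who commented on page W and liked page P) can be sketched roughly as below. This is only an in-memory illustration, not HBase client code: the `index` dict stands in for a hypothetical secondary index table keyed by (action, page), and the names `record_event`, `users_who` and `commented_and_liked` are made up for the example. In a real cluster each lookup would presumably be a scan or get against an index table you maintain yourself.

```python
from collections import defaultdict

# Hypothetical stand-in for a secondary index table:
# key = (action, page), value = set of users who performed that action.
# This duplicates data that would also live in the raw event log.
index = defaultdict(set)


def record_event(user, action, page):
    """Write path: denormalize each log event into the index."""
    index[(action, page)].add(user)


def users_who(action, page):
    """Read path: one index lookup returns one result set."""
    # .get() avoids creating empty entries for keys never written.
    return set(index.get((action, page), ()))


def commented_and_liked(page_w, page_p):
    """Users who commented on page_w AND liked page_p (set intersection)."""
    return users_who("comment", page_w) & users_who("like", page_p)


# Made-up sample events, purely for illustration.
events = [
    ("alice", "comment", "W"),
    ("alice", "like", "P"),
    ("bob", "comment", "W"),
    ("carol", "like", "P"),
    ("carol", "comment", "W"),
    ("dave", "like", "W"),
]
for user, action, page in events:
    record_event(user, action, page)

result = commented_and_liked("W", "P")
```

With the sample events above, `result` holds the users present in both lookups. The trade-off is the one already mentioned in the reply: every event is written more than once so that each question becomes a cheap lookup instead of a full map/reduce pass.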