Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (nike.apache.org: domain of michael_segel@hotmail.com
 designates 65.55.111.105 as permitted sender)
Message-ID: <BLU0-SMTP4077525204ACBD724F701E28F690@phx.gbl>
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0 (Mac OS X Mail 6.2 \(1499\))
Subject: Re: Nosqls schema design
From: Michael Segel <michael_segel@hotmail.com>
In-Reply-To: <loom.20121108T095122-178@post.gmane.org>
Date: Thu, 8 Nov 2012 08:55:23 -0600
Content-Transfer-Encoding: quoted-printable
References: <loom.20121108T095122-178@post.gmane.org>
To: user@hbase.apache.org

Ok...

First, if you're estimating that the raw data will be 10TB, expect to need
more storage than that once you add indexes and denormalized structures.

The short answer to your question is yes, you can do it.

Longer answer...

You can build this on either a relational database or HBase/NoSQL. However,
with an RDBMS you will be close to hitting the ceiling, and you will be
spending a fortune on licensing and hardware.

If you want to do this in terms of HBase, you can.

Most of the queries are straightforward; however, you will be duplicating
data.
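
To make "duplicating data" concrete, here is a rough sketch in plain Python
rather than the HBase client API. The table names, the "|"-separated key
layout, the integer day numbers, and every helper name are my own
assumptions for illustration. The idea is that each event is written twice,
once keyed by page and once keyed by user, so that "users on page X in the
last N days" and "pages touched by a given user" both become range scans
over sorted row keys.

```python
from bisect import bisect_left

class SortedTable:
    """In-memory stand-in for a table whose rows are sorted by row key."""

    def __init__(self):
        self.keys = []

    def put(self, key):
        # Insert the key in sorted position, ignoring exact duplicates.
        i = bisect_left(self.keys, key)
        if i == len(self.keys) or self.keys[i] != key:
            self.keys.insert(i, key)

    def scan(self, start, stop):
        # Return all keys in the half-open range [start, stop).
        lo = bisect_left(self.keys, start)
        hi = bisect_left(self.keys, stop)
        return self.keys[lo:hi]

by_page = SortedTable()  # hypothetical key: page|day|action|user
by_user = SortedTable()  # hypothetical key: user|day|action|page

def record_event(user, page, action, day):
    # Assumes ids never contain "|". The day is zero-padded so that
    # lexicographic key order matches numeric day order.
    d = f"{day:08d}"
    by_page.put(f"{page}|{d}|{action}|{user}")  # the same event is
    by_user.put(f"{user}|{d}|{action}|{page}")  # stored twice

def users_on_page(page, first_day, last_day):
    # Range scan over one page's rows for days first_day..last_day inclusive.
    start = f"{page}|{first_day:08d}"
    stop = f"{page}|{last_day + 1:08d}"
    return {key.split("|")[3] for key in by_page.scan(start, stop)}

def pages_for_user(user):
    # Prefix scan over one user's rows; "~" sorts after every digit.
    return {key.split("|")[3] for key in by_user.scan(f"{user}|", f"{user}|~")}
```

The cost is that every write lands in two places, which is part of why the
stored volume ends up larger than the raw data.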

The interesting query:
> - All users that have commented on a page W and liked a page P.

This will require a map/reduce job to produce an answer. Well, maybe not,
if you're using secondary indexing techniques. Then it would be an
intersection of two result sets to give you the final set of users.
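
A minimal sketch of that intersection, again in plain Python with an
in-memory dictionary standing in for a secondary index. The index layout
(action, page) -> set of users and the function names are assumptions of
mine, not an HBase feature; in practice the index would be another table
that you maintain yourself on every write.

```python
from collections import defaultdict

# Hypothetical secondary index: (action, page) -> set of user ids.
index = defaultdict(set)

def record(user, action, page):
    # Maintain the index on every write.
    index[(action, page)].add(user)

def commented_and_liked(page_w, page_p):
    # Two index lookups, then a set intersection; no full scan needed.
    commenters = index.get(("comment", page_w), set())
    likers = index.get(("like", page_p), set())
    return commenters & likers
```

Each lookup touches only the index rows for one (action, page) pair, so the
work depends on the size of the two result sets rather than on the total
size of the logs.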

HTH


On Nov 8, 2012, at 3:00 AM, Nick maillard <nicolas.maillard@fifty-five.com> wrote:

> Hi everyone
>
> I'm currently testing HBase/Hadoop in terms of performance but also in terms of
> applicability. After some tries and some reading, I'm wondering if HBase is well
> suited to the need I'm currently testing.
>
> Say I had logs from websites listing users going to a webpage, reading an article,
> liking a piece of data, commenting or even bookmarking.
> I would store these logs over a long period and for a lot of different websites,
> and I would like to answer these questions with the data:
> - All users that have been to webpage X in the last N days.
> - All users that have liked and then bookmarked a page within a range of Y days.
> - All the pages that were commented on X times in the last N days.
> - All users that have commented on a page W and liked a page P.
> - All pages seen, liked or commented on by a given user.
>
> As you see, this might be a very SQL way of thinking. As I understand it, since the
> questions are different in nature, I would have different tables to answer them.
> Am I correct? How could this be represented, and would SQL be a better fit?
> The data would be large, around 10 TB.
>
> regards
>
>