hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Nosqls schema design
Date Thu, 08 Nov 2012 14:55:23 GMT

First, if you're estimating that the raw data will be 10TB, you will find that you need
more than that once you account for indexes and denormalized structures.

The short answer to your question is yes, you can do it. 

Longer answer... 

You can build a solution on either a relational database or HBase/NoSQL. However, you will be
close to hitting the ceiling on an RDBMS, and you will be spending a fortune on licensing and

If you want to do this in terms of HBase, you can. 

Most of the queries are straightforward; however, you will be duplicating data.
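
To illustrate what duplicating data can look like, here is a minimal in-memory sketch in Python. It is not HBase client code. The table names, the `|` key delimiter, the `MAX_TS` bound and all function names are assumptions made for the illustration. The idea shown is that each event is written once per access path, with a row key shaped for the query that table serves, so that a query becomes a prefix scan over sorted keys.

```python
from bisect import bisect_left, insort

MAX_TS = 10**13  # assumed upper bound for millisecond timestamps


class SortedTable:
    """In-memory stand-in for a table whose rows are sorted by row key."""

    def __init__(self):
        self.keys = []

    def put(self, key):
        insort(self.keys, key)  # keep keys in sorted order

    def prefix_scan(self, prefix):
        # Jump to the first key >= prefix, then read while the prefix matches.
        i = bisect_left(self.keys, prefix)
        rows = []
        while i < len(self.keys) and self.keys[i].startswith(prefix):
            rows.append(self.keys[i])
            i += 1
        return rows


# Hypothetical tables, one per access path.
visits_by_page = SortedTable()   # key: page|reversed_ts|user
events_by_user = SortedTable()   # key: user|reversed_ts|action|page


def reversed_ts(ts_ms):
    # Reversed, zero-padded timestamp so the newest events sort first.
    return f"{MAX_TS - ts_ms:013d}"


def record_event(user, action, page, ts_ms):
    # The same event is stored more than once: this is the duplication.
    if action == "visit":
        visits_by_page.put(f"{page}|{reversed_ts(ts_ms)}|{user}")
    events_by_user.put(f"{user}|{reversed_ts(ts_ms)}|{action}|{page}")


def users_who_visited(page, since_ms):
    """All users that have been to `page` at or after `since_ms`."""
    users = set()
    for key in visits_by_page.prefix_scan(f"{page}|"):
        _, rts, user = key.split("|")
        if MAX_TS - int(rts) < since_ms:
            break  # keys are newest first, so everything after is older
        users.add(user)
    return users


def pages_for_user(user):
    """All (action, page) pairs for a given user, newest first."""
    result = []
    for key in events_by_user.prefix_scan(f"{user}|"):
        _, _, action, page = key.split("|")
        result.append((action, page))
    return result


record_event("alice", "visit", "X", 2000)
record_event("bob", "visit", "X", 1000)
record_event("alice", "like", "P", 2500)
print(users_who_visited("X", since_ms=1500))
print(pages_for_user("alice"))
```

The sketch assumes that user and page identifiers never contain the `|` delimiter and that timestamps are positive and below `MAX_TS`. A real schema would need its own key encoding.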

The interesting query: 
> - All users that have commented a page W and liked a page P.

This will require a map/reduce job to produce an answer, unless you're using secondary
indexing techniques. In that case it would be an intersection of two result sets (the users
who commented on page W and the users who liked page P) to give you the final set of users.
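
As a rough sketch of the intersection idea, assuming a secondary index keyed by (action, page) that maps to the set of users who performed that action: the index layout and the function names below are hypothetical, and a plain Python dict stands in for whatever index table you would actually maintain.

```python
from collections import defaultdict

# Hypothetical secondary index: (action, page) -> set of user ids.
action_index = defaultdict(set)


def index_event(user, action, page):
    # Maintained at write time, alongside the primary record.
    action_index[(action, page)].add(user)


def users_commented_and_liked(commented_page, liked_page):
    # Two index lookups, then a set intersection of the results.
    commenters = action_index.get(("comment", commented_page), set())
    likers = action_index.get(("like", liked_page), set())
    return commenters & likers


index_event("alice", "comment", "W")
index_event("alice", "like", "P")
index_event("bob", "comment", "W")
index_event("carol", "like", "P")
print(users_commented_and_liked("W", "P"))
```

Only users present in both result sets survive the intersection, so no full scan of the event data is needed for this query. The cost is that the index has to be kept up to date on every write.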


On Nov 8, 2012, at 3:00 AM, Nick maillard <nicolas.maillard@fifty-five.com> wrote:

> Hi everyone
> I'm currently testing HBase/Hadoop in terms of performance but also in terms of
> applicability. After some trials and reading, I'm wondering if HBase is well suited
> to the need I'm currently testing.
> Say I had logs from websites listing users going to a webpage, reading an article,
> liking a piece of data, commenting or even bookmarking.
> I would store these logs over a long period and for a lot of different websites,
> and I would like to use the data to answer these questions:
> - All users that have been to the webpage X in the last N days
> - All users that have liked and then bookmarked a page in a range of Y days.
> - All the pages that are commented X times in the last N days.
> - All users that have commented a page W and liked a page P.
> - All pages seen, liked or commented by a given user.
> As you can see, this might be a very SQL way of thinking. As I understand it, since the
> questions differ in nature, I would need different tables to answer them.
> Am I correct? How could this be represented, and would SQL be a better fit?
> The data would be large, around 10 TB.
> regards
