hbase-user mailing list archives

From Tatsuya Kawano <tatsuya6...@modolab.info>
Subject Re: newbie: need help on understanding HBase
Date Fri, 13 Nov 2009 01:02:24 GMT
Hi Imran,

> * Where can I read up on or check out sample RDBMS schemas converted
> to HBase schemas? Basically, I want to read up on efficient schema
> design for different cardinal relationships between objects.

I would recommend the following presentations and paper:

Practical HBase  [ Pages 27 -- 33 ]
by Jon Gray and Michael Stack, ApacheCon 2009 in Oakland
http://wiki.apache.org/hadoop/HBase/HBasePresentations?action=AttachFile&do=view&target=ApacheCon2009_Practical_HBase-1.pdf


Paper: No Relation: The Mixed Blessings of Non-Relational Databases
by Ian Thomas Varley
http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf


HBase Schema Design -- Case Studies
by Evan(Qingyan) Liu
http://www.slideshare.net/hmisty/20090713-hbase-schema-design-case-studies


HBase schema design differs from RDBMS schema design in some
fundamental ways:

-- De-normalization is the key to designing HBase schemas.
-- Pick the primary key (the row key) carefully, and try to satisfy all
queries without secondary indices. Use a composite key if you need one.
-- Be aware that you can store a value not only in the cell but also in
the column qualifier.
-- Each cell is a typeless byte array, so you can store multiple values
in one cell by serializing them with Google protobuf, Apache Avro, or
JSON.
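
To make the last three points concrete, here is a minimal sketch in plain Python. It does not use the HBase client API; a row is modeled as a dict that maps "family:qualifier" bytes to cell bytes. The content type "article", the families "data" and "rel", the 0x00 separator, and the reversed-timestamp trick are all assumptions made for illustration, not something HBase requires.

```python
import json
import struct

MAX_TS = 2**63 - 1  # largest value that fits in a signed 64-bit integer


def composite_row_key(content_type: str, created_ms: int) -> bytes:
    """Row key = <content type> + 0x00 + reversed creation time (8 bytes)."""
    # Subtracting from MAX_TS makes newer items sort first, because rows
    # are ordered by the raw bytes of their key.
    reversed_ts = MAX_TS - created_ms
    return content_type.encode("utf-8") + b"\x00" + struct.pack(">q", reversed_ts)


def build_row(fields: dict, related_ids: list) -> dict:
    """Return a row as a {b"family:qualifier": cell_bytes} mapping."""
    row = {}
    # Several values in one typeless cell, serialized as JSON.
    row[b"data:fields"] = json.dumps(fields, sort_keys=True).encode("utf-8")
    # One-to-many link: the related ID lives in the qualifier, cell is empty.
    for rid in related_ids:
        row[b"rel:" + rid.encode("utf-8")] = b""
    return row


def read_related(row: dict) -> list:
    """Recover related IDs from the column qualifiers of the "rel" family."""
    prefix = b"rel:"
    return sorted(q[len(prefix):].decode("utf-8") for q in row if q.startswith(prefix))


older = composite_row_key("article", 1_000)
newer = composite_row_key("article", 2_000)
row = build_row({"title": "Hello", "tags": ["hbase", "cms"]}, ["c42", "c7"])
fields = json.loads(row[b"data:fields"])
```

With keys built this way, a scan over the "article" prefix would return the newest content first, and the related IDs can be read back from the qualifiers alone without a secondary index or a join table.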


Hope this helps,

-- 
Tatsuya Kawano (Mr.)
Tokyo, Japan



On Thu, Nov 12, 2009 at 11:13 PM, Imran M Yousuf <imyousuf@gmail.com> wrote:
> Hi!
>
> I am absolutely new to HBase. All I have done so far is read the
> documentation and presentations and get a single instance up and
> running. I am starting on a Content Management System which will be
> used as a backend for multiple web applications of different natures.
> In the CMS:
> * Users can define their own content, known as a content type.
> * Content can have one-2-many, one-2-one, and many-2-many
> relationships with other content.
> * Content fields should be versioned.
> * Content types can change at runtime, i.e. fields (a.k.a. columns in
> HBase) can be added; removal will not be allowed just yet.
> * Every content type will have a corresponding grammar to validate
> content of its type.
> * It will have authentication and authorization.
> * It will have full text search based on Lucene/Katta.
>
> Based on these requirements I have the following questions that I
> would like feedback on:
> * From the articles and presentations I have read, HBase looks like a
> perfect match, as it supports multi-dimensional rows, versioned cells,
> and dynamic schema modification. But I could not understand the
> definition of "Big Data": if a content item is roughly 1~100kB
> (field/cell size 0~100kB), is HBase meant for such uses?
> * Since I am not sure how much load the site will have, I am planning
> to set up DN+RS on Rackspace cloud instances with 2GB/80GB HDD, with a
> view to adding more moderate "commodity" hardware progressively as
> revenue and pageviews increase. Any comments/suggestions on this
> strategy?
> * Where can I read up on or check out sample RDBMS schemas converted
> to HBase schemas? Basically, I want to read up on efficient schema
> design for different cardinal relationships between objects.
>
> Thank you,
>
> --
> Imran M Yousuf
> Entrepreneur & Software Engineer
> Smart IT Engineering
> Dhaka, Bangladesh
> Email: imran@smartitengineering.com
> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
> Mobile: +880-1711402557
