hbase-user mailing list archives

From Imran M Yousuf <imyou...@gmail.com>
Subject Re: newbie: need help on understanding HBase
Date Fri, 13 Nov 2009 01:58:24 GMT
On Thu, Nov 12, 2009 at 10:50 PM, Chris Bates
<christopher.andrew.bates@gmail.com> wrote:
> Hi Imran,
>
> I'm a new user as well.  I found these presentations helpful in answering
> most of your questions:
> http://wiki.apache.org/hadoop/HBase/HBasePresentations
>
> There are HBase schema designs in there.
>

I read them, but without the speakers' explanations the schema parts
remain unclear to a dumb newbie like me. I was looking for more
concrete definitions of column family, column, cell, etc., and their use
cases. I guess I will have to learn them by experimenting.
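As a rough mental model (a sketch, not the HBase client API), the BigTable paper describes a table as a sparse multidimensional map: row key, then column family, then column qualifier, then timestamp, then value. A column is addressed as `family:qualifier`; families are declared in the schema while qualifiers can be added freely, and a cell is the value at one (row, column, timestamp) coordinate. The Python below models only that nesting. The names `ToyTable`, `put`, `get` and `max_versions` are hypothetical; real HBase also keeps rows sorted by key and configures the number of versions per column family, which this toy simplifies to one setting per table.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory model of HBase's logical data model:
    row key -> column family -> column qualifier -> {timestamp: value}.
    Not the HBase client API; row-key sorting and persistence are omitted."""

    def __init__(self, families, max_versions=3):
        # Column families are declared up front, like in an HBase schema...
        self.families = set(families)
        # ...real HBase sets max versions per family; here it is per table.
        self.max_versions = max_versions
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, ts):
        # A column is addressed as "family:qualifier"; qualifiers are free-form.
        family, qualifier = column.split(":", 1)
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        versions = self.rows[row][family].setdefault(qualifier, {})
        versions[ts] = value  # one cell = one (row, column, timestamp) value
        # Drop the oldest versions beyond max_versions.
        for old_ts in sorted(versions, reverse=True)[self.max_versions:]:
            del versions[old_ts]

    def get(self, row, column, ts=None):
        """Newest value at or before ts (newest overall if ts is None)."""
        family, qualifier = column.split(":", 1)
        if row not in self.rows:
            return None
        versions = self.rows[row].get(family, {}).get(qualifier, {})
        candidates = [t for t in versions if ts is None or t <= ts]
        return versions[max(candidates)] if candidates else None


t = ToyTable(families=["content", "meta"], max_versions=2)
t.put("article-1", "content:title", "Draft", ts=1)
t.put("article-1", "content:title", "Final", ts=2)
t.put("article-1", "content:title", "Final v2", ts=3)  # evicts the ts=1 version
t.put("article-1", "meta:author", "imran", ts=1)      # new column, no schema change
print(t.get("article-1", "content:title"))        # Final v2
print(t.get("article-1", "content:title", ts=2))  # Final
print(t.get("article-1", "content:title", ts=1))  # None
```

In this picture a "content type field" maps naturally to a qualifier inside a family, and field versioning maps to cell timestamps, which is why adding fields at runtime does not require a schema change while adding a new family does.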

> You might also want to read the original BigTable paper and the chapter on
> HBase in O'Reilly's Hadoop book.
>
> But to answer one of your questions: "Big Data" usually refers to a
> dataset with millions to billions of rows. But "Big Data" doesn't mean
> you have to use a tool like HBase. We have some MySQL tables with 100
> million rows that work fine. You have to identify what works best for
> your use case and use the most appropriate tool.

Thanks. IMHO, HBase is more suitable than MySQL for this, simply
because of the complexity and cost of scaling an application with BLOB
data.

Thanks a lot,

Imran

>
> On Thu, Nov 12, 2009 at 9:13 AM, Imran M Yousuf <imyousuf@gmail.com> wrote:
>
>> Hi!
>>
>> I am absolutely new to HBase. All I have done so far is read the
>> documentation and presentations and get a single instance up and
>> running. I am starting on a Content Management System which will be
>> used as a backend for multiple web applications of different natures.
>> In the CMS:
>> * Users can define their own kinds of content, known as content types.
>> * Content can have one-to-many, one-to-one and many-to-many
>> relationships with other content.
>> * Content fields should be versioned.
>> * Content types can change at runtime, i.e. fields (a.k.a. columns in
>> HBase) can be added; removal will not be allowed just yet.
>> * Every content type will have a corresponding grammar to validate
>> content of its type.
>> * It will have authentication and authorization.
>> * It will have full text search based on Lucene/Katta.
>>
>> Based on these requirements I have the following questions that I
>> would like feedback on:
>> * From the articles and presentations, HBase looks like a perfect
>> match, as it supports multi-dimensional rows, versioned cells and
>> dynamic schema modification. But I could not understand the definition
>> of "Big Data": if a content item is roughly 1~100 kB
>> (field/cell size 0~100 kB), is HBase meant for such uses?
>> * Since I am not sure how much load the site will have, I am planning
>> to set up DN+RS (HDFS DataNode + HBase RegionServer) on Rackspace cloud
>> instances with 2 GB RAM/80 GB HDD, with the view that, as revenue and
>> pageviews increase, more moderate "commodity" hardware can be added
>> progressively. Any comments/suggestions on this strategy?
>> * Where can I read up on, or check out, sample RDBMS schemas converted
>> to HBase schemas? Basically, I want to read up on efficient schema
>> design for relationships of different cardinalities between objects.
>>
>> Thank you,
>>
>> --
>> Imran M Yousuf
>> Entrepreneur & Software Engineer
>> Smart IT Engineering
>> Dhaka, Bangladesh
>> Email: imran@smartitengineering.com
>> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
>> Mobile: +880-1711402557
>>
>
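For the question above about relationships of different cardinalities, one commonly described HBase approach is to denormalize: store each link as a column in the row that needs to read it, and for many-to-many store it in both rows, since HBase has no joins and core HBase only guarantees atomic updates within a single row. The sketch below uses a plain dict in place of a table; the `rel` family, the `link`/`related` helpers and the relation names are all hypothetical illustrations, not an established schema.

```python
# Hypothetical layout: one row per content item; a "rel" column family holds
# one column per related row key (qualifier = related key, value = relation name).
# A plain dict stands in for the table: row key -> {"family:qualifier": value}.
table = {}


def link(a, b, relation, inverse):
    """Many-to-many: write the link into BOTH rows, so either side can be
    read with a single row lookup. HBase has no joins and only guarantees
    atomicity within one row, so the application keeps both sides in sync."""
    table.setdefault(a, {})[f"rel:{b}"] = relation
    table.setdefault(b, {})[f"rel:{a}"] = inverse


def related(row, relation=None):
    """Related row keys, sorted (HBase also sorts columns within a family)."""
    return sorted(
        col.split(":", 1)[1]
        for col, rel in table.get(row, {}).items()
        if col.startswith("rel:") and (relation is None or rel == relation)
    )


link("article-1", "tag-hbase", "has-tag", "tag-of")
link("article-2", "tag-hbase", "has-tag", "tag-of")
link("article-1", "tag-nosql", "has-tag", "tag-of")
print(related("tag-hbase"))             # ['article-1', 'article-2']
print(related("article-1", "has-tag"))  # ['tag-hbase', 'tag-nosql']
```

For one-to-many, the same layout works by writing only the side that is read (for example the parent row listing its children, or each child storing its parent key); one-to-one is the degenerate case. The trade-off is that every many-to-many link costs two row writes that are not atomic together.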

