hbase-user mailing list archives

From Imran M Yousuf <imyou...@gmail.com>
Subject Re: newbie: need help on understanding HBase
Date Fri, 13 Nov 2009 02:07:32 GMT
On Thu, Nov 12, 2009 at 10:55 PM, Tim Robertson
<timrobertson100@gmail.com> wrote:
> Another newbie here so this might not be wholly accurate.
>
>> Since I am not sure how much load the site will have, I am planning
>> to set up DN+RS on Rackspace cloud instances with 2GB/80GB HDD, with a
>> view that, as revenue and pageviews increase, more moderate
>
> I think you might struggle here... I think the recommendation is 1G for
> DN, >1G for RS and then room for the MapReduce tasks if needed.
> My understanding is you really should aim for >=8G on each node, but 4G
> is the lowest you should consider.
>

Thanks. I saw the spec for a simple production environment in the
"Practical HBase" presentation from ApacheCon 2009; I just did not
understand why 8G was required :). Even in the virtual environment,
RAM should not be a problem, as I can increase it to 6G easily.
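
As a rough back-of-the-envelope check of the figures Tim quotes (1G for
the DataNode, a bit over 1G for the RegionServer, plus room for MapReduce
tasks), here is a minimal sketch. The OS reserve, the per-task heap and
the task count are my own placeholder assumptions, not numbers from this
thread or from the HBase documentation.

```python
# Rough per-node memory budget, in GB. Only the DataNode and RegionServer
# figures come from the thread; everything else is an assumed placeholder.
DATANODE_GB = 1.0        # "1G for DN"
REGIONSERVER_GB = 1.0    # lower bound; the thread says ">1G for RS"
OS_RESERVE_GB = 0.5      # assumption: headroom for the OS itself
MR_TASK_GB = 0.5         # assumption: heap per MapReduce task
MR_TASKS = 2             # assumption: concurrent tasks per node


def required_gb(with_mapreduce: bool = True) -> float:
    """Return the minimum RAM (GB) the daemons on one node would need."""
    total = DATANODE_GB + REGIONSERVER_GB + OS_RESERVE_GB
    if with_mapreduce:
        total += MR_TASK_GB * MR_TASKS
    return total


def fits(node_ram_gb: float, with_mapreduce: bool = True) -> bool:
    """True if a node with node_ram_gb of RAM covers the budget."""
    return node_ram_gb >= required_gb(with_mapreduce)


if __name__ == "__main__":
    for ram in (2, 4, 6, 8):
        print(f"{ram}G node fits budget: {fits(ram)}")
```

With these placeholder numbers a 2G instance falls short even without
MapReduce, and 4G clears the bar with little to spare, which is in line
with Tim's warning.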

Thank you for the clarification.

Imran

> Tim
>
>
> On Thu, Nov 12, 2009 at 4:50 PM, Chris Bates
> <christopher.andrew.bates@gmail.com> wrote:
>> Hi Imran,
>>
>> I'm a new user as well.  I found these presentations helpful in answering
>> most of your questions:
>> http://wiki.apache.org/hadoop/HBase/HBasePresentations
>>
>> There are HBase schema designs in there.
>>
>> You might also want to read the original BigTable paper and the chapter on
>> HBase in O'Reilly's Hadoop book.
>>
>> But to answer one of your questions--"Big Data" usually refers to a dataset
>> that is millions to billions in length.  But "Big Data" doesn't mean you
>> have to use a tool like HBase.  We have some MySQL tables that are 100
>> million rows and work fine.  You have to identify what works best for your
>> use and use the most appropriate tool.
>>
>> On Thu, Nov 12, 2009 at 9:13 AM, Imran M Yousuf <imyousuf@gmail.com> wrote:
>>
>>> Hi!
>>>
>>> I am absolutely new to HBase. All I have done is read up on the
>>> documentation and presentations and get a single instance up and
>>> running. I am starting on a Content Management System which will be
>>> used as a backend for multiple web applications of different natures.
>>> In the CMS:
>>> * User can define their content known as content type.
>>> * Content can have one-2-many, one-2-one and many-2-many relationships
>>> with other content.
>>> * Content fields should be versioned
>>> * Content type can change at runtime, i.e. fields (a.k.a. columns in
>>> HBase) can be added; removal will not be allowed just yet.
>>> * Every content type will have a corresponding grammar to validate
>>> content of its type.
>>> * It will have authentication and authorization
>>> * It will have full text search based on Lucene/Katta.
>>>
>>> Based on these requirements I have the following questions that I
>>> would like feedback on:
>>> * From reading articles and presentations, it looks like HBase is a
>>> perfect match, as it supports multi-dimensional rows, versioned cells
>>> and dynamic schema modification. But I could not understand the
>>> definition of "Big Data": if a content size is roughly 1~100kB
>>> (field/cell size 0~100kB), is HBase meant for such uses?
>>> * Since I am not sure how much load the site will have, I am planning
>>> to set up DN+RS on Rackspace cloud instances with 2GB/80GB HDD, with a
>>> view that, as revenue and pageviews increase, more moderate
>>> "commodity" hardware can be added progressively. Any
>>> comments/suggestions on this strategy?
>>> * Where can I read up on or check out sample RDBMS schemas converted
>>> to HBase schemas? Basically, I want to read up on efficient schema
>>> design for relationships of different cardinalities between objects.
>>>
>>> Thank you,
>>>
>>> --
>>> Imran M Yousuf
>>> Entrepreneur & Software Engineer
>>> Smart IT Engineering
>>> Dhaka, Bangladesh
>>> Email: imran@smartitengineering.com
>>> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
>>> Mobile: +880-1711402557
>>>
>>
>



-- 
Imran M Yousuf
Entrepreneur & Software Engineer
Smart IT Engineering
Dhaka, Bangladesh
Email: imran@smartitengineering.com
Blog: http://imyousuf-tech.blogs.smartitengineering.com/
Mobile: +880-1711402557
