Subject: Re: newbie: need help on understanding HBase
From: Imran M Yousuf
To: hbase-user@hadoop.apache.org
Date: Fri, 13 Nov 2009 09:04:10 +0700

On Fri, Nov 13, 2009 at 9:00 AM, Ryan Rawson wrote:
> HBase does at least 3 things that traditional databases have a hard time with:
>
> - Large blobs of data. Mysql is particularly guilty of not handling this well.
> - Tables that grow to be larger than reasonably priced single machines.
> - Write loads that are not compatible with master-slave replication
>
> The 2nd and 3rd are very interesting, since you either have to pay for
> something like Oracle RAC, or start sharding.

Exactly, and since the contents will be blob data, and my experience with RDBMS blobs suggests that scaling them is proportional to *BIG* money, I am eager to take the HBase path. I was actually praying and hoping you would join this thread :). Can you please elaborate on Column Family, Column and Cell and their basic use cases?

Thanks a lot,

Imran

> On Thu, Nov 12, 2009 at 5:58 PM, Imran M Yousuf wrote:
>> On Thu, Nov 12, 2009 at 10:50 PM, Chris Bates wrote:
>>> Hi Imran,
>>>
>>> I'm a new user as well. I found these presentations helpful in answering most of your questions:
>>> http://wiki.apache.org/hadoop/HBase/HBasePresentations
>>>
>>> There are HBase schema designs in there.
>>>
>> I read them, but without the speakers' explanations the schema parts remain unexplained for a newbie like me. I was looking for more concrete definitions of column family, column, cell etc. and their use cases. I guess I will have to learn them by experimenting.
>>
>>> You might also want to read the original BigTable paper and the chapter on HBase in O'Reilly's Hadoop book.
>>>
>>> But to answer one of your questions--"Big Data" usually refers to a dataset that is millions to billions of rows in length. But "Big Data" doesn't mean you have to use a tool like HBase. We have some MySQL tables that are 100 million rows and work fine. You have to identify what works best for your use and use the most appropriate tool.
>>
>> Thanks. IMHO, HBase is more suitable than MySQL here simply because of the complexity and cost of scaling an application with blob data.
>>
>> Thanks a lot,
>>
>> Imran
>>
>>> On Thu, Nov 12, 2009 at 9:13 AM, Imran M Yousuf wrote:
>>>
>>>> Hi!
>>>>
>>>> I am absolutely new to HBase. All I have done is read the documentation and presentations and get a single instance up and running. I am starting on a Content Management System which will be used as a backend for multiple web applications of different natures. In the CMS:
>>>> * Users can define their own content structure, known as a content type.
>>>> * Content can have one-to-one, one-to-many and many-to-many relationships with other content.
>>>> * Content fields should be versioned.
>>>> * Content types can change at runtime, i.e. fields (a.k.a. columns in HBase) can be added; removal will not be allowed just yet.
>>>> * Every content type will have a corresponding grammar to validate content of its type.
>>>> * It will have authentication and authorization.
>>>> * It will have full-text search based on Lucene/Katta.
>>>>
>>>> Based on these requirements I have the following questions that I would like feedback on:
>>>> * Reading the articles and presentations, HBase looks to be a perfect match as it supports multi-dimensional rows, versioned cells and dynamic schema modification. But I could not understand what the definition of "Big Data" is - that is, if content size is roughly 1~100kB (field/cell size 0~100kB), is HBase meant for such uses?
>>>> * Since I am not sure how much load the site will have, I am planning to set up DN+RS (DataNode + RegionServer) on Rackspace cloud instances with 2GB/80GB HDD, with a view that, as revenue and pageviews increase, more moderate "commodity" hardware can be added progressively. Any comments/suggestions on this strategy?
>>>> * Where can I read up on or check out sample RDBMS schemas converted to HBase schemas?
>>>> Basically, I want to read up on efficient schema design for the different cardinality relationships between objects.
>>>>
>>>> Thank you,
>>>>
>>>> Imran M Yousuf

--
Imran M Yousuf
Entrepreneur & Software Engineer
Smart IT Engineering
Dhaka, Bangladesh
Email: imran@smartitengineering.com
Blog: http://imyousuf-tech.blogs.smartitengineering.com/
Mobile: +880-1711402557
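
To make the column family / column / cell vocabulary asked about above concrete, here is a minimal, illustrative sketch of how a CMS content model like this might map onto HBase, written against the HBase Java client API. It is only a sketch under stated assumptions: class and method names differ slightly between HBase releases, and the "content" table, the "fields" and "links" families, and the qualifiers below are illustrative choices, not a recommended schema.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ContentSchemaSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // A table is a sparse, sorted map of rows. Each row lives under a row key,
    // and its data is grouped into column families, which are declared up front.
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor content = new HTableDescriptor("content");

    // Column family "fields": one column qualifier per content field.
    // Qualifiers do not have to be declared; new ones can be added per row at runtime.
    HColumnDescriptor fields = new HColumnDescriptor("fields");
    fields.setMaxVersions(10);          // keep up to 10 versions of each cell
    content.addFamily(fields);

    // Column family "links": one qualifier per related content item,
    // e.g. a one-to-many relationship stored in a single "wide" row.
    content.addFamily(new HColumnDescriptor("links"));

    if (!admin.tableExists("content")) {
      admin.createTable(content);
    }

    // A cell is the value stored at (row key, family, qualifier, timestamp).
    HTable table = new HTable(conf, "content");
    Put put = new Put(Bytes.toBytes("article-0001"));
    put.add(Bytes.toBytes("fields"), Bytes.toBytes("title"),
            Bytes.toBytes("Getting started with HBase"));
    put.add(Bytes.toBytes("fields"), Bytes.toBytes("body"),
            Bytes.toBytes("... up to ~100kB of content ..."));
    put.add(Bytes.toBytes("links"), Bytes.toBytes("author"),
            Bytes.toBytes("user-0042"));
    table.put(put);

    // Reading back: ask for all stored versions of one column; each version
    // is a separate cell distinguished by its timestamp.
    Get get = new Get(Bytes.toBytes("article-0001"));
    get.addColumn(Bytes.toBytes("fields"), Bytes.toBytes("title"));
    get.setMaxVersions();
    Result result = table.get(get);
    System.out.println("latest title: " + Bytes.toString(
        result.getValue(Bytes.toBytes("fields"), Bytes.toBytes("title"))));
    table.close();
  }
}

For the one-to-many and many-to-many questions, one common pattern in the BigTable/HBase model is denormalization: store each relationship as extra columns in a family on the owning row (a wide row), using the related row key as the column qualifier, rather than joining separate tables as in an RDBMS.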