Subject: Re: newbie: need help on understanding HBase
From: Imran M Yousuf
To: hbase-user@hadoop.apache.org
Date: Fri, 13 Nov 2009 09:04:10 +0700

On Fri, Nov 13, 2009 at 9:00 AM, Ryan Rawson wrote:
> HBase does at least 3 things that traditional databases have a hard time with:
>
> - Large blobs of data. Mysql is particularly guilty of not handling this well.
> - Tables that grow to be larger than reasonably priced single machines.
> - Write loads that are not compatible with master-slave replication
>
> The 2nd and 3rd are very interesting, since you either have to pay for
> something like Oracle RAC, or start sharding.

Exactly, and since the contents will be blob data, and my experience with RDBMS blobs suggests that scaling them is proportional to *BIG* money, I am eager to take the HBase path. I was actually praying and hoping you would join this thread :). Can you please elaborate on Column Family, Column and Cell and their basic use cases?

Thanks a lot,

Imran

> On Thu, Nov 12, 2009 at 5:58 PM, Imran M Yousuf wrote:
>> On Thu, Nov 12, 2009 at 10:50 PM, Chris Bates wrote:
>>> Hi Imran,
>>>
>>> I'm a new user as well. I found these presentations helpful in answering most of your questions:
>>> http://wiki.apache.org/hadoop/HBase/HBasePresentations
>>>
>>> There are HBase schema designs in there.
>>>
>> I read them, but without the speakers' explanations the schema parts remain unexplained for a newbie like me. I was looking for more concrete definitions of column family, column, cell etc. and their use cases. I guess I will have to learn them by experimenting.
>>
>>> You might also want to read the original BigTable paper and the chapter on HBase in O'Reilly's Hadoop book.
>>>
>>> But to answer one of your questions--"Big Data" usually refers to a dataset that is millions to billions of rows in length. But "Big Data" doesn't mean you have to use a tool like HBase. We have some MySQL tables that are 100 million rows and work fine. You have to identify what works best for your use and use the most appropriate tool.
>>
>> Thanks. IMHO, HBase is more suitable than MySQL here simply because of the complexity and cost of scaling an application with blob data.
>>
>> Thanks a lot,
>>
>> Imran
>>
>>> On Thu, Nov 12, 2009 at 9:13 AM, Imran M Yousuf wrote:
>>>
>>>> Hi!
>>>>
>>>> I am absolutely new to HBase. All I have done is read the documentation and presentations and get a single instance up and running. I am starting on a Content Management System which will be used as a backend for multiple web applications of different natures. In the CMS:
>>>> * Users can define their own content structure, known as a content type.
>>>> * Content can have one-to-one, one-to-many and many-to-many relationships with other content.
>>>> * Content fields should be versioned.
>>>> * Content types can change at runtime, i.e. fields (a.k.a. columns in HBase) can be added; removal will not be allowed just yet.
>>>> * Every content type will have a corresponding grammar to validate content of its type.
>>>> * It will have authentication and authorization.
>>>> * It will have full-text search based on Lucene/Katta.
>>>>
>>>> Based on these requirements I have the following questions that I would like feedback on:
>>>> * Reading the articles and presentations, HBase looks to be a perfect match as it supports multi-dimensional rows, versioned cells and dynamic schema modification. But I could not understand what the definition of "Big Data" is - that is, if content size is roughly 1~100kB (field/cell size 0~100kB), is HBase meant for such uses?
>>>> * Since I am not sure how much load the site will have, I am planning to set up DN+RS (DataNode + RegionServer) on Rackspace cloud instances with 2GB/80GB HDD, with a view that, as revenue and pageviews increase, more moderate "commodity" hardware can be added progressively. Any comments/suggestions on this strategy?
>>>> * Where can I read up on or check out sample RDBMS schemas converted to HBase schemas?
>>>> Basically, I want to read up on efficient schema design for the different cardinality relationships between objects.
>>>>
>>>> Thank you,
>>>>
>>>> Imran M Yousuf

--
Imran M Yousuf
Entrepreneur & Software Engineer
Smart IT Engineering
Dhaka, Bangladesh
Email: imran@smartitengineering.com
Blog: http://imyousuf-tech.blogs.smartitengineering.com/
Mobile: +880-1711402557
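
To make the column family / column / cell vocabulary asked about above concrete, here is a minimal, illustrative sketch of how a CMS content model like this might map onto HBase, written against the HBase Java client API. It is only a sketch under stated assumptions: class and method names differ slightly between HBase releases, and the "content" table, the "fields" and "links" families, and the qualifiers below are illustrative choices, not a recommended schema.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ContentSchemaSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // A table is a sparse, sorted map of rows. Each row lives under a row key,
    // and its data is grouped into column families, which are declared up front.
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor content = new HTableDescriptor("content");

    // Column family "fields": one column qualifier per content field.
    // Qualifiers do not have to be declared; new ones can be added per row at runtime.
    HColumnDescriptor fields = new HColumnDescriptor("fields");
    fields.setMaxVersions(10);          // keep up to 10 versions of each cell
    content.addFamily(fields);

    // Column family "links": one qualifier per related content item,
    // e.g. a one-to-many relationship stored in a single "wide" row.
    content.addFamily(new HColumnDescriptor("links"));

    if (!admin.tableExists("content")) {
      admin.createTable(content);
    }

    // A cell is the value stored at (row key, family, qualifier, timestamp).
    HTable table = new HTable(conf, "content");
    Put put = new Put(Bytes.toBytes("article-0001"));
    put.add(Bytes.toBytes("fields"), Bytes.toBytes("title"),
            Bytes.toBytes("Getting started with HBase"));
    put.add(Bytes.toBytes("fields"), Bytes.toBytes("body"),
            Bytes.toBytes("... up to ~100kB of content ..."));
    put.add(Bytes.toBytes("links"), Bytes.toBytes("author"),
            Bytes.toBytes("user-0042"));
    table.put(put);

    // Reading back: ask for all stored versions of one column; each version
    // is a separate cell distinguished by its timestamp.
    Get get = new Get(Bytes.toBytes("article-0001"));
    get.addColumn(Bytes.toBytes("fields"), Bytes.toBytes("title"));
    get.setMaxVersions();
    Result result = table.get(get);
    System.out.println("latest title: " + Bytes.toString(
        result.getValue(Bytes.toBytes("fields"), Bytes.toBytes("title"))));
    table.close();
  }
}

For the one-to-many and many-to-many questions, one common pattern in the BigTable/HBase model is denormalization: store each relationship as extra columns in a family on the owning row (a wide row), using the related row key as the column qualifier, rather than joining separate tables as in an RDBMS.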