Subject: Re: HBase and Datawarehouse
From: Michael Segel <michael_segel@hotmail.com>
Date: Tue, 30 Apr 2013 13:14:57 -0500
To: user@hbase.apache.org

Multiple RS per host?

Huh?

That seems very counterintuitive and potentially problematic with M/R jobs.

Could you expand on this?

Thx

-Mike

On Apr 30, 2013, at 12:38 PM, Andrew Purtell wrote:

> Rules of thumb for starting off safely and for easing support issues are
> really good to have, but there are no hard barriers or singular approaches:
> use Java 7 + G1GC, disable the HBase blockcache in lieu of the OS blockcache,
> run multiple regionservers per host. It is going to depend on how the cluster
> is used and loaded. If we are talking about coprocessors, then effective
> limits are less clear; using a coprocessor to integrate an external process
> implemented with native code, communicating over memory-mapped files in
> /dev/shm, isn't outside what is possible (strawman alert).
>
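As a rough sketch of what the first two items Andrew mentions can look like in
practice (a Java 7 JVM running the G1 collector, and shrinking the on-heap
block cache so reads lean on the OS page cache instead), here is one possible
shape. The heap size, pause target, cache fraction, and install path below are
illustrative assumptions, not values recommended anywhere in this thread:

    # hbase-env.sh (sketch): point HBase at a Java 7 JVM and enable G1GC
    export JAVA_HOME=/usr/lib/jvm/java-7-oracle          # assumed install path
    export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
        -Xms8g -Xmx8g \
        -XX:+UseG1GC \
        -XX:MaxGCPauseMillis=100 \
        -verbose:gc -XX:+PrintGCDetails"

    <!-- hbase-site.xml (sketch): keep the on-heap block cache small and
         rely on the OS page cache for file blocks -->
    <property>
      <name>hfile.block.cache.size</name>
      <value>0.05</value>
    </property>

Running more than one regionserver per host, as Andrew also mentions, would
additionally need each instance to get its own ports and directories (for
example distinct hbase.regionserver.port and hbase.regionserver.info.port
values), which is beyond this sketch.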
> On Tue, Apr 30, 2013 at 5:01 AM, Kevin O'dell wrote:
>
>> Asaf,
>>
>> The heap barrier is something of a legend :)  You can ask 10 different
>> HBase committers what they think the max heap is and get 10 different
>> answers. This is my take on heap sizes from the many clusters I have
>> dealt with:
>>
>> 8GB  -> Standard heap size, and tends to run fine without any tuning
>>
>> 12GB -> Needs some TLC with regard to JVM tuning if your workload tends
>>         to cause churn (usually blockcache)
>>
>> 16GB -> GC tuning is a must, and now we need to start looking into MSLAB
>>         and ZK timeouts
>>
>> 20GB -> Same as 16GB in regards to tuning, but we tend to need to raise
>>         the ZK timeout a little higher
>>
>> 32GB -> We do have a couple of people running this high, but the pain
>>         outweighs the gains (IMHO)
>>
>> 64GB -> Let me know how it goes :)
>>
>> On Tue, Apr 30, 2013 at 4:07 AM, Andrew Purtell wrote:
>>
>>> I don't wish to be rude, but you are presenting odd claims as fact, as
>>> "mentioned in a couple of posts". It will be difficult to have a serious
>>> conversation. I encourage you to test your hypotheses and let us know if
>>> in fact there is a JVM "heap barrier" (and where it may be).
>>>
>>> On Monday, April 29, 2013, Asaf Mesika wrote:
>>>
>>>> I think for Phoenix to truly succeed, it needs HBase to break the JVM
>>>> heap barrier of 12G that I saw mentioned in a couple of posts. Lots of
>>>> analytics queries use memory, and since that memory is shared with
>>>> HBase, there is only so much you can do on a 12GB heap. On the other
>>>> hand, if Phoenix were implemented outside HBase on the same machine
>>>> (like Drill or Impala is doing), you could have 60GB for that process,
>>>> running many OLAP queries in parallel, utilizing the same data set.
>>>>
>>>> On Mon, Apr 29, 2013 at 9:08 PM, Andrew Purtell wrote:
>>>>
>>>>>> HBase is not really intended for heavy data crunching
>>>>>
>>>>> Yes it is. This is why we have first-class MapReduce integration and
>>>>> optimized scanners.
>>>>>
>>>>> Recent versions, like 0.94, also do pretty well with the 'O' part of
>>>>> OLAP.
>>>>>
>>>>> Urban Airship's Datacube is an example of a successful OLAP project
>>>>> implemented on HBase: http://github.com/urbanairship/datacube
>>>>>
>>>>> "Urban Airship uses the datacube project to support its analytics
>>>>> stack for mobile apps. We handle about ~10K events per second per
>>>>> node."
>>>>>
>>>>> Also there is Adobe's SaasBase:
>>>>> http://www.slideshare.net/clehene/hbase-and-hadoop-at-adobe
>>>>>
>>>>> Etc.
>>>>>
>>>>> Where an HBase OLAP application will differ tremendously from a
>>>>> traditional data warehouse is of course in the interface to the
>>>>> datastore. You have to design and speak in the language of the HBase
>>>>> API, though Phoenix (https://github.com/forcedotcom/phoenix) is
>>>>> changing that.
>>>>>
>>>>> On Sun, Apr 28, 2013 at 10:21 PM, anil gupta wrote:
>>>>>
>>>>>> Hi Kiran,
>>>>>>
>>>>>> In HBase the data is denormalized, but at its core HBase is a
>>>>>> KeyValue-based database meant for lookups or queries that expect a
>>>>>> response in milliseconds. OLAP, i.e. data warehousing, usually
>>>>>> involves heavy data crunching. HBase is not really intended for heavy
>>>>>> data crunching. If you just want to store denormalized data and do
>>>>>> simple queries, then HBase is good. For OLAP kinds of workloads you
>>>>>> can make HBase work, but IMO you will be better off using Hive for
>>>>>> data warehousing.
>>>>>>
>>>>>> HTH,
>>>>>> Anil Gupta
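As a small illustration of Anil's point about millisecond key lookups, and of
what Andrew means above by speaking "in the language of the HBase API", here
is a minimal sketch against the 0.94-era Java client. The table name, column
family, qualifier, and row key are made-up placeholders for this example:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class EventLookup {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // "events" table and "d" column family are hypothetical names.
            HTable table = new HTable(conf, "events");
            try {
                // Point lookup by row key: the access pattern HBase is
                // built for, typically answered in milliseconds.
                Get get = new Get(Bytes.toBytes("user123#20130430"));
                get.addColumn(Bytes.toBytes("d"), Bytes.toBytes("count"));
                Result result = table.get(get);
                byte[] value =
                    result.getValue(Bytes.toBytes("d"), Bytes.toBytes("count"));
                System.out.println(value == null
                    ? "(no value)" : Bytes.toStringBinary(value));
            } finally {
                table.close();
            }
        }
    }

The same lookup through Phoenix would be expressed as a short SQL SELECT over
JDBC, which is the interface gap Andrew is pointing at.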
>>>>>> On Sun, Apr 28, 2013 at 8:39 PM, Kiran wrote:
>>>>>>
>>>>>>> But in HBase data can be said to be in a denormalised state, as the
>>>>>>> methodology used for storage is a (column family:column) based
>>>>>>> flexible schema. Also, from Google's BigTable paper it is evident
>>>>>>> that HBase is capable of doing OLAP. So where does the difference
>>>>>>> lie?
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://apache-hbase.679495.n3.nabble.com/HBase-and-Datawarehouse-tp4043172p4043216.html
>>>>>>> Sent from the HBase User mailing list archive at Nabble.com.
>>>>>
>>>>> --
>>>>> Best regards,
>>>>>
>>>>>    - Andy
>>>>>
>>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>>> (via Tom White)
>>>
>>> --
>>> Best regards,
>>>
>>>    - Andy
>>>
>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>> (via Tom White)
>>
>> --
>> Kevin O'Dell
>> Systems Engineer, Cloudera
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
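Picking up Kiran's question above about the (column family:column) flexible
schema: a short sketch, again against the 0.94-era Java client and with
made-up table and column names, showing how two rows in the same table can
carry entirely different columns under the same family, with no per-table
column definitions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FlexibleSchemaSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // "profiles" table and "cf" family are hypothetical names.
            HTable table = new HTable(conf, "profiles");
            try {
                // Row "alice" stores two qualifiers under family "cf"...
                Put alice = new Put(Bytes.toBytes("alice"));
                alice.add(Bytes.toBytes("cf"), Bytes.toBytes("email"),
                          Bytes.toBytes("alice@example.com"));
                alice.add(Bytes.toBytes("cf"), Bytes.toBytes("city"),
                          Bytes.toBytes("Pune"));

                // ...while row "bob" stores a completely different qualifier.
                Put bob = new Put(Bytes.toBytes("bob"));
                bob.add(Bytes.toBytes("cf"), Bytes.toBytes("last_login"),
                        Bytes.toBytes("2013-04-28"));

                table.put(alice);
                table.put(bob);
            } finally {
                table.close();
            }
        }
    }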