Subject: Re: HBase and Datawarehouse
From: Michael Segel <michael_segel@hotmail.com>
Date: Tue, 30 Apr 2013 13:14:57 -0500
To: user@hbase.apache.org

Multiple RS per host?

Huh?

That seems very counterintuitive and potentially problematic with M/R jobs.

Could you expand on this?

Thx

-Mike

On Apr 30, 2013, at 12:38 PM, Andrew Purtell wrote:

> Rules of thumb for starting off safely and for easing support issues are
> really good to have, but there are no hard barriers or singular approaches:
> use Java 7 + G1GC, disable the HBase blockcache in lieu of the OS blockcache,
> run multiple regionservers per host. It is going to depend on how the cluster
> is used and loaded. If we are talking about coprocessors, then effective
> limits are less clear; using a coprocessor to integrate an external process
> implemented with native code, communicating over memory-mapped files in
> /dev/shm, isn't outside what is possible (strawman alert).
>
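As a rough sketch of what the first two items Andrew mentions can look like in
practice (a Java 7 JVM running the G1 collector, and shrinking the on-heap
block cache so reads lean on the OS page cache instead), here is one possible
shape. The heap size, pause target, cache fraction, and install path below are
illustrative assumptions, not values recommended anywhere in this thread:

    # hbase-env.sh (sketch): point HBase at a Java 7 JVM and enable G1GC
    export JAVA_HOME=/usr/lib/jvm/java-7-oracle          # assumed install path
    export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS \
        -Xms8g -Xmx8g \
        -XX:+UseG1GC \
        -XX:MaxGCPauseMillis=100 \
        -verbose:gc -XX:+PrintGCDetails"

    <!-- hbase-site.xml (sketch): keep the on-heap block cache small and
         rely on the OS page cache for file blocks -->
    <property>
      <name>hfile.block.cache.size</name>
      <value>0.05</value>
    </property>

Running more than one regionserver per host, as Andrew also mentions, would
additionally need each instance to get its own ports and directories (for
example distinct hbase.regionserver.port and hbase.regionserver.info.port
values), which is beyond this sketch.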
> On Tue, Apr 30, 2013 at 5:01 AM, Kevin O'dell wrote:
>
>> Asaf,
>>
>> The heap barrier is something of a legend :)  You can ask 10 different
>> HBase committers what they think the max heap is and get 10 different
>> answers. This is my take on heap sizes from the many clusters I have
>> dealt with:
>>
>> 8GB  -> Standard heap size, and tends to run fine without any tuning
>>
>> 12GB -> Needs some TLC with regard to JVM tuning if your workload tends
>>         to cause churn (usually blockcache)
>>
>> 16GB -> GC tuning is a must, and now we need to start looking into MSLAB
>>         and ZK timeouts
>>
>> 20GB -> Same as 16GB in regards to tuning, but we tend to need to raise
>>         the ZK timeout a little higher
>>
>> 32GB -> We do have a couple of people running this high, but the pain
>>         outweighs the gains (IMHO)
>>
>> 64GB -> Let me know how it goes :)
>>
>> On Tue, Apr 30, 2013 at 4:07 AM, Andrew Purtell wrote:
>>
>>> I don't wish to be rude, but you are presenting odd claims as fact, as
>>> "mentioned in a couple of posts". It will be difficult to have a serious
>>> conversation. I encourage you to test your hypotheses and let us know if
>>> in fact there is a JVM "heap barrier" (and where it may be).
>>>
>>> On Monday, April 29, 2013, Asaf Mesika wrote:
>>>
>>>> I think for Phoenix to truly succeed, it needs HBase to break the JVM
>>>> heap barrier of 12G that I saw mentioned in a couple of posts. Lots of
>>>> analytics queries use memory, and since that memory is shared with
>>>> HBase, there is only so much you can do on a 12GB heap. On the other
>>>> hand, if Phoenix were implemented outside HBase on the same machine
>>>> (like Drill or Impala is doing), you could have 60GB for that process,
>>>> running many OLAP queries in parallel, utilizing the same data set.
>>>>
>>>> On Mon, Apr 29, 2013 at 9:08 PM, Andrew Purtell wrote:
>>>>
>>>>>> HBase is not really intended for heavy data crunching
>>>>>
>>>>> Yes it is. This is why we have first-class MapReduce integration and
>>>>> optimized scanners.
>>>>>
>>>>> Recent versions, like 0.94, also do pretty well with the 'O' part of
>>>>> OLAP.
>>>>>
>>>>> Urban Airship's Datacube is an example of a successful OLAP project
>>>>> implemented on HBase: http://github.com/urbanairship/datacube
>>>>>
>>>>> "Urban Airship uses the datacube project to support its analytics
>>>>> stack for mobile apps. We handle about ~10K events per second per
>>>>> node."
>>>>>
>>>>> Also there is Adobe's SaasBase:
>>>>> http://www.slideshare.net/clehene/hbase-and-hadoop-at-adobe
>>>>>
>>>>> Etc.
>>>>>
>>>>> Where an HBase OLAP application will differ tremendously from a
>>>>> traditional data warehouse is of course in the interface to the
>>>>> datastore. You have to design and speak in the language of the HBase
>>>>> API, though Phoenix (https://github.com/forcedotcom/phoenix) is
>>>>> changing that.
>>>>>
>>>>> On Sun, Apr 28, 2013 at 10:21 PM, anil gupta wrote:
>>>>>
>>>>>> Hi Kiran,
>>>>>>
>>>>>> In HBase the data is denormalized, but at its core HBase is a
>>>>>> KeyValue-based database meant for lookups or queries that expect a
>>>>>> response in milliseconds. OLAP, i.e. data warehousing, usually
>>>>>> involves heavy data crunching. HBase is not really intended for heavy
>>>>>> data crunching. If you just want to store denormalized data and do
>>>>>> simple queries, then HBase is good. For OLAP kinds of workloads you
>>>>>> can make HBase work, but IMO you will be better off using Hive for
>>>>>> data warehousing.
>>>>>>
>>>>>> HTH,
>>>>>> Anil Gupta
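As a small illustration of Anil's point about millisecond key lookups, and of
what Andrew means above by speaking "in the language of the HBase API", here
is a minimal sketch against the 0.94-era Java client. The table name, column
family, qualifier, and row key are made-up placeholders for this example:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class EventLookup {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // "events" table and "d" column family are hypothetical names.
            HTable table = new HTable(conf, "events");
            try {
                // Point lookup by row key: the access pattern HBase is
                // built for, typically answered in milliseconds.
                Get get = new Get(Bytes.toBytes("user123#20130430"));
                get.addColumn(Bytes.toBytes("d"), Bytes.toBytes("count"));
                Result result = table.get(get);
                byte[] value =
                    result.getValue(Bytes.toBytes("d"), Bytes.toBytes("count"));
                System.out.println(value == null
                    ? "(no value)" : Bytes.toStringBinary(value));
            } finally {
                table.close();
            }
        }
    }

The same lookup through Phoenix would be expressed as a short SQL SELECT over
JDBC, which is the interface gap Andrew is pointing at.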
>>>>>> On Sun, Apr 28, 2013 at 8:39 PM, Kiran wrote:
>>>>>>
>>>>>>> But in HBase data can be said to be in a denormalised state, as the
>>>>>>> methodology used for storage is a (column family:column) based
>>>>>>> flexible schema. Also, from Google's BigTable paper it is evident
>>>>>>> that HBase is capable of doing OLAP. So where does the difference
>>>>>>> lie?
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://apache-hbase.679495.n3.nabble.com/HBase-and-Datawarehouse-tp4043172p4043216.html
>>>>>>> Sent from the HBase User mailing list archive at Nabble.com.
>>>>>
>>>>> --
>>>>> Best regards,
>>>>>
>>>>>    - Andy
>>>>>
>>>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>>>> (via Tom White)
>>>
>>> --
>>> Best regards,
>>>
>>>    - Andy
>>>
>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>> (via Tom White)
>>
>> --
>> Kevin O'Dell
>> Systems Engineer, Cloudera
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
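Picking up Kiran's question above about the (column family:column) flexible
schema: a short sketch, again against the 0.94-era Java client and with
made-up table and column names, showing how two rows in the same table can
carry entirely different columns under the same family, with no per-table
column definitions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FlexibleSchemaSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // "profiles" table and "cf" family are hypothetical names.
            HTable table = new HTable(conf, "profiles");
            try {
                // Row "alice" stores two qualifiers under family "cf"...
                Put alice = new Put(Bytes.toBytes("alice"));
                alice.add(Bytes.toBytes("cf"), Bytes.toBytes("email"),
                          Bytes.toBytes("alice@example.com"));
                alice.add(Bytes.toBytes("cf"), Bytes.toBytes("city"),
                          Bytes.toBytes("Pune"));

                // ...while row "bob" stores a completely different qualifier.
                Put bob = new Put(Bytes.toBytes("bob"));
                bob.add(Bytes.toBytes("cf"), Bytes.toBytes("last_login"),
                        Bytes.toBytes("2013-04-28"));

                table.put(alice);
                table.put(bob);
            } finally {
                table.close();
            }
        }
    }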