hive-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: Hive metadata on Hbase
Date Mon, 24 Oct 2016 07:56:27 GMT
Hi Furcy,

Thanks for the update.

Transactional tables create an issue for us. When many updates are done,
they create many delta files that require compaction.

This by itself is not an issue for Hive. However, Spark fails to read these
delta files so the job crashes.
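
As a rough illustration of the delta build-up described above, the sketch
below counts delta directories in a listing of a table or partition
directory. It assumes the usual Hive ACID naming (base_<txn> and
delta_<minTxn>_<maxTxn>, optionally with a statement suffix). The helper
names and the threshold of 10 are hypothetical, chosen only for
illustration, and are not taken from this thread.

```python
import re

# Assumed Hive ACID directory naming: delta_<minTxn>_<maxTxn>[_<stmtId>].
# Anchored at the start, so delete_delta_* directories are not counted.
DELTA_RE = re.compile(r"^delta_\d+_\d+(_\d+)?$")


def count_deltas(dir_names):
    """Count entries that look like ACID delta directories."""
    return sum(1 for name in dir_names if DELTA_RE.match(name))


def needs_compaction(dir_names, threshold=10):
    """Hypothetical check: flag a directory once it holds `threshold` deltas."""
    return count_deltas(dir_names) >= threshold


listing = [
    "base_0000005",
    "delta_0000006_0000006",
    "delta_0000007_0000007_0000",
    "delete_delta_0000008_0000008",
]
print(count_deltas(listing), needs_compaction(listing, threshold=2))
```

The directory names would come from an HDFS listing of the table location;
how that listing is obtained is left out here.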

Regards,

Mich

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 24 October 2016 at 08:39, Furcy Pin <furcy.pin@flaminem.com> wrote:

> Hi Mich,
>
> the umbrella JIRA for this gives a few reasons:
> https://issues.apache.org/jira/browse/HIVE-9452
> (with even more details in the attached pdf:
> https://issues.apache.org/jira/secure/attachment/12697601/HBaseMetastoreApproach.pdf)
>
> In my experience, Hive tables with a lot of partitions (> 10 000) may
> become really slow, especially with Spark.
> The latency induced by the metastore can be really big compared to the
> whole duration of the query itself,
> because the driver needs to fetch a lot of info about partitions just to
> optimize the query, before even running it.
>
> I guess another advantage is that using an RDBMS as the metastore makes it
> a SPOF unless you set up replication etc., while HBase would give HA for free.
>
>
>
> On Mon, Oct 24, 2016 at 9:06 AM, Mich Talebzadeh <
> mich.talebzadeh@gmail.com> wrote:
>
>> @Per
>>
>> We run a fully transaction-enabled Hive metadb on an Oracle DB.
>>
>> I don't have statistics now but can collect them from AWR reports, no problem.
>>
>> @Jorn,
>>
>> The primary reason Oracle was chosen is because the company has global
>> licenses for Oracle + MSSQL + SAP and they are classified as Enterprise
>> Grade databases.
>>
>> Neither MySQL nor the others are classified as such, so they cannot be
>> deployed in production.
>>
>> Besides, for us to have Hive metadata on Oracle makes sense as our
>> infrastructure does all the support, HA etc for it and they have trained
>> DBAs to look after it 24x7.
>>
>> Admittedly we are now relying on HDFS itself plus Hbase as well for
>> persistent storage. So the situation might change.
>>
>> HTH
>>
>> On 24 October 2016 at 06:46, Per Ullberg <per.ullberg@klarna.com> wrote:
>>
>>> I thought the main gain was to get ACID on Hive performant enough.
>>>
>>> @Mich: Do you run with ACID-enabled tables? How many
>>> Create/Update/Deletes do you do per second?
>>>
>>> best regards
>>> /Pelle
>>>
>>> On Mon, Oct 24, 2016 at 7:39 AM, Jörn Franke <jornfranke@gmail.com>
>>> wrote:
>>>
>>>> I think the main gain is more about getting rid of a dedicated database
>>>> including maintenance and potential license cost.
>>>> For really large clusters and a lot of users this might be even more
>>>> beneficial. You can avoid clustering the database etc.
>>>>
>>>> On 24 Oct 2016, at 00:46, Mich Talebzadeh <mich.talebzadeh@gmail.com>
>>>> wrote:
>>>>
>>>>
>>>> A while back there were some notes on having the Hive metastore on Hbase
>>>> as opposed to conventional RDBMSs.
>>>>
>>>> I am currently involved with some hefty work with Hbase and Phoenix for
>>>> batch ingestion of trade data. As long as you define your Hbase table
>>>> through Phoenix and with secondary Phoenix indexes on Hbase, the speed is
>>>> impressive.
>>>>
>>>> I am not sure how much having Hbase as Hive metastore is going to add
>>>> to Hive performance. We use Oracle 12c as Hive metastore and the Hive
>>>> database/schema is built on solid state disks. We have never had any
>>>> issues with locking and concurrency.
>>>>
>>>> Therefore I am not sure what one is going to gain by having Hbase as
>>>> the Hive metastore. I trust that we can still use our existing schemas on
>>>> Oracle.
>>>>
>>>> HTH
>>>>
>>>
>>>
>>> --
>>>
>>> *Per Ullberg*
>>> Data Vault Tech Lead
>>> Odin Uppsala
>>> +46 701612693
>>>
>>> Klarna AB (publ)
>>> Sveavägen 46, 111 34 Stockholm
>>> Tel: +46 8 120 120 00
>>> Reg no: 556737-0431
>>> klarna.com
>>>
>>>
>>
>
