hive-user mailing list archives

From Mich Talebzadeh <>
Subject Re: Using Spark on Hive with Hive also using Spark as its execution engine
Date Tue, 12 Jul 2016 14:59:34 GMT
Thanks Alan. Point taken.

In mitigation, there are members of the Spark forum who have shown interest in
using Hive directly, and I quote one:

"Did you have any benchmark for using Spark as backend engine for Hive vs
using Spark thrift server (and run spark code for hive queries)? We are
using later but it will be very useful to remove thriftserver, if we can. "



Dr Mich Talebzadeh

*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

On 12 July 2016 at 15:39, Alan Gates <> wrote:

> > On Jul 11, 2016, at 16:22, Mich Talebzadeh <> wrote:
> >
> > <snip>
> >       • If I add LLAP, will that be more efficient in terms of memory
> > usage compared to Hive or not? Will it keep the data in memory for reuse or
> > not?
> >
> Yes, this is exactly what LLAP does.  It keeps a cache of hot data (hot
> columns of hot partitions) and shares that across queries.  Unlike many MPP
> caches it will cache the same data on multiple nodes if it has more workers
> that want to access the data than can be run on a single node.
>
> As a side note, it is considered bad form in Apache to send a message to
> two lists.  It causes a lot of background noise for people on the Spark
> list who probably aren’t interested in Hive performance.
> Alan.
