hive-user mailing list archives

From Xuefu Zhang <xzh...@cloudera.com>
Subject Re: Hive on Spark
Date Mon, 31 Aug 2015 20:25:03 GMT
What you described isn't part of the functionality of Hive on Spark.
Rather, Spark is used here as a general-purpose engine similar to MR but
without intermediate stages. It's batch-oriented.

Keeping 100 TB of data in memory is hardly beneficial unless you know that
the dataset is going to be used in subsequent queries.

For loading data in memory and providing near real-time response, you might
want to look at some memory-based DBs.

Thanks,
Xuefu

On Thu, Aug 27, 2015 at 9:11 AM, Patrick McAnneny <
patrick.mcanneny@leadkarma.com> wrote:

> Once I get "hive.execution.engine=spark" working, how would I go about
> loading portions of my data into memory? Let's say I have a 100TB database
> and want to load all of last week's data in Spark memory, is this possible
> or even beneficial? Or am I thinking about Hive on Spark in the wrong way?
>
> I also assume Hive on Spark could get me to near-real-time capabilities
> for large queries. Is this true?
>
