hive-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Hive alternatives?
Date Thu, 05 Nov 2015 19:49:20 GMT
First, it depends on what exactly you want to do. Second, Hive > 1.2 with Tez as the execution
engine (I recommend >= 0.8) and ORC as the storage format can be pretty quick, depending on
your use case. Additionally, you may want to employ compression, which is a performance boost
once you understand how storage indexes and bloom filters work. You also need to think about
how you sort the data. Cf. also
https://snippetessay.wordpress.com/2015/07/25/hive-optimizations-with-indexes-bloom-filters-and-statistics/
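
As a rough illustration of why sort order matters for storage indexes, here is a toy Python
model. It is not Hive or ORC code: the block size, the function names and the id range are
assumptions made up for this sketch. It keeps a min/max pair per block of rows and counts how
many blocks a range predicate would still have to read.

```python
import random


def build_min_max_index(values, stride):
    """Return one (min, max) pair per block of `stride` consecutive rows."""
    return [
        (min(values[i:i + stride]), max(values[i:i + stride]))
        for i in range(0, len(values), stride)
    ]


def blocks_to_read(index, lo, hi):
    """Count blocks whose [min, max] range overlaps the predicate range [lo, hi]."""
    return sum(1 for block_min, block_max in index if block_max >= lo and block_min <= hi)


random.seed(42)
unsorted_ids = random.sample(range(100_000), 100_000)  # every id once, random order
sorted_ids = sorted(unsorted_ids)

stride = 10_000  # illustrative block size, not an ORC default
unsorted_index = build_min_max_index(unsorted_ids, stride)
sorted_index = build_min_max_index(sorted_ids, stride)

# Predicate: id BETWEEN 42000 AND 42999
unsorted_hits = blocks_to_read(unsorted_index, 42_000, 42_999)
sorted_hits = blocks_to_read(sorted_index, 42_000, 42_999)

print("blocks read, unsorted data:", unsorted_hits)
print("blocks read, sorted data:  ", sorted_hits)
```

On sorted data the predicate falls into a single block, while on randomly ordered data nearly
every block's min/max range overlaps it, so almost nothing can be skipped.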

However, you have to rethink how you define your technical data model. A lot of prejoined
data in one big flat table can be more performant when using storage indexes and bloom filters
than standard indexes combined with dimensional modeling.
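
To make the bloom filter part concrete, here is a minimal Python sketch of the idea. It is a
toy, not the filter implementation that ORC uses: the class, its parameters and the sample ids
are hypothetical. Each block of rows gets a small filter, and an equality lookup only has to
read the blocks whose filter says the value might be present.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive num_hashes positions by salting one hash function with a counter.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


# Two blocks of a flat table, each holding a few (made-up) customer ids.
blocks = [["c17", "c42", "c99"], ["c03", "c58", "c77"]]

filters = []
for block in blocks:
    bloom = BloomFilter(num_bits=1024, num_hashes=3)
    for customer_id in block:
        bloom.add(customer_id)
    filters.append(bloom)

# Equality lookup: only blocks whose filter answers "maybe" need to be read.
candidates = [i for i, bloom in enumerate(filters) if bloom.might_contain("c58")]
print("blocks to read for c58:", candidates)
```

Unlike the min/max index, this helps with equality predicates even when the column is not
sorted, at the cost of occasional false positives.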

Besides Tez, you can also use another execution engine in your session (e.g. Spark) if
this makes sense.

Finally, you have to review how YARN manages resources, including preemption, fair vs. capacity
scheduler, etc.

Btw, the same also holds for relational database appliances such as Exadata. The standard
approach of dimensional modeling + standard indexes is often no longer the most performant there.




> On 05 Nov 2015, at 20:04, Andrés Ivaldi <iaivaldi@gmail.com> wrote:
> 
> Hello, 
> I was looking at Hive as an OLAP alternative, but I've read that it is quite slow for that.
> Does anybody have experience with this, or know a Hive alternative for OLAP? Kylin is not an
> option because we need dynamic OLAP like ROLAP.
> 
> Regards,
> 
> -- 
> Ing. Ivaldi Andres
