hadoop-mapreduce-user mailing list archives

From Daniel Käfer <d.kae...@hs-furtwangen.de>
Subject Re: reference architecture
Date Thu, 25 Oct 2012 22:17:18 GMT
On Thursday, 25.10.2012, at 22:10 +0100, Steve Loughran wrote:
> I quite like the new Hadoop in Practice for a lot of that, especially
> the answer to #2, "how to store the data", where he looks at all the
> options

Part 3, "Big Data Patterns", looks very interesting. I am going to read
the book.

On Thursday, 25.10.2012, at 22:10 +0100, Steve Loughran wrote:
> Regarding storing DB data, HBase-on-HDFS is where people keep it; Pig
> and Hive can work with that as well as rawer data kept in HDFS
> directly

But is that the best idea? HBase is great for random reads and small
range scans, but Hive (SQL) performance on HBase is 4-5x slower than on
plain HDFS. [0]

I guess keeping the first data (the raw data) in HDFS and the final
data in HBase is a good idea. But how should the data be stored between
the individual MapReduce jobs?

[0] Todd Lipcon, p. 19. I have not benchmarked the performance myself.
