hadoop-mapreduce-user mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: reference architecture
Date Thu, 25 Oct 2012 21:10:27 GMT
On 25 October 2012 20:24, Daniel Käfer <d.kaefer@hs-furtwangen.de> wrote:

> Hello all,
> I'm looking for a reference architecture for hadoop. The only result I
> found is Lambda architecture from Nathan Marz[0].

I quite like the new Hadoop in Practice for a lot of that, especially the
answer to #2, "how to store the data", where he looks at all the options.
Joining is the other big issue.


Regarding storing DB data, HBase-on-HDFS is where people keep it; Pig and
Hive can work with that as well as with rawer data kept directly in HDFS.

> With architecture I mean answers to questions like:
> - How should I store the data? CSV, Thrift, ProtoBuf
> - How should I model the data? ER model, star schema, something new?
> - normalized or denormalized or both (master data normalized, then
> transformation to denormalized, like ETL)
> - How should I combine database and HDFS files?
> Are there any other documented architectures for hadoop?
> Regards
> Daniel Käfer
> [0] http://www.manning.com/marz/ (only a preprint so far, not yet complete)
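
To make the "master data normalized, then transformation to denormalized" option in the question concrete, here is a minimal sketch of that ETL-style step. It is plain in-memory Python, not a MapReduce, Pig or Hive job, and every record, field name and function in it (customers, orders, denormalize, to_csv) is made up for illustration.

```python
import csv
import io

# Hypothetical normalized master data: one entry per customer, keyed by id.
customers = {
    1: {"name": "Alice", "country": "DE"},
    2: {"name": "Bob", "country": "UK"},
}

# Hypothetical transactional records that reference the master data by key.
orders = [
    {"order_id": 100, "customer_id": 1, "amount": 25.0},
    {"order_id": 101, "customer_id": 2, "amount": 40.0},
    {"order_id": 102, "customer_id": 1, "amount": 15.5},
]


def denormalize(orders, customers):
    """Join each order with its customer so every output row is self-contained."""
    for order in orders:
        customer = customers[order["customer_id"]]
        yield {
            "order_id": order["order_id"],
            "amount": order["amount"],
            "customer_name": customer["name"],
            "customer_country": customer["country"],
        }


def to_csv(rows):
    """Serialize the denormalized rows as CSV text, one line per order."""
    out = io.StringIO()
    fields = ["order_id", "amount", "customer_name", "customer_country"]
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


denormalized_csv = to_csv(denormalize(orders, customers))
print(denormalized_csv)
```

The denormalized rows repeat the customer fields on every order, which costs space but lets a downstream job read each line without a join; the normalized tables remain the place where master data is corrected.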
