From: Steve Loughran
To: user@hadoop.apache.org
Date: Thu, 25 Oct 2012 22:10:27 +0100
Subject: Re: reference architecture
In-Reply-To: <1351193060.2547.15.camel@merkur>

On 25 October 2012 20:24, Daniel Käfer <d.kaefer@hs-furtwangen.de> wrote:

> Hello all,
>
> I'm looking for a reference architecture for hadoop. The only result I
> found is Lambda architecture from Nathan Marz[0].

I quite like the new Hadoop in Practice for a lot of that, especially the answer to #2, "how to store the data", where the author looks at all the options. Joining is the other big issue.

http://steveloughran.blogspot.co.uk/2012/10/hadoop-in-practice-applied-hadoop.html

Regarding storing DB data, HBase-on-HDFS is where people keep it; Pig and Hive can work with that as well as rawer data kept in HDFS directly.

> With architecture I mean answers to questions like:
> - How should I store the data? CSV, Thrift, ProtoBuf
> - How should I model the data? ER-Model, Starschema, something new?
> - normalized or denormalized or both (master data normalized, then
>   transformation to denormalized, like ETL)
> - How should I combine database and HDFS-Files?
>
> Are there any other documented architectures for hadoop?
>
> Regards
> Daniel Käfer
>
> [0] http://www.manning.com/marz/ just a preprint yet, not completed

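The "master data normalized, then transformation to denormalized, like ETL" option from the quoted question can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CUSTOMERS` and `ORDERS` records, their field names, and the `denormalize` and `to_csv` helpers are all hypothetical and do not come from the thread, and CSV is used simply because it is the first format the question lists.

```python
import csv
import io

# Hypothetical normalized master data: customer id -> attributes.
CUSTOMERS = {
    1: {"name": "Alice", "country": "DE"},
    2: {"name": "Bob", "country": "UK"},
}

# Hypothetical fact rows that reference the master data by key.
ORDERS = [
    {"order_id": 100, "customer_id": 1, "amount": 25.0},
    {"order_id": 101, "customer_id": 2, "amount": 40.0},
    {"order_id": 102, "customer_id": 1, "amount": 15.5},
]


def denormalize(orders, customers):
    """Join each order with its customer so every row is self-contained."""
    for order in orders:
        customer = customers[order["customer_id"]]
        yield {
            "order_id": order["order_id"],
            "amount": order["amount"],
            "customer_name": customer["name"],
            "customer_country": customer["country"],
        }


def to_csv(rows):
    """Serialize the flat rows as CSV text, one line per order."""
    fields = ["order_id", "amount", "customer_name", "customer_country"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


flat_rows = list(denormalize(ORDERS, CUSTOMERS))
csv_text = to_csv(flat_rows)
```

The flat rows repeat the customer attributes on every order, which costs storage but means a later job can read one file without performing a join, the trade-off the normalized-versus-denormalized question is about.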