From: Russell Jurney
To: "user@hadoop.apache.org"
Date: Sat, 27 Oct 2012 02:19:14 -0700
Subject: Re: reference architecture

Russell Jurney http://datasyndrome.com

On Oct 25, 2012, at 12:24 PM, "Daniel Käfer" wrote:

> Hello all,
>
> I'm looking for a reference architecture for Hadoop. The only result I
> found is the Lambda architecture from Nathan Marz [0].
>
> By architecture I mean answers to questions like:
> - How should I store the data? CSV, Thrift, ProtoBuf

You should use Avro.

> - How should I model the data? ER model, star schema, something new?

You should use a document format.

> - Normalized or denormalized or both (master data normalized, then
>   transformation to denormalized, like ETL)?

Denormalize fully, into a document format.

> - How should I combine database and HDFS files?

Don't. Put everything on HDFS.

>
> Are there any other documented architectures for Hadoop?

I really did make an example in my book. It is just one example, but you wanted answers to questions that always 'depend.' You can check it out in slides:
http://www.slideshare.net/mobile/hortonworks/agile-analytics-applications-on-hadoop

>
> Regards
> Daniel Käfer
>
>
> [0] http://www.manning.com/marz/ only a preprint so far, not yet complete
>
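
The advice above to denormalize fully into a document format can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not something taken from the thread or the book: the `users` and `orders` records, the `denormalize` helper, and the `UserDocument` schema are all hypothetical. The schema is written in Avro's JSON schema syntax, but the sketch only prints JSON lines; actually writing Avro files would need an Avro library, which is not used here.

```python
import json

# Hypothetical normalized inputs, as they might be exported from a database.
users = {1: {"name": "Alice", "city": "Berlin"}}
orders = [
    {"order_id": 100, "user_id": 1, "item": "book", "price": 12.5},
    {"order_id": 101, "user_id": 1, "item": "pen", "price": 1.2},
]


def denormalize(users, orders):
    """Fold each user's orders into one self-contained document per user."""
    docs = {
        uid: {"user_id": uid, **fields, "orders": []}
        for uid, fields in users.items()
    }
    for order in orders:
        doc = docs.get(order["user_id"])
        if doc is None:
            continue  # order refers to an unknown user; skipped in this sketch
        # The foreign key is redundant once the order is nested in its user.
        doc["orders"].append({k: v for k, v in order.items() if k != "user_id"})
    return list(docs.values())


# Hypothetical Avro schema (JSON syntax) describing the nested document.
USER_DOC_SCHEMA = {
    "type": "record",
    "name": "UserDocument",
    "fields": [
        {"name": "user_id", "type": "long"},
        {"name": "name", "type": "string"},
        {"name": "city", "type": "string"},
        {
            "name": "orders",
            "type": {
                "type": "array",
                "items": {
                    "type": "record",
                    "name": "Order",
                    "fields": [
                        {"name": "order_id", "type": "long"},
                        {"name": "item", "type": "string"},
                        {"name": "price", "type": "double"},
                    ],
                },
            },
        },
    ],
}

documents = denormalize(users, orders)
for document in documents:
    # One document per line, the shape that would be stored as a file on HDFS.
    print(json.dumps(document))
```

Each resulting document carries everything a job needs about one user, so a reader does not have to join against a second data set.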