Date: Tue, 15 May 2012 09:41:51 -0700
Subject: Re: What's the right data storage/representation?
From: "Owen O'Malley"
To: user@hive.apache.org

On Tue, May 15, 2012 at 5:11 AM, Jon Palmer wrote:
> I can see a few potential solutions:
>
> 1. Don’t solve it. Accept that you have some artifacts in your
> reporting data that cannot be recovered from the source data.
>
> 2. Create status and location history tables in the application db and
> use that during the analytics process.
>
> 3. Log the status and location change ‘events’ to some other log file
> and use those logs in the Hive analysis.

I would probably create a Hive table that includes the status and
location updates. One of the advantages of Hive & Hadoop is that it is
easy to store the raw information in bulk and continue to process it.
Once you have the information, you will likely find new uses for it.

-- Owen
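
For illustration only, here is a minimal Python sketch of the idea behind keeping the raw status and location updates: if every change is stored as an append-only record, the state of an item at any past moment can be recomputed later. The record layout (`item_id`, `changed_at`, `status`, `location`), the sample rows, and the helper `state_as_of` are all hypothetical and do not come from the thread; in Hive the same lookup would be a query over the updates table.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One raw update row (hypothetical layout, not from the thread)."""
    item_id: str
    changed_at: str  # ISO-8601 UTC, fixed format, so string order == time order
    status: str
    location: str


def state_as_of(events: Iterable[ChangeEvent], item_id: str,
                as_of: str) -> Optional[ChangeEvent]:
    """Return the latest update for item_id at or before as_of, or None."""
    latest = None
    for ev in events:
        if ev.item_id != item_id or ev.changed_at > as_of:
            continue  # different item, or the change happened after the cutoff
        if latest is None or ev.changed_at >= latest.changed_at:
            latest = ev
    return latest


# Made-up sample data: the full history is kept, nothing is overwritten.
events = [
    ChangeEvent("a1", "2012-05-01T08:00:00Z", "created", "Boston"),
    ChangeEvent("a1", "2012-05-09T12:30:00Z", "shipped", "Boston"),
    ChangeEvent("a1", "2012-05-14T17:45:00Z", "delivered", "Denver"),
    ChangeEvent("b2", "2012-05-03T10:15:00Z", "created", "Austin"),
]

snapshot = state_as_of(events, "a1", "2012-05-10T00:00:00Z")
print(snapshot)
```

Because the history is never overwritten, the same records can answer questions nobody planned for when they were written, such as how long items stayed in each status.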