Subject: Advice on Migrating to hadoop + hive
From: Matthieu Labour
To: user@hadoop.apache.org
Date: Wed, 26 Sep 2012 21:04:07 -0400

Hi

I have posted in this user group before and received great help. Thank you! I am hoping to also get some advice on the following Hive/Hadoop question.

The way we currently process our log files is the following: we collect log files, and a program run via cron job processes/consolidates them and inserts rows into a PostgreSQL database. Analysts connect to the database, perform SQL queries, and generate Excel reports. Our logs are growing, and the process of getting the data into the database is getting too slow.

We are thinking of leveraging Hadoop, and my questions are the following:

Should we use Hadoop to insert into PostgreSQL, or can we get rid of PostgreSQL and rely on Hive only?

If we use Hive, can we persist the Hive table so we only load the data (run the Hadoop job) one time?

Can we insert into an existing Hive table and add a day of data without needing to reprocess all previous days' files?
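For context, here is a rough sketch of the kind of setup we are imagining (the table, columns, and S3 paths are just placeholders, and we are not sure this is the right approach): an external table partitioned by day, where the nightly cron job would only register the new day's files.

    -- External table over the raw log files, partitioned by day,
    -- so each day's data can be added without touching earlier days.
    CREATE EXTERNAL TABLE IF NOT EXISTS logs (
      ts      STRING,
      user_id STRING,
      action  STRING
    )
    PARTITIONED BY (dt STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION 's3://our-bucket/logs/';

    -- Run once per day after the new files land, to add that day only.
    ALTER TABLE logs ADD IF NOT EXISTS PARTITION (dt='2012-09-26')
    LOCATION 's3://our-bucket/logs/dt=2012-09-26/';

Is something along these lines the usual way to handle incremental daily loads, or is there a better pattern?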
Are there Hive visual tools (similar to Postgres Maestro) that would make it easier for analysts to build/run queries? (Ideally they would need to work with Amazon EWS.)

Thank you for your help

Cheers
Matthieu