hadoop-common-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Advice on Migrating to hadoop + hive
Date Thu, 27 Sep 2012 01:27:10 GMT
You can get rid of Postgres and go with Hive. 
You may want to consider setting up an external table so you can just drop your logs into place.

(Define the table once in Hive's metastore, then just drop data into the corresponding space / partitions.)
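As a rough sketch, a date-partitioned external table might be defined like this. The table name, columns, delimiter, and location below are placeholders for illustration, not the poster's actual setup:

```sql
-- Hypothetical external table over raw log files.
-- Schema, field delimiter, and path are assumptions.
CREATE EXTERNAL TABLE IF NOT EXISTS raw_logs (
  ts      STRING,
  user_id STRING,
  action  STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/logs';
```

Because the table is EXTERNAL, dropping it in Hive removes only the metadata; the files under the location are left untouched.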

As for visual query tools: Karmasphere and others.

Sorry for the terse post. This should point you in the right direction. Also check out the
new Hive Book which should hit the streets in the next couple of weeks. 
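On the question of adding a day of data without reprocessing earlier days: assuming a date-partitioned external table (here called raw_logs, a hypothetical name) whose files for the new day have already been copied under a matching partition directory, registering that partition is enough. Roughly:

```sql
-- Assumes the day's files were already copied to
-- /data/logs/dt=2012-09-26 (path and layout are assumptions).
-- Adding the partition makes it queryable; no earlier
-- partitions are touched or reprocessed.
ALTER TABLE raw_logs ADD IF NOT EXISTS PARTITION (dt = '2012-09-26')
LOCATION '/data/logs/dt=2012-09-26';

-- Analysts can then query just that slice:
SELECT action, COUNT(*)
FROM raw_logs
WHERE dt = '2012-09-26'
GROUP BY action;
```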

On Sep 26, 2012, at 8:04 PM, Matthieu Labour <matthieu@actionx.com> wrote:

> Hi
> I have posted in this user group before and received great help. Thank you! I am hoping to also get some advice on the following Hive/Hadoop question:
> The way we currently process our log files is the following: we collect log files, run a program via cron job that processes/consolidates them, and insert rows into a PostgreSQL database. Analysts connect to the database, perform SQL queries, and generate Excel reports. Our logs are growing, and the process of getting the data into the database is getting too slow.
> We are thinking of leveraging Hadoop, and my questions are the following:
> Should we use Hadoop to insert into PostgreSQL, or can we get rid of PostgreSQL and rely on Hive only?
> If we use Hive, can we persist the Hive table so we only load the data (run the Hadoop job) one time?
> Can we insert into an existing Hive table and add a day of data without the need to reprocess all previous days' files?
> Are there Hive visual tools (similar to Postgres Maestro) that would make it easier for the analysts to build/run queries? (Ideally they would need to work with Amazon EWS.)
> Thank you for your help
> Cheers
> Matthieu
