hadoop-user mailing list archives

From Bejoy Ks <bejoy.had...@gmail.com>
Subject Re: Advice on Migrating to hadoop + hive
Date Thu, 27 Sep 2012 07:19:52 GMT
Hi Matthieu

Adding on to Michael's comments.

Hive is good for batch processing and generating reports over larger data
sets. It is not meant for point queries (low-latency lookups of individual
records); if your workload has many of those, Hive is not the right choice.

You can process your daily data in Hadoop and load it into Hive tables.
Hive has a new feature, 'INSERT INTO', for appending data to existing
tables/partitions. In your case you can create a table partitioned by date
and load each day's processed data into the corresponding date partition.
Partitioning gives you an advantage: if you issue a query that filters on
particular date(s), only those partitions are scanned rather than the whole
table.
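A minimal sketch of the daily-partition pattern, assuming a hypothetical table `daily_events` partitioned by a string column `dt` and a hypothetical staging table `staging_events` holding one day's processed output (all names are illustrative, not from any real setup). It only builds the HiveQL strings (CREATE TABLE ... PARTITIONED BY and INSERT INTO TABLE ... PARTITION) and adds a toy in-memory model of partition pruning to show that a date-filtered query reads only the matching partitions.

```python
from datetime import date

# Hypothetical names used for illustration only.
TABLE = "daily_events"       # date-partitioned Hive table
STAGING = "staging_events"   # where the daily Hadoop job's output is exposed
COLUMNS = ["user_id STRING", "action STRING"]


def create_table_ddl() -> str:
    """DDL for a table partitioned by a date string column `dt`.

    The partition column is declared only in PARTITIONED BY, not in the
    regular column list."""
    return (f"CREATE TABLE IF NOT EXISTS {TABLE} ({', '.join(COLUMNS)}) "
            f"PARTITIONED BY (dt STRING)")


def daily_insert(day: date) -> str:
    """Append one day's processed rows into that day's partition."""
    return (f"INSERT INTO TABLE {TABLE} PARTITION (dt='{day.isoformat()}') "
            f"SELECT user_id, action FROM {STAGING}")


def scan(partitions: dict, wanted_days: list) -> tuple:
    """Toy model of partition pruning.

    `partitions` maps a dt value to that partition's rows. Only partitions
    whose dt matches the query's date filter are read; the rest are skipped."""
    wanted = {d.isoformat() for d in wanted_days}
    scanned = [dt for dt in sorted(partitions) if dt in wanted]
    rows = [row for dt in scanned for row in partitions[dt]]
    return scanned, rows
```

For example, `daily_insert(date(2012, 9, 26))` yields the statement for the 2012-09-26 partition; you would run it once per day (e.g. via `hive -e`) after the Hadoop job finishes. Re-running INSERT INTO for the same day appends duplicate rows, so a rerun of a failed day may be better served by INSERT OVERWRITE on that partition.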

Tableau, MicroStrategy, Pentaho etc. support reporting on top of Hive
tables.
If you are looking at static, predefined reports, you can do the
aggregation in Hive, export the final aggregated results to any RDBMS using
Sqoop, and connect the reporting tool of your choice to that database.
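A rough sketch of the aggregate-then-export flow, assuming a hypothetical Hive summary table `daily_summary` stored as text under Hive's default warehouse directory `/user/hive/warehouse`, and a hypothetical MySQL database reachable over JDBC. It builds the HiveQL aggregation and the argument list for a Sqoop 1 `sqoop export` run; the table names, JDBC URL and user are placeholders.

```python
import shlex

# Hypothetical names: adjust to your own tables and database.
SUMMARY_TABLE = "daily_summary"
WAREHOUSE_DIR = "/user/hive/warehouse"  # Hive's default warehouse location


def aggregation_query(start_dt: str, end_dt: str) -> str:
    """HiveQL that rolls the date-partitioned events up into a small summary.

    Filtering on the partition column dt lets Hive prune to the matching
    partitions instead of scanning the whole table."""
    return (
        f"INSERT OVERWRITE TABLE {SUMMARY_TABLE} "
        f"SELECT dt, action, COUNT(*) AS events FROM daily_events "
        f"WHERE dt BETWEEN '{start_dt}' AND '{end_dt}' "
        f"GROUP BY dt, action"
    )


def sqoop_export_argv(jdbc_url: str, user: str, rdbms_table: str) -> list:
    """Build a Sqoop 1 'export' command that pushes the summary files to an RDBMS."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", user,
        "-P",  # prompt for the password instead of putting it on the command line
        "--table", rdbms_table,
        "--export-dir", f"{WAREHOUSE_DIR}/{SUMMARY_TABLE}",
        # Hive's default text-format field delimiter is Ctrl-A, written \001 for Sqoop.
        "--input-fields-terminated-by", "\\001",
    ]


if __name__ == "__main__":
    print(aggregation_query("2012-09-01", "2012-09-26"))
    print(shlex.join(sqoop_export_argv("jdbc:mysql://dbhost/reports",
                                       "report_user", "daily_summary")))
```

Sqoop export expects the target RDBMS table to exist already with matching columns; once it is loaded, Tableau, Pentaho or any other tool can report from the small summary table rather than querying Hive directly.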

Some URLs for reference:
https://cwiki.apache.org/Hive/languagemanual-ddl.html#LanguageManualDDL-Partitionedtables
https://cwiki.apache.org/Hive/languagemanual-dml.html#LanguageManualDML-Loadingfilesintotables
https://cwiki.apache.org/Hive/tutorial.html#Tutorial-PartitionBasedQuery


Regards
Bejoy KS
