hadoop-general mailing list archives

From Deepak Halale <deepak.hal...@gmail.com>
Subject Hadoop Map/Reduce and Hive clarification
Date Sat, 12 Sep 2009 23:27:59 GMT
I am new to Hadoop and need some clarifications.
a) How can I automate executing Map/Reduce jobs and loading data into Hive?
Do I need to create a cron job, or is there a better way?

b) I have 2 tables as the source for M/R jobs: OrderMaster and OrderDetail.
OrderMaster has the order header columns
(OrderId, CustId, PaymentMethod, DeliveryMethod, etc.).
OrderDetail has each order's item-level information (viz.
OrderId,ItemId,Quantity,SalesPrice,CostPrice,DeliveryAddress, Delivery
The relation between Master and Detail is 1 to many and OrderId is the key.

If I generate a tab-delimited file from each table, how will Reduce
aggregate the data from OrderDetail, for example if I have to sum the
OrderRevenue by order?
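
To make question (b) concrete, here is a minimal sketch in plain Python of how a map step, a group-by-key shuffle, and a reduce step could sum revenue per OrderId. It only imitates the Map/Reduce flow in memory and is not Hadoop code. It assumes the OrderDetail file is tab-delimited with columns in the order listed above (OrderId first, Quantity third, SalesPrice fourth) and that revenue for a line is Quantity * SalesPrice; that formula, the function names, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def map_order_detail(line):
    """Map step: turn one tab-delimited OrderDetail row into (OrderId, line revenue)."""
    fields = line.rstrip("\n").split("\t")
    order_id = fields[0]            # assumed column position: OrderId
    quantity = int(fields[2])       # assumed column position: Quantity
    sales_price = float(fields[3])  # assumed column position: SalesPrice
    # Assumption: revenue for an item line is Quantity * SalesPrice.
    return order_id, quantity * sales_price


def shuffle(pairs):
    """Shuffle step: group all mapped values under their key (OrderId)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_order_revenue(order_id, revenues):
    """Reduce step: receives every value for one OrderId and sums them."""
    return order_id, sum(revenues)


def order_revenue(lines):
    """Run map, shuffle, and reduce over the rows and return {OrderId: revenue}."""
    pairs = [map_order_detail(line) for line in lines if line.strip()]
    grouped = shuffle(pairs)
    return dict(reduce_order_revenue(k, v) for k, v in grouped.items())


# Hypothetical sample rows: OrderId, ItemId, Quantity, SalesPrice, CostPrice, DeliveryAddress
SAMPLE_ROWS = [
    "1001\tA1\t2\t10.50\t7.00\tAddress 1",
    "1002\tC3\t3\t4.00\t2.50\tAddress 2",
    "1001\tB2\t1\t5.25\t3.00\tAddress 1",
]
```

Calling order_revenue(SAMPLE_ROWS) returns one total per OrderId. The point of the sketch is that the reduce step never sees individual files: it is handed all values that share a key, so the 1-to-many detail rows for an order arrive together regardless of where they appeared in the input.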
