hadoop-hdfs-user mailing list archives

From Ayon Sinha <ayonsi...@yahoo.com>
Subject Re: hadoop cluster for querying data on mongodb
Date Wed, 21 Dec 2011 05:12:47 GMT
A couple of things:
1. Hadoop's strength is data locality, so you want most of your Hadoop heavy lifting to happen
on the local filesystem (HDFS, where Hadoop ships the computation to the nodes that hold the data).
2. Assuming you pull data from Mongo into Hadoop to crunch it, and put the resulting data
back into Mongo, only as the first and last steps of your entire workflow, you are basically
looking for a MongoInputFormat and a MongoOutputFormat (I made up the class names). You are
probably looking for https://jira.mongodb.org/browse/HADOOP/component/10736
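
To make the shape of point 2 concrete, here is a minimal sketch in plain Java. It deliberately uses no Hadoop or MongoDB driver calls: RecordSource and RecordSink are hypothetical stand-ins for the roles an input format and an output format would play, and the record counting is only a placeholder for the real crunching. The point it illustrates is that the external store is touched once at the start and once at the end, while everything in between works on a local copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MongoWorkflowSketch {

    // Hypothetical stand-in for the input side (the role a MongoInputFormat would play).
    interface RecordSource {
        List<String> readAll();
    }

    // Hypothetical stand-in for the output side (the role a MongoOutputFormat would play).
    interface RecordSink {
        void writeAll(Map<String, Integer> results);
    }

    // First step: read from the external store once.
    // Middle steps: work only on the local copy (on a real cluster, data held in HDFS).
    // Last step: write the final result back once.
    static Map<String, Integer> run(RecordSource source, RecordSink sink) {
        List<String> local = new ArrayList<>(source.readAll());

        // Placeholder computation: count how often each record occurs.
        Map<String, Integer> counts = new TreeMap<>();
        for (String record : local) {
            counts.merge(record, 1, Integer::sum);
        }

        sink.writeAll(counts);
        return counts;
    }

    public static void main(String[] args) {
        // An in-memory list and map play the part of the Mongo collections (assumption).
        List<String> fakeInputCollection = Arrays.asList("a", "b", "a");
        Map<String, Integer> fakeOutputCollection = new TreeMap<>();

        run(() -> fakeInputCollection, fakeOutputCollection::putAll);
        System.out.println(fakeOutputCollection); // prints {a=2, b=1}
    }
}
```

In a real job the two interfaces would be replaced by whatever input and output format classes the connector linked above actually provides, and the loop by your map and reduce steps.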

Your other option, if you are using Pig or Hive, is to write loader UDFs, similar to PigStorage or HBaseStorage.

 From: Martinus Martinus <martinus787@gmail.com>
To: hdfs-user@hadoop.apache.org 
Sent: Tuesday, December 20, 2011 7:31 PM
Subject: hadoop cluster for querying data on mongodb


I have a Hadoop cluster running and have my data inside a MongoDB
database. I have already written Java code to query the data in MongoDB using the
mongodb-java driver. Now I want to use the Hadoop cluster to run
my Java code to get data from, and put data into, the Mongo database. Has
anyone done this before? Can you explain to me how to do that?
