crunch-user mailing list archives

From Danny Morgan
Subject MongoDB Input Format
Date Thu, 25 Dec 2014 04:12:34 GMT
Hi Everyone,
I'm working on getting a MongoDB Source working for Crunch as a holiday project. Luckily, there
is already a MongoInputFormat provided by the mongo-hadoop project. I tried to follow the
example of the JDBC input in crunch-contrib, but I couldn't quite get things working. I'm
inheriting from FileTableSourceImpl and creating my own FormatBundle, since MongoInputFormat
extends InputFormat rather than FileInputFormat, which is what most of the Crunch source
classes expect. Anyway, I can't seem to get the darn thing to work. I have a feeling that CrunchInputs
and friends expect the Source to be backed by some kind of file in HDFS, whereas
MongoInputFormat reads over a network connection to a Mongo server, so something isn't
quite working. Any pointers on which interfaces and classes I should base my implementation
on, or which methods I should override, would be much appreciated.