hadoop-common-user mailing list archives

From Eugeny N Dzhurinsky <b...@redwerk.com>
Subject custom implementation of InputFormat/RecordReader/InputSplit?
Date Mon, 19 Nov 2007 16:43:40 GMT
Hello, gentlemen!

I would like to implement a custom data provider which will create records to
start map jobs with. For example, I would like to create a thread which
extracts data from some storage (e.g. a relational database) and starts a new
job, which takes a single record and starts map/reduce processing. Each such
record will produce many results, which will later be processed by a reduce
task.

The question is: how do I implement such interfaces? As far as I can tell, I
would need to implement the InputSplit, RecordReader and InputFormat
interfaces. However, after looking at the sources and javadocs, I found that
all operations seem to be file-based, and a file may be split across several
hosts, which isn't my case. I would be dealing with a single stream that I
need to parse before starting a job.
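To illustrate the shape I have in mind: nothing in the three interfaces actually requires a file, so a split can simply describe a range of database rows. The sketch below uses simplified stand-in interfaces rather than the real org.apache.hadoop.mapred ones (so it compiles on its own, without Hadoop on the classpath); RowRangeSplit, RowRangeReader and the "row-N" values are invented for illustration only.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the org.apache.hadoop.mapred interfaces.
// The real ones also extend Writable etc.; this is only a sketch.
interface InputSplit {
    long getLength();
    String[] getLocations(); // empty: a DB query has no host affinity
}

interface RecordReader<K, V> {
    boolean next(K key, V value) throws IOException;
    void close() throws IOException;
}

// A split that is NOT file-based: it just names a range of rows.
class RowRangeSplit implements InputSplit {
    final long firstRow, rowCount;
    RowRangeSplit(long firstRow, long rowCount) {
        this.firstRow = firstRow;
        this.rowCount = rowCount;
    }
    public long getLength() { return rowCount; }
    public String[] getLocations() { return new String[0]; }
}

// Emits one record per row of the range; a real reader would hold an
// open JDBC ResultSet instead of synthesizing placeholder values.
class RowRangeReader implements RecordReader<long[], StringBuilder> {
    private final RowRangeSplit split;
    private long pos = 0;
    RowRangeReader(RowRangeSplit split) { this.split = split; }
    public boolean next(long[] key, StringBuilder value) {
        if (pos >= split.rowCount) return false;
        key[0] = split.firstRow + pos;
        value.setLength(0);
        value.append("row-").append(key[0]); // stand-in for column data
        pos++;
        return true;
    }
    public void close() { /* close the JDBC connection here */ }
}

// The InputFormat ties the two together: getSplits() partitions the
// row space, getRecordReader() opens a reader over one partition.
class RowRangeInputFormat {
    List<InputSplit> getSplits(long totalRows, int numSplits) {
        List<InputSplit> splits = new ArrayList<>();
        long chunk = (totalRows + numSplits - 1) / numSplits;
        for (long start = 0; start < totalRows; start += chunk) {
            splits.add(new RowRangeSplit(start,
                    Math.min(chunk, totalRows - start)));
        }
        return splits;
    }
    RecordReader<long[], StringBuilder> getRecordReader(InputSplit s) {
        return new RowRangeReader((RowRangeSplit) s);
    }
}
```

So the "split" is just a serializable description of which rows a task should read, and each map task opens its own connection to the storage. Whether that matches what the real API expects is exactly what I am unsure about.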

Thank you in advance!

Eugene N Dzhurinsky
