avro-user mailing list archives

From Deepak Nettem <deepaknet...@gmail.com>
Subject Mapper Only Avro, Read from Local File System
Date Thu, 15 Mar 2012 16:47:00 GMT
Hi,

I have a use case in which I need to write a mapper-only job that reads a
file from local disk and writes it to HDFS in Avro serialized format. (I want
to do this because I want the Mapper instances to download data from
somewhere onto the local FS and then load that data into HDFS.)

Issues:
1. The job won't have any HDFS InputPath or OutputPath.
2. I want to be able to set the number of Mappers depending on my internet
bandwidth, so the number of mappers shouldn't be calculated from the
input splits.

Any suggestions on how to do this? I would really appreciate any example
code.

Deepak
