hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Reading part of file using Map Reduce
Date Thu, 01 Nov 2012 03:12:53 GMT
IIRC you can do this, but MR had some issues if you passed it a
non-closed (but sync'd upon) file for splitting.

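However, if you run into similar issues, try generating your own
splits over the big file by overriding FileInputFormat#getSplits(…)
in your own InputFormat, so that the splits cover only the part of the
file you want processed. That should then work.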
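
A minimal sketch of the split arithmetic, in plain Java with no Hadoop
dependency so that it runs on its own. PrefixSplits, compute, limit and
splitSize are hypothetical names for illustration, not Hadoop APIs. The
assumption is that you already know a byte offset (limit) up to which the
file has been safely written, and want splits that stop there. Unlike
Hadoop's own split logic, this sketch does not merge a small trailing
range into the previous one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: computes the byte ranges that
// custom splits would cover when only the first `limit` bytes are processed.
final class PrefixSplits {

    // Returns {start, length} pairs covering [0, limit) in chunks of at
    // most splitSize bytes each.
    static List<long[]> compute(long limit, long splitSize) {
        if (limit < 0 || splitSize <= 0) {
            throw new IllegalArgumentException("limit must be >= 0 and splitSize > 0");
        }
        List<long[]> ranges = new ArrayList<>();
        for (long start = 0; start < limit; start += splitSize) {
            // The last range is shortened so that it never passes the limit.
            long length = Math.min(splitSize, limit - start);
            ranges.add(new long[] {start, length});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Example: process the first 250 bytes in ranges of up to 100 bytes.
        for (long[] r : compute(250L, 100L)) {
            System.out.println("start=" + r[0] + " length=" + r[1]);
        }
    }
}
```

In a real job, each pair would presumably be turned into a
FileSplit(path, start, length, hosts) and returned from the overridden
getSplits. Keep in mind that line-oriented record readers may read a
little past the end of the last split to finish the final record.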

On Thu, Nov 1, 2012 at 4:50 AM, Pankaj Gupta <pankaj@brightroll.com> wrote:
> Hi,
> Is it possible to run a MapReduce job on a part of file on HDFS? The use case is using
> a single file on HDFS as a stream to store all log events of a particular kind. New data can
> grow on top while Map Reduce can process old data. Of course one option would be to copy part
> of data into a separate file and give that to MapReduce but I was wondering if that extra
> copy can be avoided.
> Thanks,
> Pankaj

Harsh J
