hadoop-common-user mailing list archives

From Amandeep Khurana <ama...@gmail.com>
Subject Re: Reading a subset of records from hdfs
Date Thu, 10 Sep 2009 07:49:13 GMT
Why not just use a higher number of mappers? Why split the work into multiple
jobs? Is there a particular case where you think this would be useful?
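
One way to get the "higher number of mappers" behavior while still capping how many records each task handles is Hadoop's NLineInputFormat, which makes each input split exactly N lines. A minimal driver sketch follows; it assumes the new-API (org.apache.hadoop.mapreduce) version of NLineInputFormat is available in your Hadoop release, and the class name SubsetJob and the mapper/reducer wiring are placeholders, not code from this thread.

```java
// Sketch (not from the thread): give each map task a fixed number of
// input lines via NLineInputFormat, instead of chaining many jobs.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubsetJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "process-100-lines-per-mapper");
        job.setJarByClass(SubsetJob.class);

        // Each split (and therefore each map task) gets 100 lines, so a
        // crashed or failed task only re-processes its own 100 records.
        job.setInputFormatClass(NLineInputFormat.class);
        NLineInputFormat.setNumLinesPerSplit(job, 100);

        NLineInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // job.setMapperClass(...) / job.setReducerClass(...) as usual.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With this setup a single job covers the whole file, and the MapReduce framework's per-task retry gives the failure isolation the chained-jobs scheme was aiming for.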

On 9/9/09, Rakhi Khatwani <rkhatwani@gmail.com> wrote:
> Hi,
>        Suppose I have an HDFS file with 10,000 entries, and I want my job to
> process 100 records at a time (to minimize loss of data during job
> crashes, network errors, etc.). If a job can read a subset of records from
> a file in HDFS, I can combine this with chaining to achieve my objective. For
> example, job 1 reads lines 1-100 of the input from HDFS, job 2
> reads lines 101-200, and so on.
> Is there a way to configure a job to read only a subset of
> records from a file in HDFS?
> Regards,
> Raakhi


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
