hadoop-common-user mailing list archives

From Rakhi Khatwani <rkhatw...@gmail.com>
Subject Reading a subset of records from hdfs
Date Thu, 10 Sep 2009 06:16:12 GMT
Suppose I have an HDFS file with 10,000 entries, and I want my job to
process 100 records at a time (to minimize loss of data during job
crashes, network errors, etc.). If a job can read a subset of records from
a file in HDFS, I can combine that with job chaining to achieve my
objective. For example, job 1 would read lines 1-100 of the input from HDFS,
job 2 would read lines 101-200, and so on.

Is there a way to configure a job to read only a subset of records from a
file in HDFS?
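
To make the batching scheme concrete, here is a minimal sketch in plain Java of what each chained job would need to do: compute the 1-based line ranges (1-100, 101-200, ...) and read only the lines in one range. It deliberately uses only java.io and does not touch any Hadoop API; the class name LineRangeSketch and the helpers batches and readLines are hypothetical names made up for illustration. In a real job the Reader would presumably wrap a stream opened from HDFS, which is not shown here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not part of Hadoop.
public class LineRangeSketch {

    // Split totalRecords into consecutive 1-based, inclusive ranges
    // of at most batchSize lines each: {1,100}, {101,200}, ...
    static List<int[]> batches(int totalRecords, int batchSize) {
        if (totalRecords < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("totalRecords >= 0 and batchSize > 0 required");
        }
        List<int[]> ranges = new ArrayList<>();
        for (int first = 1; first <= totalRecords; first += batchSize) {
            int last = Math.min(first + batchSize - 1, totalRecords);
            ranges.add(new int[] {first, last});
        }
        return ranges;
    }

    // Return lines first..last (1-based, inclusive) from the source.
    // Lines before 'first' are still read and discarded, and reading
    // stops as soon as 'last' has been consumed.
    static List<String> readLines(Reader source, int first, int last) {
        List<String> selected = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            int lineNumber = 0;
            while (lineNumber < last && (line = reader.readLine()) != null) {
                lineNumber++;
                if (lineNumber >= first) {
                    selected.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return selected;
    }

    public static void main(String[] args) {
        // Build a small in-memory stand-in for the input file.
        StringBuilder input = new StringBuilder();
        for (int i = 1; i <= 250; i++) {
            input.append("record-").append(i).append('\n');
        }
        // Each range would correspond to one job in the chain.
        for (int[] range : batches(250, 100)) {
            List<String> lines = readLines(new StringReader(input.toString()), range[0], range[1]);
            System.out.println(range[0] + "-" + range[1] + ": " + lines.size() + " lines");
        }
    }
}
```

Note that this sketch scans from the start of the file for every range, so later batches re-read the earlier lines; it only shows the line-range bookkeeping, not an efficient way to seek within an HDFS file.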
