hadoop-common-user mailing list archives

From Gang Luo <lgpub...@yahoo.com.cn>
Subject centralized record reader in new API
Date Thu, 05 Aug 2010 16:36:36 GMT
Hi all,
To create a RecordReader in the new API, we need a TaskAttemptContext object, which
suggests to me that a RecordReader should only be created on a split that has
already been assigned a task ID. However, I want to do centralized sampling and
create record readers on some splits before the job is submitted. What I am doing
is creating a dummy TaskAttemptContext and using it to create the record reader,
but I am not sure whether there are any side effects. Is there a better way to do
this? Why are we not supposed to create a record reader centrally, as indicated by the new



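For readers following along, here is a minimal sketch of the dummy-context workaround described in the message. It is not the poster's actual code. It assumes Hadoop 2.x or later, where TaskAttemptContext is an interface and TaskAttemptContextImpl is the concrete class (in the 0.20-era new API, TaskAttemptContext itself was a concrete class), and it assumes plain-text input read through TextInputFormat. The names ClientSideSampler, sample, and perSplit are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl;

// Hypothetical helper: reads a few records per split on the client,
// before any job is submitted.
public class ClientSideSampler {

    public static List<String> sample(Configuration conf, Path input, int perSplit)
            throws IOException, InterruptedException {
        // The Job is used only as a JobContext for computing splits; it is never submitted.
        Job job = Job.getInstance(conf);
        FileInputFormat.addInputPath(job, input);

        TextInputFormat format = new TextInputFormat();
        List<InputSplit> splits = format.getSplits(job);

        // Dummy context: a default TaskAttemptID that belongs to no real task attempt.
        TaskAttemptContext dummy =
                new TaskAttemptContextImpl(job.getConfiguration(), new TaskAttemptID());

        List<String> samples = new ArrayList<>();
        for (InputSplit split : splits) {
            RecordReader<LongWritable, Text> reader = format.createRecordReader(split, dummy);
            try {
                reader.initialize(split, dummy);
                // Take at most perSplit records from the start of each split.
                for (int i = 0; i < perSplit && reader.nextKeyValue(); i++) {
                    // Text objects are reused by the reader, so copy the value out.
                    samples.add(reader.getCurrentValue().toString());
                }
            } finally {
                reader.close();
            }
        }
        return samples;
    }
}
```

In this sketch the record reader only consults the context for its Configuration. A reader that relies on a real task attempt ID, counters, or status reporting could behave differently with a dummy context, which is the side-effect concern raised in the message.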