hadoop-user mailing list archives

From "小强" <790772...@qq.com>
Subject Question about Skip Bad Records
Date Sat, 15 Jun 2013 06:39:15 GMT
Hi, I found that SkippingRecordReader is no longer supported in the new API, and I am curious
about the reason. Can anyone tell me why?

Besides, when I looked into the old API to figure out what skip mode does, I got a little
confused about the logic there.
As I understand it, if the Java API is used, we can always locate precisely which record is
the bad record.
If streaming is used, as long as the user handles the counter correctly (I mean incrementing
the counter for each record read in), we can also locate the exact bad record. (I wonder if I
am missing something here.)
But if the user doesn't maintain the counter, locating bad records is always a disaster for
the framework (even using binary search).
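
As a minimal sketch of the streaming case described above (not taken from the Hadoop source), a
Python streaming mapper could increment the processed-records counter once per input record. The
counter group and name are the constants from org.apache.hadoop.mapred.SkipBadRecords in the old
API, and the stderr line format is the one Hadoop Streaming uses for counter updates. The
process() function and its output format are hypothetical and only stand in for the user logic.

```python
import sys

# Counter names as defined in org.apache.hadoop.mapred.SkipBadRecords (old API).
COUNTER_GROUP = "SkippingTaskCounters"
MAP_PROCESSED = "MapProcessedRecords"


def process(line):
    """Hypothetical per-record user logic: emit (first field, 1)."""
    key = line.rstrip("\n").split("\t", 1)[0]
    return f"{key}\t1"


def run_mapper(records, out=sys.stdout, err=sys.stderr):
    """Process records one at a time, bumping the counter after each one."""
    for line in records:
        out.write(process(line) + "\n")
        # Streaming parses stderr lines of this form as counter updates.
        err.write(f"reporter:counter:{COUNTER_GROUP},{MAP_PROCESSED},1\n")
        err.flush()
```

In a real streaming job the script would call run_mapper(sys.stdin). Because the script increments
the counter itself, this sketch assumes the automatic increment is turned off (for example with
SkipBadRecords.setAutoIncrMapperProcCount(conf, false)), so that each record is counted only once.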

To sum up:
Ques 1:  Why was skip mode removed in the new API?
Ques 2:  If the user handles the counter correctly in streaming, can we locate the exact bad
record?
Ques 3:  In skip mode, why not locate more bad records by restarting the user logic, instead
of locating only one bad record per task attempt?

Thank you!

Dasheng Jiang