hadoop-common-user mailing list archives

From edward choi <mp2...@gmail.com>
Subject Re: how to figure out the range of a split that failed?
Date Thu, 01 Jul 2010 05:15:04 GMT
Dear Sharad,

I have come across another problem. I hope you can help me with this too.
I am trying to use the SkipBadRecords feature with Hadoop Streaming.
The streaming method I use is: "hadoop jar
But your example uses a Java application, which I cannot use because I am
trying to connect a C++ application through Hadoop Streaming.

So what I am doing is:
hadoop jar $HADOOP_INSTALL/contrib/streaming/hadoop-*-streaming.jar \
  -D mapred.skip.mode.enabled=true \
  -D mapred.skip.attempts.to.start.skipping=2 \
  -D mapred.skip.map.max.skip.records=Long.MAX_VALUE \
  -D mapred.reduce.tasks=0 \
  -file "..." -mapper "..." -input "..." -output "..."

Then I noticed that you have to set
"mapred.skip.map.auto.incr.proc.count=false" and increment
COUNTER_MAP_PROCESSED_RECORDS in your own application. I guess you can do
this in your Java example, but I don't know how to do it from a C++
application run through Hadoop Streaming. Could you enlighten me, please?

Sincerely, Ed

2010/6/30 Sharad Agarwal <sharadag@yahoo-inc.com>

> edward choi wrote:
>> Thanks for the quick response.
>> I know the SkipBadRecords feature but unfortunately I cannot use it since
>> I am running my job on Hadoop Streaming.
>> I had asked if there were any way to use SkipBadRecords in Hadoop
>> Streaming but never got an answer. I guess it is not possible at all.
>> Thanks for your concern.
> The SkipBadRecords feature can be used for streaming as well. Perhaps the
> best example is the test case:
> http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/streaming/src/test/org/apache/hadoop/streaming/TestStreamingBadRecords.java?view=markup
> Sharad
