mahout-user mailing list archives

From Yang <teddyyyy...@gmail.com>
Subject Re: SSVD Q-Job taking very long even after 100% ?
Date Thu, 09 Oct 2014 18:40:48 GMT
it's possible that the Q-job is compressing its output. I'm now rebuilding the
code after commenting out the setOutputCompress(true) call.

I will also run with the compression parameter set to false.


but it's still surprising that compression would take so long
(8 to 10 minutes).
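For scale: with the dimensions in the quoted message (37000 rows, fewer than 10 non-zeros per row), there are at most about 370000 non-zero entries. Assuming, purely for illustration, 4 bytes per column index plus 8 bytes per double value (this is not Mahout's actual serialization format), that is roughly 4.4 MB of raw data. The hypothetical Java sketch below (the class DeflateCostSketch is made up for this note) builds random data of that size and times one pass of java.util.zip.Deflater over it, which gives a rough idea of how long DEFLATE alone should take on input this small. Random doubles compress poorly, so the compressed size says nothing about real data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Random;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflateCostSketch {
    // Dimensions from the message; NNZ_PER_ROW uses the stated upper bound of 10.
    static final int ROWS = 37000;
    static final int COLS = 8000;
    static final int NNZ_PER_ROW = 10;
    // ASSUMED encoding: 4-byte int column index + 8-byte double value per non-zero.
    // This is NOT Mahout's VectorWritable format, only a size estimate.
    static final int BYTES_PER_ENTRY = Integer.BYTES + Double.BYTES;

    /** Builds ROWS * NNZ_PER_ROW random (index, value) pairs in the assumed encoding. */
    static byte[] syntheticSparseRows(long seed) {
        Random rnd = new Random(seed);
        ByteBuffer buf = ByteBuffer.allocate(ROWS * NNZ_PER_ROW * BYTES_PER_ENTRY);
        for (int r = 0; r < ROWS; r++) {
            for (int k = 0; k < NNZ_PER_ROW; k++) {
                buf.putInt(rnd.nextInt(COLS));   // random column index in [0, COLS)
                buf.putDouble(rnd.nextDouble()); // random value
            }
        }
        return buf.array();
    }

    /** Compresses data with the JDK's DEFLATE (zlib) implementation at the default level. */
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[64 * 1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Decompresses a zlib stream whose original length is known (used to verify the round trip). */
    static byte[] inflate(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] out = new byte[originalLength];
        int off = 0;
        try {
            while (off < originalLength && !inflater.finished()) {
                int n = inflater.inflate(out, off, originalLength - off);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported stream");
                }
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt DEFLATE stream", e);
        } finally {
            inflater.end();
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = syntheticSparseRows(42L);
        long start = System.nanoTime();
        byte[] packed = deflate(raw);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("raw=%d bytes, deflated=%d bytes, took %d ms%n",
                raw.length, packed.length, elapsedMs);
    }
}
```

The timing depends on the machine, so no figure is given here; the point of comparison is minutes versus whatever this prints.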

On Thu, Oct 9, 2014 at 11:06 AM, Yang <teddyyyy123@gmail.com> wrote:

> my Q-Job MR job shows 100% map completion very quickly (it's a map-only
> job), but the job itself does not finish until about 10 minutes later,
> which is rather surprising. my input is a sparse matrix of 37000 rows and
> 8000 columns, with each row usually having fewer than 10 non-zero elements,
> so the input is fairly small.
>
>
> I looked at the Q-job code and it seems rather normal, i.e. it's not doing
> anything special after the map() function completes. so why does it lag
> so long after reaching 100%?
>
>
> here is the syslog from Hadoop:
>
>
>
> 2014-10-09 10:37:40,504 INFO [main] org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
> 2014-10-09 10:37:40,538 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
> 2014-10-09 10:37:40,548 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
> 2014-10-09 10:37:40,548 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
> 2014-10-09 10:37:40,549 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
> 2014-10-09 10:39:39,143 WARN [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Error reading the stream java.io.IOException: No such process
> 2014-10-09 10:40:09,117 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
> 2014-10-09 10:46:23,991 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
> 2014-10-09 10:46:23,992 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
> 2014-10-09 10:46:23,992 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
> 2014-10-09 10:46:23,992 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
> 2014-10-09 10:46:31,219 INFO [LeaseRenewer:yyang15@apollo-phx-nn.vip.ebay.com:8020] org.apache.hadoop.ipc.Client: Retrying connect to server: apollo-phx-nn.vip.ebay.com/10.115.201.75:8020. Already tried 0 time(s); maxRetries=45
> 2014-10-09 10:47:45,241 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
> 2014-10-09 10:47:46,571 INFO [main] org.apache.hadoop.mapred.Task: Task:attempt_1412781120464_7857_m_000000_0 is done. And is in the process of committing
> 2014-10-09 10:47:46,739 INFO [main] org.apache.hadoop.mapred.Task: Task attempt_1412781120464_7857_m_000000_0 is allowed to commit now
> 2014-10-09 10:47:47,389 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of task 'attempt_1412781120464_7857_m_000000_0' to hdfs://apollo-phx-nn.vip.ebay.com:8020/user/yyang15/CIReco/shoes/ssvd/tmp/ssvd/Q-job/_temporary/1/task_1412781120464_7857_m_000000
> 2014-10-09 10:47:47,574 INFO [main] org.apache.hadoop.mapred.Task: Task 'attempt_1412781120464_7857_m_000000_0' done.
> 2014-10-09 10:47:47,575 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
> 2014-10-09 10:47:47,576 INFO [ganglia] org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: ganglia thread interrupted.
> 2014-10-09 10:47:47,576 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
> 2014-10-09 10:47:47,576 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.
>
>
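To see where the time actually went, one can look for the largest gap between consecutive timestamps in the task syslog. A minimal Java sketch follows; LogGapFinder is a hypothetical helper, not part of Hadoop or Mahout. It assumes each entry starts with the "yyyy-MM-dd HH:mm:ss,SSS" timestamp shown above, optionally preceded by a "> " quote prefix, and skips any line without one (such as wrapped continuation lines).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;

public class LogGapFinder {
    // Timestamp prefix used by the task syslog, e.g. "2014-10-09 10:37:40,504" (23 chars).
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS");
    private static final int TS_LEN = 23;

    /** Two consecutive timestamped lines and the time elapsed between them. */
    static final class Gap {
        final String before;
        final String after;
        final Duration elapsed;

        Gap(String before, String after, Duration elapsed) {
            this.before = before;
            this.after = after;
            this.elapsed = elapsed;
        }
    }

    /** Largest gap between consecutive timestamped lines, or null if fewer than two were found. */
    static Gap largestGap(List<String> lines) {
        Gap best = null;
        String prevLine = null;
        LocalDateTime prevTs = null;
        for (String raw : lines) {
            // Strip the "> " prefix that mail quoting adds.
            String line = raw.startsWith("> ") ? raw.substring(2) : raw;
            if (line.length() < TS_LEN) continue;
            LocalDateTime ts;
            try {
                ts = LocalDateTime.parse(line.substring(0, TS_LEN), TS);
            } catch (DateTimeParseException e) {
                continue; // continuation or non-log line
            }
            if (prevTs != null) {
                Duration d = Duration.between(prevTs, ts);
                if (best == null || d.compareTo(best.elapsed) > 0) {
                    best = new Gap(prevLine, line, d);
                }
            }
            prevTs = ts;
            prevLine = line;
        }
        return best;
    }

    public static void main(String[] args) throws IOException {
        Gap g = largestGap(Files.readAllLines(Paths.get(args[0])));
        if (g == null) {
            System.out.println("fewer than two timestamped lines");
            return;
        }
        System.out.printf("%d ms between:%n  %s%n  %s%n", g.elapsed.toMillis(), g.before, g.after);
    }
}
```

On the log quoted above, the largest gap is 374874 ms (just over six minutes), between the 10:40:09,117 "Got brand-new compressor [.deflate]" entry and the 10:46:23,991 decompressor entry, while the commit itself (10:47:46,571 to 10:47:47,574) takes about one second.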

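On the "100% but not finished" question: in Hadoop's new MapReduce API, Mapper.run() calls setup(), then map() once per input record, then cleanup() in a finally block, and a map task's progress is reported from the record reader's getProgress(), i.e. roughly the fraction of input consumed. So any work a mapper defers to cleanup(), or to closing its output, happens after progress already reads 100%. Whether the Q-job mapper actually does this is not established in this thread. The toy model below only illustrates the ordering; MapLifecycleModel, ToyReader and BlockBufferingMapper are hypothetical names with no Hadoop dependency.

```java
import java.util.ArrayList;
import java.util.List;

public class MapLifecycleModel {
    /** Toy record reader: progress = fraction of input records consumed (like getProgress()). */
    static final class ToyReader {
        private final List<String> records;
        private int pos = 0;

        ToyReader(List<String> records) {
            this.records = records;
        }

        boolean nextKeyValue() {
            if (pos >= records.size()) return false;
            pos++;
            return true;
        }

        String current() {
            return records.get(pos - 1);
        }

        float progress() {
            return records.isEmpty() ? 1.0f : (float) pos / records.size();
        }
    }

    /** Hypothetical mapper that buffers rows into blocks; the last partial block is only handled in cleanup(). */
    static final class BlockBufferingMapper {
        private final int blockSize;
        private final List<String> buffer = new ArrayList<>();
        final List<String> events = new ArrayList<>();

        BlockBufferingMapper(int blockSize) {
            this.blockSize = blockSize;
        }

        void map(String row) {
            buffer.add(row);
            if (buffer.size() == blockSize) flush();
        }

        void cleanup() {
            if (!buffer.isEmpty()) flush();
        }

        private void flush() {
            // Stand-in for expensive per-block work plus writing the output.
            events.add("emit block of " + buffer.size());
            buffer.clear();
        }
    }

    /** Same shape as Hadoop's Mapper.run(): map() per record, then cleanup() in finally (setup omitted). */
    static List<String> run(ToyReader reader, BlockBufferingMapper mapper) {
        try {
            while (reader.nextKeyValue()) {
                mapper.map(reader.current());
            }
            // Input is exhausted, so the reported progress is already 1.0 here.
            mapper.events.add("progress " + reader.progress());
        } finally {
            mapper.cleanup();
        }
        return mapper.events;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("r1", "r2", "r3", "r4", "r5");
        System.out.println(run(new ToyReader(rows), new BlockBufferingMapper(2)));
    }
}
```

With five rows and a block size of 2, main prints [emit block of 2, emit block of 2, progress 1.0, emit block of 1]: the last block is emitted only after progress has reached 1.0.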