From: Yang
Date: Thu, 9 Oct 2014 11:40:48 -0700
Subject: Re: SSVD Q-Job taking very long even after 100% ?
To: user@mahout.apache.org

It's possible that the Q-job is compressing its output. I'm now rebuilding the code after commenting out the setOutputCompress(true) call, and I will also run with the compression param set to false.

Still, it's quite surprising that compression would take so long (8-10 minutes).

On Thu, Oct 9, 2014 at 11:06 AM, Yang wrote:

> My Q-Job MR job shows the mapper as 100% complete (it's a map-only job) very
> quickly, but the job itself does not finish until about 10 minutes later.
> This is rather surprising. My input is a set of sparse vectors with 37000 rows,
> and the column count is 8000, with each row usually having < 10 elements set to
> non-zero, so the input size is fairly small.
>
> I looked at the Q-job code and it seems rather normal, i.e. it's not doing
> anything special after the map() function is completed. So I wonder why
> it's lagging so long after 100%?
>
> here is the syslog from hadoop:
>
> 2014-10-09 10:37:40,504 INFO [main] org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
> 2014-10-09 10:37:40,538 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
> 2014-10-09 10:37:40,548 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
> 2014-10-09 10:37:40,548 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
> 2014-10-09 10:37:40,549 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
> 2014-10-09 10:39:39,143 WARN [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Error reading the stream java.io.IOException: No such process
> 2014-10-09 10:40:09,117 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
> 2014-10-09 10:46:23,991 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
> 2014-10-09 10:46:23,992 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
> 2014-10-09 10:46:23,992 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
> 2014-10-09 10:46:23,992 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
> 2014-10-09 10:46:31,219 INFO [LeaseRenewer:yyang15@apollo-phx-nn.vip.ebay.com:8020] org.apache.hadoop.ipc.Client: Retrying connect to server: apollo-phx-nn.vip.ebay.com/10.115.201.75:8020. Already tried 0 time(s); maxRetries=45
> 2014-10-09 10:47:45,241 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
> 2014-10-09 10:47:46,571 INFO [main] org.apache.hadoop.mapred.Task: Task:attempt_1412781120464_7857_m_000000_0 is done. And is in the process of committing
> 2014-10-09 10:47:46,739 INFO [main] org.apache.hadoop.mapred.Task: Task attempt_1412781120464_7857_m_000000_0 is allowed to commit now
> 2014-10-09 10:47:47,389 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of task 'attempt_1412781120464_7857_m_000000_0' to hdfs://apollo-phx-nn.vip.ebay.com:8020/user/yyang15/CIReco/shoes/ssvd/tmp/ssvd/Q-job/_temporary/1/task_1412781120464_7857_m_000000
> 2014-10-09 10:47:47,574 INFO [main] org.apache.hadoop.mapred.Task: Task 'attempt_1412781120464_7857_m_000000_0' done.
> 2014-10-09 10:47:47,575 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
> 2014-10-09 10:47:47,576 INFO [ganglia] org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: ganglia thread interrupted.
> 2014-10-09 10:47:47,576 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
> 2014-10-09 10:47:47,576 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.
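
One way to narrow down where the time goes is to measure the gaps between consecutive syslog timestamps. The following is a minimal Python sketch, assuming the "YYYY-MM-DD HH:MM:SS,mmm" timestamp prefix seen in the log above. The helper names `parse_time` and `largest_gaps` are made up for illustration and are not part of Hadoop or Mahout, and the sample uses only a few lines copied from that log.

```python
from datetime import datetime

# A few timestamped lines copied from the syslog quoted above.
LOG = """\
2014-10-09 10:37:40,549 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.gz]
2014-10-09 10:39:39,143 WARN [communication thread] org.apache.hadoop.yarn.util.ProcfsBasedProcessTree: Error reading the stream java.io.IOException: No such process
2014-10-09 10:40:09,117 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
2014-10-09 10:46:23,991 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.deflate]
2014-10-09 10:47:45,241 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
2014-10-09 10:47:46,571 INFO [main] org.apache.hadoop.mapred.Task: Task:attempt_1412781120464_7857_m_000000_0 is done. And is in the process of committing
"""


def parse_time(line):
    """Parse the leading 23-character timestamp of a log line (assumed format)."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")


def largest_gaps(log_text, top=3):
    """Return the `top` largest gaps as (seconds, line_before, line_after)."""
    lines = [ln for ln in log_text.splitlines() if ln.strip()]
    gaps = []
    for before, after in zip(lines, lines[1:]):
        seconds = (parse_time(after) - parse_time(before)).total_seconds()
        gaps.append((seconds, before, after))
    # Sort by gap length only, longest first.
    gaps.sort(key=lambda g: g[0], reverse=True)
    return gaps[:top]


if __name__ == "__main__":
    for seconds, before, after in largest_gaps(LOG):
        print(f"{seconds:9.3f}s between:")
        print(f"    {before[:80]}")
        print(f"    {after[:80]}")
```

On these lines the longest silence is the roughly six-minute stretch between the "Got brand-new compressor [.deflate]" entry at 10:40:09 and the next "decompressor [.deflate]" entry at 10:46:23. That shows when the task was busy, not what it was doing, so it only narrows down where to look.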