Subject: Re: AtA error
From: Pat Ferrel
Date: Mon, 17 Nov 2014 17:24:24 -0800
To: dev@mahout.apache.org
Message-Id: <51DD88CE-9FFD-4AC9-B285-1DB887C7A8EE@occamsmachete.com>

I do use drmWrap so I'll check there, thanks

On Nov 17, 2014, at 5:22 PM, Dmitriy Lyubimov wrote:

On Mon, Nov 17, 2014 at 5:16 PM, Pat Ferrel wrote:

> It's in spark-itemsimilarity. This job reads elements and assigns them to
> one of two RDD-backed drms.
>
> I assumed it was a badly formed drm, but it's a 140MB dataset and a bit
> hard to nail down -- just looking for a clue. I read this to say that an ID
> for an element in a row vector was larger than drm.ncol, correct?

yes. and then it again comes back to the question of how the matrix was
constructed. General construction of dimensions (ncol, nrow) is
automatic-lazy, meaning if you have not specified dimensions anywhere
explicitly, it will lazily compute them for you. But if you did volunteer
them anywhere (such as in a drmWrap() call), they have got to be good. Or
you see things like this.

>
> On Nov 17, 2014, at 4:58 PM, Dmitriy Lyubimov wrote:
>
> So this is not a problem of the A'A computation -- the input is obviously
> invalid.
>
> Question is what you did before you got the A handle -- read it from file?
> parallelized it from an in-core matrix (drmParallelize)? got it as a result
> of some other computation (if yes, then what)? wrapped it around a manually
> crafted RDD (drmWrap)?
>
> I don't understand the question about non-contiguous ids. You are referring
> to some context of your computation, assuming I am in context (but I am
> unfortunately not).
>
> On Mon, Nov 17, 2014 at 4:55 PM, Dmitriy Lyubimov wrote:
>
>> On Mon, Nov 17, 2014 at 3:46 PM, Pat Ferrel wrote:
>>
>>> A matrix with about 4600 rows and somewhere around 27790 columns when
>>> executing the following line from AtA (not sure of the exact dimensions):
>>>
>>> /** The version of A'A that does not use GraphX */
>>> def at_a_nongraph(op: OpAtA[_], srcRdd: DrmRdd[_]): DrmRdd[Int] = {
>>>
>>> a vector is created whose size causes the error. How could I have
>>> constructed a drm that would cause this error? If the column IDs were
>>> non-contiguous, would that yield this error?
>>
>> what did you do specifically to build matrix A?
>>
>>> ==================
>>>
>>> 14/11/12 17:56:03 ERROR executor.Executor: Exception in task 5.0 in stage
>>> 18.0 (TID 66169)
>>> org.apache.mahout.math.IndexException: Index 27792 is outside allowable
>>> range of [0,27789)
>>>     at org.apache.mahout.math.AbstractVector.viewPart(AbstractVector.java:147)
>>>     at org.apache.mahout.math.scalabindings.VectorOps.apply(VectorOps.scala:37)
>>>     at org.apache.mahout.sparkbindings.blas.AtA$$anonfun$5$$anonfun$apply$6.apply(AtA.scala:152)
>>>     at org.apache.mahout.sparkbindings.blas.AtA$$anonfun$5$$anonfun$apply$6.apply(AtA.scala:149)
>>>     at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:376)
>>>     at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:376)
>>>     at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1085)
>>>     at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1077)
>>>     at scala.collection.immutable.StreamIterator$$anonfun$next$1.apply(Stream.scala:980)
>>>     at scala.collection.immutable.StreamIterator$$anonfun$next$1.apply(Stream.scala:980)
>>>     at scala.collection.immutable.StreamIterator$LazyCell.v$lzycompute(Stream.scala:969)
>>>     at scala.collection.immutable.StreamIterator$LazyCell.v(Stream.scala:969)
>>>     at scala.collection.immutable.StreamIterator.hasNext(Stream.scala:974)
>>>     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>>     at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:137)
>>>     at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
>>>     at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:55)
>>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>     at org.apache.spark.scheduler.Task.run(Task.scala:54)
>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>>>     at java.lang.Thread.run(Thread.java:695)
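
For readers following the thread: the failure mode Dmitriy describes can be sketched in the Mahout Scala DSL. This is a hypothetical reconstruction, not code from the thread -- `rawRows`, the literal dimensions, and the exact `drmWrap` argument shape are assumptions made for illustration; only `drmWrap`, `drmParallelize`, and the IndexException itself come from the discussion above.

```scala
// Hypothetical sketch of the failure mode discussed above.
// Assumes Mahout's Spark bindings are on the classpath and a Mahout
// distributed context is in scope; rawRows and the dimension literals
// are illustrative only.
import org.apache.mahout.math.RandomAccessSparseVector
import org.apache.mahout.sparkbindings._

// rawRows: RDD[(Int, Seq[Int])] of (row key, element/column ids),
// e.g. as produced by an input reader like spark-itemsimilarity's.
val rows = rawRows.map { case (rowKey, colIds) =>
  val v = new RandomAccessSparseVector(27790)
  // setQuick skips range checks, so a stray column id such as 27792
  // (from a non-contiguous or mis-translated id space) is stored
  // silently here and only surfaces later, inside A'A, as
  //   IndexException: Index 27792 is outside allowable range
  colIds.foreach(c => v.setQuick(c, 1.0))
  rowKey -> v
}

// Volunteering ncol here is the "if you did volunteer them anywhere,
// they have got to be good" case: the declared geometry must bound
// every index actually present in the data. Omitting the dimensions
// lets the DSL compute them lazily from the data instead.
val drmA = drmWrap(rows, ncol = 27790)
```

The practical check this suggests: before volunteering dimensions to drmWrap, verify that the maximum index in any row vector is strictly less than the ncol you declare, or leave the dimensions unspecified and pay the cost of the lazy computation.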