mahout-dev mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: AtA error
Date Tue, 18 Nov 2014 01:24:24 GMT
I do use drmWrap, so I’ll check there. Thanks.
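
A quick assertion over the element RDD before wrapping would flag this (just a
sketch; rdd and nCol stand in for whatever actually gets handed to drmWrap):

    // every row vector must agree with the ncol volunteered to drmWrap,
    // otherwise A'A slices rows past their cardinality and viewPart throws
    val badRows = rdd.filter { case (_, v) => v.size != nCol }.count()
    require(badRows == 0L, s"$badRows row vectors disagree with declared ncol $nCol")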

On Nov 17, 2014, at 5:22 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:

On Mon, Nov 17, 2014 at 5:16 PM, Pat Ferrel <pat@occamsmachete.com> wrote:

> It’s in spark-itemsimilarity. This job reads elements and assigns them to
> one of two RDD-backed DRMs.
> 
> I assumed it was a badly formed DRM, but it’s a 140MB dataset and a bit
> hard to nail down, so I’m just looking for a clue. I read this to say that
> an ID for an element in a row vector was larger than drm.ncol, correct?
> 

yes.

And then it again comes back to the question of how the matrix was
constructed. Computation of the dimensions (ncol, nrow) is automatic and
lazy, meaning that if you have not specified dimensions anywhere explicitly,
they will be lazily computed for you. But if you did volunteer them anywhere
(such as in a drmWrap() call), they have to be correct, or you see things
like this.

> 
> 
> On Nov 17, 2014, at 4:58 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> 
> So this is not a problem with the A'A computation -- the input is obviously
> invalid.
> 
> The question is what you did before you got the A handle -- did you read it
> from a file? Parallelize it from an in-core matrix (drmParallelize)? Obtain
> it as the result of another computation (if so, which one)? Wrap it around
> a manually crafted RDD (drmWrap)?
> 
> I don't understand the question about non-contiguous ids. You are referring
> to some context of your computation, assuming I am familiar with that
> context (but unfortunately I am not).
> 
> On Mon, Nov 17, 2014 at 4:55 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> wrote:
> 
>> 
>> 
>> On Mon, Nov 17, 2014 at 3:46 PM, Pat Ferrel <pat@occamsmachete.com>
> wrote:
>> 
>>> A matrix with about 4600 rows and somewhere around 27790 columns (not
>>> sure of the exact dimensions). When executing the following method from
>>> AtA,
>>> 
>>>    /** The version of A'A that does not use GraphX */
>>>    def at_a_nongraph(op: OpAtA[_], srcRdd: DrmRdd[_]): DrmRdd[Int] = {
>>> 
>>> a vector is created whose size causes the error. How could I have
>>> constructed a DRM that would cause this error? If the column IDs were
>>> non-contiguous, would that yield this error?
>>> 
>> 
>> what did you do specifically to build matrix A?
>> 
>> 
>>> ==================
>>> 
>>> 14/11/12 17:56:03 ERROR executor.Executor: Exception in task 5.0 in stage 18.0 (TID 66169)
>>> org.apache.mahout.math.IndexException: Index 27792 is outside allowable range of [0,27789)
>>>       at org.apache.mahout.math.AbstractVector.viewPart(AbstractVector.java:147)
>>>       at org.apache.mahout.math.scalabindings.VectorOps.apply(VectorOps.scala:37)
>>>       at org.apache.mahout.sparkbindings.blas.AtA$$anonfun$5$$anonfun$apply$6.apply(AtA.scala:152)
>>>       at org.apache.mahout.sparkbindings.blas.AtA$$anonfun$5$$anonfun$apply$6.apply(AtA.scala:149)
>>>       at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:376)
>>>       at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:376)
>>>       at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1085)
>>>       at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1077)
>>>       at scala.collection.immutable.StreamIterator$$anonfun$next$1.apply(Stream.scala:980)
>>>       at scala.collection.immutable.StreamIterator$$anonfun$next$1.apply(Stream.scala:980)
>>>       at scala.collection.immutable.StreamIterator$LazyCell.v$lzycompute(Stream.scala:969)
>>>       at scala.collection.immutable.StreamIterator$LazyCell.v(Stream.scala:969)
>>>       at scala.collection.immutable.StreamIterator.hasNext(Stream.scala:974)
>>>       at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>>       at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:137)
>>>       at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:58)
>>>       at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:55)
>>>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>       at org.apache.spark.scheduler.Task.run(Task.scala:54)
>>>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>>>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>>>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>>>       at java.lang.Thread.run(Thread.java:695)
>>> 
>>> 
>> 
> 
> 

