spark-user mailing list archives

From Bharath Ravi Kumar <reachb...@gmail.com>
Subject Re: ALS failure with size > Integer.MAX_VALUE
Date Tue, 02 Dec 2014 02:40:51 GMT
Yes, the issue appears to be due to the 2GB block size limitation. I am
therefore looking for (user, product) block sizing suggestions that work
around that limitation.
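
For reference, the knob in question is the blocks argument to ALS.train
in MLlib. A minimal sketch under stated assumptions (spark-shell, so sc
exists; a placeholder HDFS path; tab-separated fields; and an
illustrative block count of 2000 that we have not validated):

    import org.apache.spark.mllib.recommendation.{ALS, Rating}

    // Placeholder path and format; the real input is ~1.2B records on HDFS.
    val ratings = sc.textFile("hdfs:///path/to/ratings")
      .map(_.split('\t'))
      .map(f => Rating(f(0).toInt, f(1).toInt, f(2).toDouble))

    // More blocks mean smaller per-block payloads, which is one way to try
    // to keep each shuffled block under the 2GB mapping limit. The value
    // 2000 is an illustrative guess, not a validated setting.
    val model = ALS.train(ratings, /* rank */ 10, /* iterations */ 10,
      /* lambda */ 0.01, /* blocks */ 2000)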

On Sun, Nov 30, 2014 at 3:01 PM, Sean Owen <sowen@cloudera.com> wrote:

> (It won't be that, since you see the error occur when reading a
> block from disk. I think this is an instance of the 2GB block size
> limitation.)
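>
> For context, the 2GB cap ultimately comes from java.nio: a single
> memory-mapped region is limited to Integer.MAX_VALUE bytes, which is
> exactly the check that fails in the stack trace below. A minimal
> illustration (the file path is a placeholder):
>
>     import java.io.RandomAccessFile
>     import java.nio.channels.FileChannel
>
>     val channel = new RandomAccessFile("/tmp/some-file", "r").getChannel
>     // Throws java.lang.IllegalArgumentException: Size exceeds
>     // Integer.MAX_VALUE for any mapping request larger than 2GB - 1 bytes:
>     channel.map(FileChannel.MapMode.READ_ONLY, 0, Int.MaxValue.toLong + 1)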
>
> On Sun, Nov 30, 2014 at 4:36 AM, Ganelin, Ilya
> <Ilya.Ganelin@capitalone.com> wrote:
> > Hi Bharath – I’m unsure if this is your problem, but the
> > MatrixFactorizationModel in MLlib, which is the underlying component
> > for ALS, expects your User/Product fields to be integers. Specifically,
> > the input to ALS is an RDD[Rating], and Rating is an (Int, Int, Double).
> > I am wondering if perhaps one of your identifiers exceeds
> > Integer.MAX_VALUE; could you write a quick check for that?
> >
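> > A minimal sketch of such a check (assuming the identifiers are first
> > parsed as Longs so that oversized values remain visible; the input path
> > and format are placeholders):
> >
> >     // (user, product, rating) parsed as Longs rather than Ints.
> >     val raw = sc.textFile("hdfs:///path/to/ratings")
> >       .map(_.split('\t'))
> >       .map(f => (f(0).toLong, f(1).toLong, f(2).toDouble))
> >
> >     val maxUser = raw.map(_._1).reduce(math.max)
> >     val maxProduct = raw.map(_._2).reduce(math.max)
> >     require(maxUser <= Int.MaxValue && maxProduct <= Int.MaxValue,
> >       "identifier exceeds Int range: " + maxUser + ", " + maxProduct)
> >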
> > I have been running a very similar use case to yours (with more
> > constrained hardware resources), and while I haven’t seen this exact
> > problem, I’m sure we’ve seen similar issues. Please let me know if you
> > have other questions.
> >
> > From: Bharath Ravi Kumar <reachbach@gmail.com>
> > Date: Thursday, November 27, 2014 at 1:30 PM
> > To: "user@spark.apache.org" <user@spark.apache.org>
> > Subject: ALS failure with size > Integer.MAX_VALUE
> >
> > We're training a recommender with ALS in MLlib 1.1 against a dataset
> > of 150M users and 4.5K items, with the total number of training records
> > being 1.2 billion (~30GB of data). The input data is spread across 1200
> > partitions on HDFS. For the training, rank=10, and we've configured
> > {number of user data blocks = number of item data blocks}. The number
> > of user/item blocks was varied between 50 and 1200. Irrespective of the
> > block count (e.g. at 1200 blocks each), there are at least a couple of
> > tasks that end up shuffle reading > 9.7G each in the aggregate stage
> > (ALS.scala:337) and failing with the following exception (a rough
> > sizing estimate follows the trace):
> >
> > java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
> >         at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:745)
> >         at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:108)
> >         at org.apache.spark.storage.DiskStore.getValues(DiskStore.scala:124)
> >         at org.apache.spark.storage.BlockManager.getLocalFromDisk(BlockManager.scala:332)
> >         at org.apache.spark.storage.BlockFetcherIterator$BasicBlockFetcherIterator$$anonfun$getLocalBlocks$1.apply(BlockFetcherIterator.scala:204)
> >
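> > A back-of-envelope check on those numbers (illustrative assumptions
> > only, in particular the per-record byte estimate):
> >
> >     // If the shuffle were perfectly uniform across 1200 blocks:
> >     val numRatings = 1.2e9
> >     val bytesPerRating = 24.0  // rough guess for (Int, Int, Double) plus overhead
> >     val avgBytesPerBlock = numRatings * bytesPerRating / 1200  // ~24MB
> >     // A single task shuffle-reading 9.7G is roughly 400x that average,
> >     // which points at a few very hot blocks (skew) rather than at
> >     // uniformly oversized blocks.
> >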
>
