mahout-user mailing list archives

From Suneel Marthi <>
Subject Re: wikipedia bayes quickstart example on EC2 (cloudera)
Date Sat, 01 Mar 2014 17:15:22 GMT
Please work off of the latest release, Mahout 0.9. Most of these issues from Mahout 0.7 have
been addressed in later releases.

On Saturday, March 1, 2014 12:14 PM, Jessie Wright <> wrote:

I'm a noob and trying to run the wikipedia bayes example on EC2 (using a
cdh4.5 setup).  I've searched the archives and haven't been able to find
info on this.  I apologize if this is a duplicate question.

The cloudera install comes with Mahout 0.7.

I've run into a few snags on the first step (chunking the data into
pieces).  The first was that it couldn't find wikipediaXMLSplitter, but I
found that substituting the fully qualified class name
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter in the command got
past that error. (Just changing the capitalization wasn't enough.)

However I am now stuck.  I'm getting a java.lang.OutOfMemoryError: Java
heap space error.
I upped MAHOUT_HEAPSIZE to 5000 and am still getting the same error.
See the full error here:  (I added a print
statement to the mahout/bin just to confirm that my export of
MAHOUT_HEAPSIZE was being successfully detected)

I'm wondering whether some other setting is overriding
MAHOUT_HEAPSIZE, perhaps one of the Hadoop- or Cloudera-specific ones?

Does anyone have any experience with this or suggestions?

Thank you,

Jessie Wright
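
On the heap-size question above, one way to narrow things down is to check
what maximum heap a JVM actually receives in the same shell, alongside the
environment variables that could influence it. The following is a minimal
sketch, not something from this thread: the class name HeapCheck is
hypothetical, and the list of variables is an assumption about which settings
might matter (HADOOP_HEAPSIZE and HADOOP_CLIENT_OPTS are standard Hadoop
client settings, and JAVA_TOOL_OPTIONS is read by the JVM itself). Whether the
Mahout 0.7 launcher shipped with CDH 4.5 passes MAHOUT_HEAPSIZE through when
it hands the job to Hadoop has not been verified here.

```java
import java.util.Map;

// Hypothetical diagnostic helper; not part of Mahout or Hadoop.
public class HeapCheck {

    // Assumed list of environment variables that may affect the client JVM heap.
    static final String[] VARS = {
        "MAHOUT_HEAPSIZE", "HADOOP_HEAPSIZE", "HADOOP_CLIENT_OPTS",
        "HADOOP_OPTS", "JAVA_TOOL_OPTIONS"
    };

    // Maximum heap this JVM will attempt to use, in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Formats one variable for display, marking variables that are not set.
    static String describe(String name, String value) {
        return name + "=" + (value == null ? "(unset)" : value);
    }

    public static void main(String[] args) {
        System.out.println("JVM max heap (MB): " + maxHeapMb());
        Map<String, String> env = System.getenv();
        for (String name : VARS) {
            System.out.println(describe(name, env.get(name)));
        }
    }
}
```

Compiling this and running it once with plain `java HeapCheck` and once with
an explicit flag such as `java -Xmx5000m HeapCheck` shows how the reported
maximum responds to the flag. If the value seen when the job is launched stays
at the default, a setting other than MAHOUT_HEAPSIZE is probably deciding the
heap.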