mahout-user mailing list archives

From: Suneel Marthi <suneel_mar...@yahoo.com>
Subject: Re: wikipedia bayes quickstart example on EC2 (cloudera)
Date: Sat, 01 Mar 2014 17:15:22 GMT
Please work off of the latest Mahout 0.9; most of these issues from Mahout 0.7 have been
addressed in later releases.
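A rough sketch of dropping a standalone 0.9 next to the CDH-managed install (the download URL, paths, and HADOOP_CONF_DIR location are assumptions; adjust to your layout):

    # Unpack a standalone Mahout 0.9 alongside the CDH-bundled 0.7
    wget http://archive.apache.org/dist/mahout/0.9/mahout-distribution-0.9.tar.gz
    tar xzf mahout-distribution-0.9.tar.gz
    export MAHOUT_HOME="$PWD/mahout-distribution-0.9"
    export PATH="$MAHOUT_HOME/bin:$PATH"
    # Point it at the existing cluster config so jobs still run on CDH
    export HADOOP_CONF_DIR=/etc/hadoop/conf
    # Then re-run the chunking command quoted below against this install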

On Saturday, March 1, 2014 12:14 PM, Jessie Wright <jessie.wright@gmail.com> wrote:
 
Hi,

I'm a noob trying to run the wikipedia bayes example on EC2 (using a
CDH 4.5 setup). I've searched the archives and haven't been able to find
info on this, so I apologize if this is a duplicate question.

The Cloudera install comes with Mahout 0.7.

I've run into a few snags on the first step (chunking the data into
pieces). The first was that it couldn't find wikipediaXMLSplitter, but
substituting the fully qualified class name
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter in the command got
past that error (just changing the capitalization wasn't enough); the
full command is sketched below.
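
For anyone else hitting this, the working invocation was along these lines (the -d/-o/-c arguments follow the quickstart, and my actual file names differ):

    # Quickstart command as documented (fails on the CDH 0.7 install):
    #   mahout wikipediaXMLSplitter -d enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
    # Substituting the fully qualified class name gets past the driver lookup:
    mahout org.apache.mahout.text.wikipedia.WikipediaXmlSplitter \
      -d enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64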

However, I am now stuck on a java.lang.OutOfMemoryError: Java heap space
error. I upped MAHOUT_HEAPSIZE to 5000 (MB) and am still getting the
same error. See the full error here: http://pastebin.com/P5PYuR8U (I
added a print statement to the bin/mahout script just to confirm that my
export of MAHOUT_HEAPSIZE was being successfully detected.)

I'm wondering whether some other setting is overriding MAHOUT_HEAPSIZE?
Perhaps one of the Hadoop- or Cloudera-specific ones?
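
Concretely, is something like this what's needed? (HADOOP_CLIENT_OPTS is my guess at the setting that might be winning when bin/mahout hands the job off to the hadoop launcher script):

    # Guess: when bin/mahout delegates to the `hadoop` script, the client
    # JVM's -Xmx may come from Hadoop's own settings rather than MAHOUT_HEAPSIZE
    export HADOOP_CLIENT_OPTS="-Xmx4096m"
    mahout org.apache.mahout.text.wikipedia.WikipediaXmlSplitter \
      -d enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64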

Does anyone have any experience with this or suggestions?

Thank you,

Jessie Wright