mahout-user mailing list archives

From Suneel Marthi <>
Subject Re: Heap space
Date Sun, 09 Mar 2014 22:01:25 GMT

First, thanks for starting this email thread and for highlighting the issues
with the Wikipedia example. Since you raised this issue, I updated the
new Wikipedia examples page at
 and also responded to a similar question on Stack Overflow at

I am assuming that you are running this locally on your machine and are just trying out the examples.
Try Sebastian's suggestion, or else try running the example on a much smaller dataset of
Wikipedia articles.
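
For the heap error itself, a minimal sketch of one way to raise the client-side heap. It assumes that bin/mahout honors the MAHOUT_HEAPSIZE environment variable (in MB) when it starts the JVM itself, and that the hadoop launcher picks up HADOOP_CLIENT_OPTS when Mahout delegates to it. Which of the two applies depends on the local setup, and the 4096 value is only illustrative. Note that the JVM flag is written without a space, as in -Xmx2048m.

```shell
# Assumption: bin/mahout reads MAHOUT_HEAPSIZE (in MB) when it launches the JVM itself.
export MAHOUT_HEAPSIZE=4096

# Assumption: when Mahout hands off to the `hadoop` launcher, the client JVM
# takes its options from HADOOP_CLIENT_OPTS instead.
export HADOOP_CLIENT_OPTS="-Xmx${MAHOUT_HEAPSIZE}m ${HADOOP_CLIENT_OPTS:-}"

# Re-run the splitter only if the launcher is present in the current directory.
if [ -x ./bin/mahout ]; then
  ./bin/mahout wikipediaXMLSplitter \
    -d examples/temp/enwiki-latest-pages-articles.xml \
    -o wikipedia/chunks -c 64
fi
```

The splitter runs in the client JVM, so restarting the Hadoop daemons with a larger heap would not be expected to change its limit.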

Lastly, we do realize that you have been struggling with this for about 3 days now.  Mahout
presently lacks an entry for 'wikipediaXmlSplitter' in driver.classes.default.props.  Not
sure at what point in time and in which release that happened.
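
For illustration, such an entry might look roughly like the line below. This is a sketch only: it follows the "class = shortName : description" layout used by other entries in that file, and the fully qualified class name and the description text are assumptions that should be checked against the source tree before going into a patch.

```properties
# Hypothetical entry; verify the class name and wording against the source tree.
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : Reads wikipedia data and creates chunks
```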

Please file a Jira for this and submit a patch.

On Sunday, March 9, 2014 2:25 PM, Mahmood Naderan <> wrote:
Hi Suneel,
Do you have any idea? Searching the web shows many questions regarding the heap size for wikipediaXMLSplitter.
I have increased the memory size to 16GB and still get that error. I have to say that,
using the 'top' command, I see only 1GB of memory in use. So I wonder why it reports such an error.
Is this a problem with Java, Mahout, Hadoop, ..?


On Sunday, March 9, 2014 4:00 PM, Mahmood Naderan <> wrote:
Excuse me, I added the -Xmx option and restarted the hadoop services using
sbin/ && sbin/

However, I still get the heap size error. How can I find the correct heap size needed?


On Sunday, March 9, 2014 1:37 PM, Mahmood Naderan <> wrote:
OK, I found that I have to add this property to mapred-site.xml



On Sunday, March 9, 2014 11:39 AM, Mahmood Naderan <> wrote:

I ran this command

    ./bin/mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64

but got this error
     Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

There are many web pages regarding this, and the solution is to add "-Xmx 2048M", for example.
My question is that this option should be passed to the java command and not to Mahout. As a result,
running "./bin/mahout -Xmx 2048M" shows that there is no such option. What should I do?
