mahout-dev mailing list archives

From "Robin Anil" <robin.a...@gmail.com>
Subject Too Many open files in BayesFileFormatter
Date Wed, 12 Nov 2008 12:11:37 GMT
I am getting this error while processing the *industry* dataset from
http://www.cs.cmu.edu/~TextLearning/datasets.html

I took the 105 leaf-node classes and put them in the top-level directory
(around 10K files in total), then ran the following. The file is not being
closed in writeFile(). Should I file another JIRA issue for it or add it
under MAHOUT-60/92/93?



robin:~/lucene/mahout/trunk/core/work$ hadoop jar
../../examples/build/apache-mahout-examples-0.1-dev.job
org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups -p industry -o
industry-collapse -a org.apache.lucene.analysis.standard.StandardAnalyzer -c
UTF-8
java.lang.RuntimeException: java.io.FileNotFoundException: industry/oil.and.gas.operations.industry/http_^^www.tmrc.com^ (Too many open files)
        at org.apache.mahout.classifier.BayesFileFormatter$FileProcessor.accept(BayesFileFormatter.java:174)
        at java.io.File.listFiles(File.java:1134)
        at org.apache.mahout.classifier.BayesFileFormatter.collapse(BayesFileFormatter.java:75)
        at org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups.main(PrepareTwentyNewsgroups.java:86)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Caused by: java.io.FileNotFoundException: industry/oil.and.gas.operations.industry/http_^^www.tmrc.com^ (Too many open files)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:106)
        at org.apache.mahout.classifier.BayesFileFormatter$FileProcessor.accept(BayesFileFormatter.java:162)
        ... 12 more
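
To illustrate the kind of fix I have in mind: the trace shows a FileInputStream being opened for every file visited during the directory listing, so with around 10K files any stream that is never closed eventually exhausts the process's file descriptor limit. Below is a rough sketch of the pattern, not the actual BayesFileFormatter code. The class and method names (ClosingFileCollapser, appendCollapsed, collapse) and the tab-separated output layout are made up for the example. It uses try-with-resources, which needs Java 7 or later; on older JVMs the same effect comes from calling close() in a finally block.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical stand-in for the collapse step; not the Mahout implementation.
public class ClosingFileCollapser {

  /**
   * Appends the contents of one input file to the shared writer as a single
   * line, prefixed by the label. The reader is closed when the try block
   * exits, whether normally or via an exception, so at most one input
   * descriptor is open at any time.
   */
  static void appendCollapsed(String label, File input, Charset charset, Writer out)
      throws IOException {
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(new FileInputStream(input), charset))) {
      out.write(label);
      String line;
      while ((line = reader.readLine()) != null) {
        out.write('\t');
        out.write(line);
      }
      out.write('\n');
    }
  }

  /**
   * Collapses every regular file directly under inputDir into outputFile,
   * using the directory name as the label for each line.
   */
  static void collapse(File inputDir, File outputFile, Charset charset) throws IOException {
    File[] files = inputDir.listFiles();
    if (files == null) {
      throw new IOException("Not a readable directory: " + inputDir);
    }
    // The single output writer is also closed once the loop finishes.
    try (Writer out = new BufferedWriter(
        new OutputStreamWriter(new FileOutputStream(outputFile), charset))) {
      for (File f : files) {
        if (f.isFile()) {
          appendCollapsed(inputDir.getName(), f, charset, out);
        }
      }
    }
  }
}
```

With this shape the number of open descriptors stays constant no matter how many files the directory holds, instead of growing by one per file.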
