mahout-user mailing list archives

From Suneel Marthi <suneel_mar...@yahoo.com>
Subject Re: seqdirectory -filter arg: not found, default used, no exception
Date Mon, 26 Aug 2013 17:58:27 GMT
Thanks for confirming that. Could you file a JIRA to fix the MR version of this?

Thanks again.




________________________________
 From: Liz Merkhofer <lmerkhofer@bericotechnologies.com>
To: user@mahout.apache.org; Suneel Marthi <suneel_marthi@yahoo.com> 
Sent: Monday, August 26, 2013 1:56 PM
Subject: Re: seqdirectory -filter arg: not found, default used, no exception
 

You're exactly right... with the sequential flag, my filter is found. An
exception is thrown, but for now the problem seems to be the json-reading
filter itself and not Mahout. Thanks!

For completeness, the command is now:
mahout seqdirectory -o test_json -i json_stems.json -filter MahoutFilter.JsonFilter -ow -xm sequential

And the stack trace, apparently caused by problems in my filter, is:
Exception in thread "main" java.lang.IllegalStateException:
java.lang.NoSuchMethodException:
MahoutFilter.JsonFilter.<init>(org.apache.hadoop.conf.Configuration,
java.lang.String, java.util.Map, org.apache.mahout.utils.io.ChunkedWriter,
java.nio.charset.Charset, org.apache.hadoop.fs.FileSystem)
at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:53)
at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:36)
at org.apache.mahout.text.SequenceFilesFromDirectory.runSequential(SequenceFilesFromDirectory.java:109)
at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:87)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:63)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:194)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.NoSuchMethodException:
MahoutFilter.JsonFilter.<init>(org.apache.hadoop.conf.Configuration,
java.lang.String, java.util.Map, org.apache.mahout.utils.io.ChunkedWriter,
java.nio.charset.Charset, org.apache.hadoop.fs.FileSystem)
at java.lang.Class.getConstructor0(Class.java:2754)
at java.lang.Class.getConstructor(Class.java:1684)
at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:47)
... 18 more
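
The trace above says that a public constructor with exactly the six listed parameter types was looked up on MahoutFilter.JsonFilter and not found. The following is a minimal, JDK-only sketch of that failure mode, not Mahout's actual code: the class names (ConstructorLookupSketch, MatchingFilter, NoArgFilter) are hypothetical, and three JDK types stand in for the six Hadoop/Mahout types so the example runs without those libraries.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;

final class ConstructorLookupSketch {

  // Parameter types the caller asks for (assumption: a stand-in for the six types in the trace).
  static final Class<?>[] EXPECTED_TYPES = {String.class, Map.class, Charset.class};

  /** Hypothetical filter that declares the expected public constructor. */
  public static class MatchingFilter {
    final String keyPrefix;

    public MatchingFilter(String keyPrefix, Map<String, String> options, Charset charset) {
      this.keyPrefix = keyPrefix;
    }
  }

  /** Hypothetical filter with only a no-arg constructor, so the lookup fails. */
  public static class NoArgFilter {
    public NoArgFilter() { }
  }

  /** Looks up the exact public constructor and invokes it, wrapping reflection failures. */
  static Object instantiate(Class<?> filterClass) {
    Object[] args = {"", Collections.<String, String>emptyMap(), StandardCharsets.UTF_8};
    try {
      // getConstructor matches parameter types exactly; a missing match
      // raises NoSuchMethodException, as in the trace.
      return filterClass.getConstructor(EXPECTED_TYPES).newInstance(args);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Returns "ok" on success, otherwise the simple name of the underlying exception. */
  static String tryInstantiate(Class<?> filterClass) {
    try {
      instantiate(filterClass);
      return "ok";
    } catch (IllegalStateException e) {
      return e.getCause().getClass().getSimpleName();
    }
  }
}
```

Under these assumptions, tryInstantiate(MatchingFilter.class) succeeds and tryInstantiate(NoArgFilter.class) reports a NoSuchMethodException. For the real filter, the likely remedy is to declare a public constructor taking the six types listed in the trace, in that order.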


On Mon, Aug 26, 2013 at 1:35 PM, Suneel Marthi <suneel_marthi@yahoo.com> wrote:

> Seems like a bug in the MR version of seqdirectory. (I am assuming you are
> working off of trunk or Mahout 0.8.)
>
> Could you try running this again by specifying the '-xm sequential' option
> and check if the behavior is correct?
>
>
>
>
> ________________________________
>  From: Liz Merkhofer <lmerkhofer@bericotechnologies.com>
> To: user@mahout.apache.org
> Sent: Monday, August 26, 2013 1:19 PM
> Subject: seqdirectory -filter arg: not found, default used, no exception
>
>
> Hello list,
>
> I'm trying to inject my own filter into "seqdirectory" so I can use a .json
> file in the format {"docid": "text", } as input. I understand that a custom
> filter can be specified as -filter, replacing the default
> PrefixAdditionFilter.
>
> However, when I put what I thought was a json-reading filter in the
> dependencies as MahoutFilter.JsonFilter, it read the whole json file up
> with the file's path as the key and the whole json file as the value - that
> is, exactly as if the default filter were working.
>
> Command for that: mahout seqdirectory -o test_json -i json_stems.json
> -filter MahoutFilter.JsonFilter -ow
>
> (MahoutFilter.JsonFilter is the fully qualified class name.)
>
> Then I tried putting a filter name in there that definitely didn't
> exist:
>
> mahout seqdirectory -o test_json -i json_stems.json -filter NoSuchFilter
> -ow
>
> Once again, no exception thrown, and the default filter seems to have been
> used. Still, it does recognize that it was given the argument:
> Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=
> [2147483647], --fileFilterClass=[NoSuchFilter], --input=[json_stems.json],
> --keyPrefix=[], --method=[mapreduce], --output=[test_json],
> --overwrite=null, --startPhase=[0], --tempDir=[temp]}
>
> My take-away from this is:
>
> 1. When mahout does not find the filter specified, it uses the default.
> Minimally, a user should be warned when their argument is ignored. Perhaps
> I should document this in the jira.
>
> 2. Any ideas on helping mahout find my filter?
>
> 3. There was a csv filter up to 0.5 that also would have done the trick
> here - any reason it's no longer included?
>
> Thanks,
> Liz
>
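
Take-away 1 in the quoted message asks that a user be warned when the filter argument is ignored. The thread does not show how Mahout resolves the class name, so the following is only a JDK-only sketch of the two behaviors one might want instead of a silent fallback. The class and method names (FilterLookupSketch, resolveOrFail, resolveOrWarn) are hypothetical.

```java
final class FilterLookupSketch {

  /** Resolves a class by name and fails loudly instead of falling back to a default. */
  static Class<?> resolveOrFail(String className) {
    try {
      return Class.forName(className);
    } catch (ClassNotFoundException e) {
      throw new IllegalArgumentException(
          "Filter class not found on the classpath: " + className, e);
    }
  }

  /** Resolves a class by name, printing a warning to stderr before using the fallback. */
  static Class<?> resolveOrWarn(String className, Class<?> fallback) {
    try {
      return Class.forName(className);
    } catch (ClassNotFoundException e) {
      System.err.println("WARN: filter class '" + className
          + "' not found; using " + fallback.getName());
      return fallback;
    }
  }
}
```

With either variant, passing a name such as NoSuchFilter would no longer go unnoticed: resolveOrFail throws an IllegalArgumentException, and resolveOrWarn logs a warning before returning the fallback class.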