mahout-user mailing list archives

From Adam Reilly <adamjrei...@gmail.com>
Subject Re: 20-newsgroups example
Date Tue, 01 Sep 2009 16:33:24 GMT
One more thing...

As I mentioned before, this method is just a temporary hack until the
correct way is documented. You'll also have to modify the build.xml that
you copied so that it can find the examples jar, by replacing the
extract-20news-18828 target like so:

  <target name="extract-20news-18828" depends="check-files"
      unless="reuters.extracted">
    <mkdir dir="${working.dir}/20news-18828-collapse"/>
    <java classname="org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups"
        maxmemory="1024M" fork="true">
      <!-- <classpath refid="maven.test.classpath"/>  Commented this out and
           replaced it with the classpath below -->
      <classpath>
        <!-- jars in target/, including the examples jar -->
        <fileset dir="target" includes="*.jar"/>
        <!-- jars in lib/ -->
        <fileset dir="lib" includes="*.jar"/>
      </classpath>
      <!--
      Arguments:
      -p input (parent) directory, -o output directory,
      -a Lucene Analyzer class, -c character set
      -->
      <arg line="-p ${working.dir}/20news-18828/ -o ${working.dir}/20news-18828-collapse -a org.apache.lucene.analysis.standard.StandardAnalyzer -c UTF-8"/>
    </java>
  </target>
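If other targets in the copied build.xml need the same jars, one way to avoid repeating the fileset list is to declare the path once at the top level and reference it by id. This is a sketch, not something from the wiki or MAHOUT-124: the id "examples.classpath" is a hypothetical name, and the check-files target, the working.dir property and the unless attribute are assumed to be exactly as in the copied build.xml.

```xml
<!-- Declared once at the top level of the copied build.xml (outside any target).
     "examples.classpath" is a hypothetical id chosen for this sketch. -->
<path id="examples.classpath">
  <fileset dir="target" includes="*.jar"/>
  <fileset dir="lib" includes="*.jar"/>
</path>

<!-- Same target as above; check-files and working.dir are assumed to come from the copied build.xml. -->
<target name="extract-20news-18828" depends="check-files"
    unless="reuters.extracted">
  <mkdir dir="${working.dir}/20news-18828-collapse"/>
  <java classname="org.apache.mahout.classifier.bayes.PrepareTwentyNewsgroups"
      maxmemory="1024M" fork="true">
    <!-- Reference the shared path instead of repeating the filesets here. -->
    <classpath refid="examples.classpath"/>
    <arg line="-p ${working.dir}/20news-18828/ -o ${working.dir}/20news-18828-collapse -a org.apache.lucene.analysis.standard.StandardAnalyzer -c UTF-8"/>
  </java>
</target>
```

Declaring the path with a single id also avoids defining the same id twice, which Ant reports as overriding a previous reference.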

Again, HTH,
Adam


On Tue, Sep 1, 2009 at 10:42 AM, Adam Reilly <adamjreilly@gmail.com> wrote:

> Gökhan,
>
> Additionally, I know that the 20newsgroups stuff is in the process of
> migrating away from the Ant-based build scripts to Maven. I'm sure there's
> a more proper 'maven-ish' way to do this, but I've found that copying the
> file located at 'maven/build.xml' to the examples directory lets you
> run most of the data preparation commands (the ones in the Setup section)
> directly from the wiki page. They didn't seem to work with the
> 'build-deprecated.xml' file in the examples folder from trunk.
>
> I've been working with the 20newsgroups stuff recently, thanks to a lot of
> help from Robin, so let me know if there's anything else you're having
> problems with.
>
> HTH,
> Adam
>
> On Tue, Sep 1, 2009 at 4:39 AM, Gökhan Çapan <gkhncpn@gmail.com> wrote:
>
>> Thank you, Robin. I'll try it.
>>
>> 2009/9/1 Robin Anil <robin.anil@gmail.com>
>>
>> > Those docs haven't been updated yet. Take a look at MAHOUT-124:
>> > https://issues.apache.org/jira/browse/MAHOUT-124
>> > You will find the usage and the CLI changes there.
>> >
>> >
>>
>
>
