mahout-user mailing list archives

From praveenesh kumar <praveen...@gmail.com>
Subject Re: How to run Mahout java code from commandline ?
Date Sat, 24 Sep 2011 09:31:29 GMT
What I mean is that I have this code:

 import java.io.File;
 import java.io.IOException;
 import java.nio.charset.Charset;
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.Collection;
 import java.util.HashSet;
 import java.util.Map;
 import java.util.Set;
 import java.util.List;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.SequenceFile;
 import org.apache.hadoop.io.Text;
 //import org.apache.lucene.util.Attribute;
 import org.apache.mahout.common.FileLineIterable;
 import org.apache.mahout.common.StringRecordIterator;

 import org.apache.mahout.fpm.pfpgrowth.convertors.ContextStatusUpdater;
 import org.apache.mahout.fpm.pfpgrowth.convertors.SequenceFileOutputCollector;
 import org.apache.mahout.fpm.pfpgrowth.convertors.string.StringOutputConverter;
 import org.apache.mahout.fpm.pfpgrowth.convertors.string.TopKStringPatterns;
 import org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth;
 //import org.apache.mahout.math.map.OpenLongObjectHashMap;

 import org.apache.mahout.common.Pair;

 public class DellFPGrowth {

    public static void main(String[] args) throws IOException {

        Set<String> features = new HashSet<String>();
        String input = "/mnt/hgfs/Hadoop-automation/new-delltransaction.txt";
        int minSupport = 1;
        int maxHeapSize = 50;//top-k
        String pattern = " \"[ ,\\t]*[,|\\t][ ,\\t]*\" ";
        Charset encoding = Charset.forName("UTF-8");
        FPGrowth<String> fp = new FPGrowth<String>();
        String output = "/tmp/output.txt";
        Path path = new Path(output);
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);


        SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path,
                Text.class, TopKStringPatterns.class);

        fp.generateTopKFrequentPatterns(
                new StringRecordIterator(
                    new FileLineIterable(new File(input), encoding, false), pattern),
                fp.generateFList(
                    new StringRecordIterator(
                        new FileLineIterable(new File(input), encoding, false), pattern),
                    minSupport),
                minSupport,
                maxHeapSize,
                features,
                new StringOutputConverter(
                    new SequenceFileOutputCollector<Text,TopKStringPatterns>(writer)),
                new ContextStatusUpdater(null));

        writer.close();

        List<Pair<String,TopKStringPatterns>> frequentPatterns =
                FPGrowth.readFrequentPattern(fs, conf, path);
        for (Pair<String,TopKStringPatterns> entry : frequentPatterns) {
              System.out.println(entry.getSecond());
        }
        System.out.print("\nthe end! ");
    }

}
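
One thing that may be worth checking in the code above is the splitter regex, which is written with literal double quotes and padding spaces inside the Java string. The following is a minimal sketch, assuming the record iterator splits each line with java.util.regex.Pattern.split (an assumption about its behavior, not something confirmed here). The class name SplitterCheck, the helper splitLine, and the sample transaction line are hypothetical and only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SplitterCheck {

    // Splits one transaction line with the given regex.
    // Assumption: this mirrors how the record iterator tokenizes a line.
    static List<String> splitLine(String line, String regex) {
        return Arrays.asList(Pattern.compile(regex).split(line));
    }

    public static void main(String[] args) {
        // Hypothetical transaction line with comma-separated items.
        String line = "milk, bread,butter";

        // Regex with literal quotes and padding spaces around it.
        String quoted = " \"[ ,\\t]*[,|\\t][ ,\\t]*\" ";
        // The same regex without the surrounding quotes and spaces.
        String plain = "[ ,\\t]*[,|\\t][ ,\\t]*";

        // The quoted form only matches separators wrapped in space-quote pairs,
        // so this line is not split at all.
        System.out.println(splitLine(line, quoted).size()); // prints 1
        // The plain form splits on commas, pipes, or tabs with optional padding.
        System.out.println(splitLine(line, plain).size());  // prints 3
    }
}
```

If the input file really is comma or tab separated, the quoted form would treat every line as a single item, so the unquoted form is probably what is wanted.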


How should I compile and run this from the command line?
I don't have Eclipse on my system. How can I run this code?

Thanks,
Praveenesh

On Sat, Sep 24, 2011 at 12:40 PM, Danny Bickson <danny.bickson@gmail.com> wrote:

> It is very simple: in the root folder you run (for example, for k-means):
> ./bin/mahout kmeans -i ~/usr7/small_netflix_mahout/ -o
> ~/usr7/small_netflix_mahout_output/ --numClusters
> 10 -c ~/usr7/small_netflix_mahout/ -x 10
>
> where ./bin/mahout is used for any Mahout application, and the next keyword
> (kmeans in this case) defines the algorithm type.
> The rest of the inputs are algorithm specific.
>
> If you want to add a new application to the existing ones, you need to edit
> the conf/driver.classes.props file and point it at your main class.
>
> Best,
>
> - Danny Bickson
>
> On Sat, Sep 24, 2011 at 9:59 AM, praveenesh kumar <praveenesh@gmail.com> wrote:
>
> > Hey,
> > I have this code written using Mahout libraries. I am able to run the
> > code from Eclipse.
> > How can I run code written with Mahout from the command line?
> >
> > My question is: do I have to make a jar file and run it as
> > hadoop jar jarfilename.jar class,
> > or shall I run it using the plain java command?
> >
> > Can anyone solve my confusion ?
> > I am not able to run this code.
> >
> > Thanks,
> > Praveenesh
> >
>
