mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Mahalanobis users out there?
Date Sun, 06 Mar 2011 17:44:38 GMT
Good fellow!

I will take a quick look.

On Sun, Mar 6, 2011 at 5:15 AM, Vasil Vasilev <vavasilev@gmail.com> wrote:

> Hi Ted,
>
> The code in my previous message (quoted below) is an example of how to use
> MahalanobisDistanceMeasure. For the problems I came across, I created a Jira
> issue and attached a patch to it:
> https://issues.apache.org/jira/browse/MAHOUT-616
>
> Regards, Vasil
>
>
> On Tue, Mar 1, 2011 at 7:26 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
>> Vasil,
>>
>> If you are suggesting a change in Mahout, can you go to
>> https://issues.apache.org/jira/browse/MAHOUT and file an issue with a
>> patch?
>>
>> In case the terminology is new to you: an issue is a bug report or
>> enhancement request, and a patch is the output of svn diff or git
>> format-patch.
>>
>> You can get more information about this process here:
>> https://cwiki.apache.org/confluence/display/MAHOUT/How+To+Contribute
>>
>>
>> On Tue, Mar 1, 2011 at 1:11 AM, Vasil Vasilev <vavasilev@gmail.com> wrote:
>>
>>> Hi Lance,
>>>
>>> I did a small test with the Mahalanobis distance measure and Dirichlet
>>> clustering. Unfortunately, it did not work at first, because the measure's
>>> "configure" method was never called.
>>> I made some changes to the Mahout code to get it running and used the
>>> following code in the
>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job class:
>>>
>>>   /**
>>>    * Run the job using supplied arguments, deleting the output directory if
>>>    * it exists beforehand
>>>    *
>>>    * @param input
>>>    *          the directory pathname for input points
>>>    * @param output
>>>    *          the directory pathname for output points
>>>    * @param modelDistribution
>>>    *          the ModelDistribution
>>>    * @param numModels
>>>    *          the number of Models
>>>    * @param maxIterations
>>>    *          the maximum number of iterations
>>>    * @param alpha0
>>>    *          the alpha0 value for the DirichletDistribution
>>>    * @param emitMostLikely
>>>    *          if true, emit only the most likely cluster for each point
>>>    * @param threshold
>>>    *          the pdf threshold used to emit clusters when emitMostLikely is false
>>>    */
>>>   public void run(Path input,
>>>                   Path output,
>>>                   ModelDistribution<VectorWritable> modelDistribution,
>>>                   int numModels,
>>>                   int maxIterations,
>>>                   double alpha0,
>>>                   boolean emitMostLikely,
>>>                   double threshold)
>>>     throws IOException, ClassNotFoundException, InstantiationException,
>>>            IllegalAccessException, SecurityException, InterruptedException {
>>>     Configuration conf = new Configuration();
>>>
>>>     // The measure reads its parameters from the job Configuration, so the
>>>     // mean vector and inverse covariance matrix are written to files and
>>>     // their paths are registered under the measure's configuration keys.
>>>     if (modelDistribution instanceof DistanceMeasureClusterDistribution) {
>>>       DistanceMeasure measure =
>>>           ((DistanceMeasureClusterDistribution) modelDistribution).getMeasure();
>>>       if (measure instanceof MahalanobisDistanceMeasure) {
>>>         MahalanobisDistanceMeasure mahalanobis = (MahalanobisDistanceMeasure) measure;
>>>
>>>         // 3-dimensional mean vector and an identity covariance matrix
>>>         Vector meanVector = new DenseVector(new double[] {0.0, 22.0, 25.0});
>>>         mahalanobis.setMeanVector(meanVector);
>>>         Matrix m = new DenseMatrix(new double[][] {
>>>             {1.0, 0.0, 0.0},
>>>             {0.0, 1.0, 0.0},
>>>             {0.0, 0.0, 1.0}});
>>>         mahalanobis.setCovarianceMatrix(m);
>>>
>>>         // Serialize the inverse covariance matrix and register its path
>>>         Path inverseCovarianceFile =
>>>             new Path("output/MahalanobisDistanceMeasureInverseCovarianceFile");
>>>         conf.set("MahalanobisDistanceMeasure.inverseCovarianceFile",
>>>                  inverseCovarianceFile.toString());
>>>         FileSystem fs = FileSystem.get(inverseCovarianceFile.toUri(), conf);
>>>         MatrixWritable inverseCovarianceMatrix =
>>>             new MatrixWritable(mahalanobis.getInverseCovarianceMatrix());
>>>         DataOutputStream out = fs.create(inverseCovarianceFile);
>>>         try {
>>>           inverseCovarianceMatrix.write(out);
>>>         } finally {
>>>           out.close();
>>>         }
>>>
>>>         // Serialize the mean vector and register its path
>>>         Path meanVectorFile = new Path("output/MahalanobisDistanceMeasureMeanVectorFile");
>>>         conf.set("MahalanobisDistanceMeasure.meanVectorFile", meanVectorFile.toString());
>>>         fs = FileSystem.get(meanVectorFile.toUri(), conf);
>>>         VectorWritable meanVectorWritable = new VectorWritable(meanVector);
>>>         out = fs.create(meanVectorFile);
>>>         try {
>>>           meanVectorWritable.write(out);
>>>         } finally {
>>>           out.close();
>>>         }
>>>
>>>         // Classes the two files were serialized with (key names kept verbatim)
>>>         conf.set("MahalanobisDistanceMeasure.maxtrixClass", MatrixWritable.class.getName());
>>>         conf.set("MahalanobisDistanceMeasure.vectorClass", VectorWritable.class.getName());
>>>       }
>>>     }
>>>
>>>     // Convert the raw input into vectors
>>>     Path directoryContainingConvertedInput =
>>>         new Path(output, DIRECTORY_CONTAINING_CONVERTED_INPUT);
>>>     SynthInputDriver.runJob(input, directoryContainingConvertedInput,
>>>         "org.apache.mahout.math.RandomAccessSparseVector");
>>>     //InputDriver.runJob(input, directoryContainingConvertedInput,
>>>     //    "org.apache.mahout.math.RandomAccessSparseVector");
>>>
>>>     // Run Dirichlet clustering with the Configuration prepared above
>>>     DirichletDriver.run(conf, directoryContainingConvertedInput,
>>>                         output,
>>>                         modelDistribution,
>>>                         numModels,
>>>                         maxIterations,
>>>                         alpha0,
>>>                         true,
>>>                         emitMostLikely,
>>>                         threshold,
>>>                         true);
>>>
>>>     try {
>>>       ClusteredPointsConverter.convertClusteredPoints(directoryContainingConvertedInput,
>>>           new Path(output, "clusteredPoints"),
>>>           new Path(output, "convertedClusteredPoints"),
>>>           "org.apache.mahout.math.RandomAccessSparseVector");
>>>     } catch (InvocationTargetException e) {
>>>       // Fail fast: ClusterDumper below needs the converted points
>>>       throw new IOException(e);
>>>     }
>>>
>>>     // run ClusterDumper
>>>     ClusterDumper clusterDumper =
>>>         new ClusterDumper(new Path(output, "clusters-" + maxIterations),
>>>                           new Path(output, "convertedClusteredPoints"));
>>>     clusterDumper.printClusters(null);
>>>   }
>>>
>>> On Tue, Mar 1, 2011 at 10:12 AM, Lance Norskog <goksron@gmail.com> wrote:
>>>
>>> > Does anybody use the Mahalanobis distance measure class? If so, what for?
>>> > And how do you prepare the input matrices?
>>> >
>>> > Lance
>>> >
>>>
>>
>>
>
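
For reference, the standard Mahalanobis distance between points x and y under a covariance matrix S is sqrt((x - y)^T S^-1 (x - y)); the distance of a point from the mean vector is the same expression with y set to the mean. Below is a minimal, dependency-free Java sketch of that textbook definition. It is not Mahout's MahalanobisDistanceMeasure, and the class name MahalanobisSketch is hypothetical. With the identity covariance used in Vasil's example, it reduces to plain Euclidean distance.

```java
/**
 * Hypothetical helper (not part of Mahout): textbook Mahalanobis distance
 * sqrt((a - b)^T * S^-1 * (a - b)), given a precomputed inverse covariance S^-1.
 */
final class MahalanobisSketch {

  private MahalanobisSketch() {}

  static double distance(double[] a, double[] b, double[][] inverseCovariance) {
    int n = a.length;
    if (b.length != n || inverseCovariance.length != n) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    // diff = a - b
    double[] diff = new double[n];
    for (int i = 0; i < n; i++) {
      diff[i] = a[i] - b[i];
    }
    // Quadratic form diff^T * S^-1 * diff
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
      if (inverseCovariance[i].length != n) {
        throw new IllegalArgumentException("inverse covariance must be n x n");
      }
      double rowDot = 0.0;
      for (int j = 0; j < n; j++) {
        rowDot += inverseCovariance[i][j] * diff[j];
      }
      sum += diff[i] * rowDot;
    }
    return Math.sqrt(sum);
  }
}
```

For example, with the identity matrix and the mean (0, 22, 25) from Vasil's code, the point (3, 26, 25) is at distance 5.0, exactly its Euclidean distance from the mean.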

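Lance's question of how to prepare the input matrices is not answered in the thread. One common approach, offered here only as an assumption, is to estimate the mean vector and the sample covariance matrix from representative data points. The sketch below (hypothetical class CovarianceSketch, plain Java arrays, no Mahout dependency) does that, and also inverts the covariance with Gauss-Jordan elimination for cases where the inverse itself is needed. The resulting arrays could be wrapped in DenseVector and DenseMatrix and passed to setMeanVector and setCovarianceMatrix as in Vasil's code. Note that the sample covariance is singular, and cannot be inverted, when there are no more samples than dimensions or when features are linearly dependent.

```java
/**
 * Hypothetical helper (not part of Mahout): estimates the mean vector and
 * sample covariance of a data set and inverts the covariance matrix.
 */
final class CovarianceSketch {

  private CovarianceSketch() {}

  /** Column-wise mean; each row of samples is one data point. */
  static double[] mean(double[][] samples) {
    if (samples.length == 0) {
      throw new IllegalArgumentException("no samples");
    }
    int n = samples[0].length;
    double[] mu = new double[n];
    for (double[] s : samples) {
      for (int j = 0; j < n; j++) {
        mu[j] += s[j];
      }
    }
    for (int j = 0; j < n; j++) {
      mu[j] /= samples.length;
    }
    return mu;
  }

  /** Unbiased sample covariance: sum of outer products of deviations / (m - 1). */
  static double[][] covariance(double[][] samples, double[] mu) {
    int m = samples.length;
    int n = mu.length;
    if (m < 2) {
      throw new IllegalArgumentException("need at least two samples");
    }
    double[][] cov = new double[n][n];
    for (double[] s : samples) {
      for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
          cov[i][j] += (s[i] - mu[i]) * (s[j] - mu[j]);
        }
      }
    }
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        cov[i][j] /= (m - 1);
      }
    }
    return cov;
  }

  /** Gauss-Jordan inversion with partial pivoting on the augmented matrix [A | I]. */
  static double[][] invert(double[][] a) {
    int n = a.length;
    double[][] aug = new double[n][2 * n];
    for (int i = 0; i < n; i++) {
      System.arraycopy(a[i], 0, aug[i], 0, n);
      aug[i][n + i] = 1.0;
    }
    for (int col = 0; col < n; col++) {
      // Use the row with the largest absolute entry in this column as the pivot
      int pivot = col;
      for (int r = col + 1; r < n; r++) {
        if (Math.abs(aug[r][col]) > Math.abs(aug[pivot][col])) {
          pivot = r;
        }
      }
      if (Math.abs(aug[pivot][col]) < 1e-12) {
        throw new IllegalArgumentException("matrix is singular or nearly singular");
      }
      double[] tmp = aug[col];
      aug[col] = aug[pivot];
      aug[pivot] = tmp;
      // Scale the pivot row so the pivot entry becomes 1
      double p = aug[col][col];
      for (int k = 0; k < 2 * n; k++) {
        aug[col][k] /= p;
      }
      // Eliminate this column from every other row
      for (int r = 0; r < n; r++) {
        if (r != col) {
          double f = aug[r][col];
          for (int k = 0; k < 2 * n; k++) {
            aug[r][k] -= f * aug[col][k];
          }
        }
      }
    }
    double[][] inv = new double[n][n];
    for (int i = 0; i < n; i++) {
      System.arraycopy(aug[i], n, inv[i], 0, n);
    }
    return inv;
  }
}
```

For the four corners of a square, (0, 0), (2, 0), (0, 2) and (2, 2), the mean is (1, 1), the sample covariance is diag(4/3, 4/3), and its inverse is diag(0.75, 0.75).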