mahout-dev mailing list archives

From Andrew Palumbo <ap....@outlook.com>
Subject Re: dependency-reduced jar
Date Sat, 02 May 2015 21:52:03 GMT
Right now com.tdunning.math.stats.TDigest is being used by 
OnlineSummarizer in mahout-math. And OnlineSummarizer is used in the 
ResultAnalyzer port in math-scala. I guess the thing to do would be to 
port OnlineSummarizer to math-scala and use the Streamlib 
com.clearspring.analytics.stream.quantile.TDigest. From a quick look it 
should be trivial.  That way we could leave it out of the assembly.

As is, ResultAnalyzer is currently used only in the front end.

commons.math3 might be slightly more complicated to get rid of.


On 05/01/2015 02:37 PM, Dmitriy Lyubimov wrote:
> I'd rather switch to using stream-lib for t-digest. It is a much more widely
> adopted distribution of it and is already part of the Spark dependencies, so
> in the case of a Spark job it doesn't need to be packaged explicitly in the
> backend classpath.
>
> Depending on backend transitive jars may be a dangerous practice, though,
> as we saw in the case of Guava. Nonetheless, for the sake of standardizing
> things, I'd rather depend on stream-lib than on a single-algorithm jar
> with unclear support commitment.
>
>
> On Fri, May 1, 2015 at 10:01 AM, Andrew Palumbo <ap.dev@outlook.com> wrote:
>
>> ResultAnalyzer is also used in SparkNaiveBayes.test (...).
>>
>>
>> -------- Original message --------
>> From: Andrew Palumbo <ap.dev@outlook.com>
>> Date: 05/01/2015 12:57 PM (GMT-05:00)
>> To: dev@mahout.apache.org
>> Subject: RE: dependency-reduced jar
>>
>> I added T-digest and math3. The CLI Naive Bayes driver needs them,
>> specifically the ResultAnalyzer in TestNBDriver.
>>
>>
>> -------- Original message --------
>> From: Suneel Marthi <suneel.marthi@gmail.com>
>> Date: 05/01/2015 12:14 PM (GMT-05:00)
>> To: mahout <dev@mahout.apache.org>
>> Subject: Re: dependency-reduced jar
>>
>> T-digest is being used in Mahout-MR; I believe it's also packaged as
>> part of Spark -> AddThis jar.
>>
>> On Fri, May 1, 2015 at 12:11 PM, Pat Ferrel <pat@occamsmachete.com> wrote:
>>
>>> There is an assembly xml in
>>> mahout/spark/src/main/assembly/dependency-reduced.xml. It contains
>>> dependencies that are external to mahout but required for either the
>>> client or backend executor distributed code.
>>>
>>> Guava has recently been removed but scopt is still used by the client.
>>> For some reason the following artifacts were added to the assembly and
>>> I’m not sure why. This is only used with Spark.
>>>
>>>

