spark-user mailing list archives

From Yanbo Liang
Subject Re: SparkML algos limitations question.
Date Sun, 27 Dec 2015 10:23:23 GMT
Hi Eugene,

AFAIK, the current implementation of MultilayerPerceptronClassifier has
some scalability problems if the model is very large (for example, >10M),
although I think the current limit already covers many use cases.
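
To get a feel for when a network crosses that size, here is a rough sketch that counts the weights of a fully connected network from its layer sizes. It assumes that ">10M" refers to the number of weights, which the message does not state explicitly, and that each layer has one bias per output unit. The function name `count_weights` and the layer sizes are made up for illustration and are not part of the Spark API.

```python
def count_weights(layers):
    """Count weights in a fully connected network.

    `layers` lists the number of units per layer, input first.
    Each pair of adjacent layers contributes an (n_in x n_out)
    weight matrix plus n_out bias terms (assumed, see lead-in).
    """
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layers, layers[1:]))


# Hypothetical layer sizes, chosen only to show the scale involved.
small = count_weights([4, 5, 3])
large = count_weights([1000, 5000, 2000, 10])

print(small)               # 43
print(large)               # 15027010
print(large > 10_000_000)  # True
```

Under these assumptions, a network with a couple of hidden layers of a few thousand units each is already past the 10M mark.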


2015-12-16 6:00 GMT+08:00 Joseph Bradley:

> Hi Eugene,
> The maxDepth parameter exists because the implementation uses Integer node
> IDs which correspond to positions in the binary tree.  This simplified the
> implementation.  I'd like to eventually modify it to avoid depending on
> tree node IDs, but that is not yet on the roadmap.
> There is not an analogous limit for the GLMs you listed, but I'm not very
> familiar with the perceptron implementation.
> Joseph
> On Mon, Dec 14, 2015 at 10:52 AM, Eugene Morozov wrote:
>> Hello!
>> I'm currently working on a POC and trying to use Random Forest
>> (classification and regression). I also have to check SVM and Multiclass
>> perceptron (other algos are less important at the moment). So far I've
>> discovered that Random Forest has a maxDepth limit for trees and, just out
>> of curiosity, I wonder why such a limitation was introduced.
>> My actual question: I'm going to use Spark ML in production next year
>> and would like to know if there are other limitations like maxDepth in
>> RF for other algorithms: Logistic Regression, Perceptron, SVM, etc.
>> Thanks in advance for your time.
>> --
>> Be well!
>> Jean Morozov
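
Joseph's point about Integer node IDs can be illustrated with a small sketch. It assumes a common numbering scheme in which the root has ID 1 and the children of node i have IDs 2i and 2i + 1; the thread only says that IDs "correspond to positions in the binary tree", so the exact scheme and the helper names below are assumptions, not taken from the Spark source. With a signed 32-bit integer, the IDs run out after a fixed depth.

```python
INT_MAX = 2**31 - 1  # largest value of a signed 32-bit integer


def left_child(node_id: int) -> int:
    # Assumed scheme: left child of node i is 2i.
    return 2 * node_id


def right_child(node_id: int) -> int:
    # Assumed scheme: right child of node i is 2i + 1.
    return 2 * node_id + 1


def max_node_id(depth: int) -> int:
    """Largest node ID at the given depth (root is depth 0, ID 1)."""
    node_id = 1
    for _ in range(depth):
        node_id = right_child(node_id)
    return node_id


def deepest_representable_depth() -> int:
    """Deepest level whose node IDs all still fit in a 32-bit signed int."""
    depth = 0
    while max_node_id(depth + 1) <= INT_MAX:
        depth += 1
    return depth


print(max_node_id(2))                 # 7
print(deepest_representable_depth())  # 30
```

Under this numbering, the rightmost node at depth 30 has ID 2^31 - 1, exactly the largest 32-bit signed integer, so one level deeper would overflow. Switching to 64-bit IDs would raise the ceiling but not remove it, which fits Joseph's remark that the real fix is to stop depending on node IDs.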
