# mahout-dev mailing list archives

##### Site index · List index
Message view
Top
From peng <pc...@uowmail.edu.au>
Subject Re: Any idea which approaches to non-liniear svm are easily parallelizable?
Date Thu, 23 Oct 2014 05:58:32 GMT
Yes I am, In fact, my question is just about whether approximation is
used to make the total workload of computing the matrix sub-quadratic to
the training set size.

On 10/22/2014 02:21 PM, Andrew Palumbo wrote:
> Peng, I'm not sure if you were referring to what I wrote:
>
>>>>       1.    The Kernel projections
> if so-  I was talking about parallelizing the computation of eg. the RBF Kernals
>
>> Date: Wed, 22 Oct 2014 13:42:45 -0400
>> From: pc175@uowmail.edu.au
>> To: dev@mahout.apache.org
>> CC: dlieu.7@gmail.com
>> Subject: Re: Any idea which approaches to non-liniear svm are easily parallelizable?
>>
>> Is the kernel projection referring to online/incremental incomplete
>> Cholesky decomposition? Sorry I haven't used SVM for a long time and
>> didn't keep up with SotA.
>>
>> If that's true, I haven't find an out-of-the-box implementation, but
>> this should be easy.
>>
>> Yours Peng
>>
>> On 10/22/2014 01:32 PM, Dmitriy Lyubimov wrote:
>>> Andrew, thanks a bunch for the pointers!
>>>
>>>
>>>
>>> On Wed, Oct 22, 2014 at 10:14 AM, Andrew Palumbo <ap.dev@outlook.com> wrote:
>>>
>>>> If you do want to stick with SVM-
>>>>
>>>> This is a question that I keep coming back myself to and unfortunately
>>>> have forgotten more (and lost more literature) than I’ve retained.
>>>> I believe that the most easily parallelizable sections of libSVM for small
>>>> datasets are (for C-SVC(R), RBF and polynomial kernels):
>>>>
>>>>       2.    The Hyper-Paramater grid search for C,  \gamma (I believe this
>>>> is now included in LibSVM- I havent looked at it in a while)
>>>>       3.    For multi-class SVC:  the concurrent computation of each SVM
for
>>>> each one-against-one class vote.
>>>>
>>>> I’m unfamiliar with any easily parallizable method for QP itself.
>>>> Unfortunately for (2), (3) this involves broadcasting the entire dataset
>>>> out to each node of a cluster (or working in a shared memory environment)
>>>> so may not be practical depending on the size of your data set.  I’ve only
>>>> ever implemented (2) for relatively small datasets using MPI and and a with
>>>> pure java socket implementation.
>>>>
>>>> Other approaches (further from simple LibSVM), which are more applicable
>>>> to large datasets (I’m less familiar with these):
>>>>
>>>>       4.    Divide and conquer the QP/SMO problem and solve (As I’ve said,
>>>> I’m unfamiliar with this and  I don’t know of any standard)
>>>>
>>>>       5.    Break the training set into subsets and solve.
>>>>
>>>> For (5) there are several approaches, two that I know of are ensemble
>>>> approaches and those that accumulate Support Vectors from each partition
>>>> and heuristically keep/reject them until the model converges.  As well I’ve
>>>> read some just read some research on implementing this in map a MapReduce
>>>> style[2].
>>>>
>>>> I came across this paper [1] last night which you may find interesting as
>>>> well which is an interesting comparison of some SVM parallelization
>>>> strategies, particularly it discusses (1) for a shared memory environment
>>>> and for offloading work to GPUs (using OpenMP  and CUDA). It also cites
>>>> several other nice papers discussing SVM parallelization strategies
>>>> especially for (5).  Also then goes on to discuss more purely linear
>>>> algebra approach to optimizing SVMs (sec. 5)
>>>>
>>>> Also regarding (5) you may be interested in [2] (something I’ve only
>>>> looked over briefly).
>>>>
>>>> [1] http://arxiv.org/pdf/1404.1066v1.pdf
>>>> [2] http://arxiv.org/pdf/1301.0082.pdf
>>>>
>>>>
>>>>> From: ted.dunning@gmail.com
>>>>> Date: Tue, 21 Oct 2014 17:32:22 -0700
>>>>> Subject: Re: Any idea which approaches to non-liniear svm are easily
>>>> parallelizable?
>>>>> To: dev@mahout.apache.org
>>>>>
>>>>> Last I heard, the best methods pre-project and do linear SVM.
>>>>>
>>>>> Beyond that, I would guess that deep learning techniques would subsume
>>>>> non-linear SVM pretty easily.  The best parallel implementation I know
>>>> for
>>>>> that is in H2O.
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Oct 21, 2014 at 4:12 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
>>>> wrote:
>>>>>> in particular, from libSVM --
>>>>>> http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf ?
>>>>>>
>>>>>> thanks.
>>>>>> -d
>>>>>>
>


Mime
View raw message