mahout-dev mailing list archives

From Michael Kun Yang <kuny...@stanford.edu>
Subject Re: single-pass algorithm for penalized linear regression with cross validation
Date Sun, 30 Jun 2013 19:38:01 GMT
Thank you for getting back.

I will post the idea on arXiv.


On Sat, Jun 29, 2013 at 10:58 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> If practical, this could be very handy.
>
> For reference, penalized linear regression can be used to solve
> compressive sensing problems.  It can also be used to accurately reverse
> engineer hashed vector representations.
>
>
>
>
> On Sat, Jun 29, 2013 at 10:27 PM, Timothy Mann <mann.timothy@gmail.com> wrote:
>
>> Hi Michael,
>>
>> Your approach sounds useful and (in my opinion) fills an important gap in
>> existing OSS machine learning libraries. I, for one, would be interested in
>> an efficient, parallel implementation of regularized regression. I'm not a
>> contributor to Mahout, but the usual questions when someone wants to
>> contribute an implemented algorithm seem to be:
>>
>> 1. Will you be willing and able (or know of someone who is willing and
>> able) to maintain the code once it is integrated with Mahout? (Mahout
>> developers currently seem to be stretched a bit thin.)
>> 2. What is the state of the code? Is it already integrated with Mahout?
>> What libraries does it depend on? Does it conform (or can it be fit) nicely
>> to Mahout interfaces? How much work will it be (approximately)?
>> 3. How has your implementation been tested? Do you know of a dataset that
>> can be used for unit testing the framework? Is there a particular use case
>> that is driving your implementation and development of this algorithm?
>>
>>
>> -King Tim
>>
>>
>> On Jun 30, 2013 1:15 AM, "Michael Kun Yang" <kunyang@stanford.edu> wrote:
>>
>> > Hello,
>> >
>> > I recently implemented a single pass algorithm for penalized linear
>> > regression with cross validation in a big data start-up. I'd like to
>> > contribute this to Mahout.
>> >
>> > Penalized linear regression methods such as Lasso and Elastic-net are
>> > widely used in machine learning, but there are no very efficient,
>> > scalable implementations on MapReduce.
>> >
>> > The published distributed algorithms for solving this problem are either
>> > iterative (which is not good for MapReduce; see Stephen Boyd's paper) or
>> > approximate (what if we need exact solutions? See Parallelized Stochastic
>> > Gradient Descent). Another disadvantage of these algorithms is that they
>> > cannot do cross validation in the training phase; the user must provide a
>> > penalty parameter in advance.
>> >
>> > My ideas can train the model with cross validation in a single pass. They
>> > are based on some simple observations. I will post them on arXiv and then
>> > share the link in a follow-up email.
>> >
>> > Any feedback would be helpful.
>> >
>> > Thanks
>> > -Michael
>> >
>>
>
>
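
For readers following the thread: the details of the proposed algorithm are not given in these messages, so the following is only a sketch of one well-known way a single data pass can support cross-validated Lasso, and it may differ from what Michael implemented. It assumes the number of features p is small enough to hold p-by-p matrices in memory, that there is no intercept term, and that the input is non-empty. Each row is assigned to a fold, and per-fold sufficient statistics (X'X, X'y, y'y, n) are accumulated. Training on k-1 folds and scoring the held-out fold then needs only those statistics, not the rows. All function names here are hypothetical.

```python
def accumulate(rows, k):
    """Single pass over (x, y) rows; row i is assigned to fold i % k."""
    folds = None
    for i, (x, y) in enumerate(rows):
        p = len(x)
        if folds is None:
            folds = [{"G": [[0.0] * p for _ in range(p)],
                      "c": [0.0] * p, "yy": 0.0, "n": 0} for _ in range(k)]
        f = folds[i % k]
        for a in range(p):
            f["c"][a] += x[a] * y              # X'y
            for b in range(p):
                f["G"][a][b] += x[a] * x[b]    # X'X
        f["yy"] += y * y                       # y'y
        f["n"] += 1
    return folds


def combine(folds, skip):
    """Sum the statistics of every fold except index `skip` (None = use all)."""
    p = len(folds[0]["c"])
    G = [[0.0] * p for _ in range(p)]
    c = [0.0] * p
    n = 0
    for idx, f in enumerate(folds):
        if idx == skip:
            continue
        n += f["n"]
        for a in range(p):
            c[a] += f["c"][a]
            for b in range(p):
                G[a][b] += f["G"][a][b]
    return G, c, n


def soft(z, t):
    """Soft-thresholding operator."""
    return z - t if z > t else z + t if z < -t else 0.0


def lasso_cd(G, c, n, lam, sweeps=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1 using only G, c."""
    p = len(c)
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            if G[j][j] == 0.0:
                continue
            r = c[j] - sum(G[j][m] * b[m] for m in range(p) if m != j)
            b[j] = soft(r, n * lam) / G[j][j]
    return b


def sse(f, b):
    """||y - Xb||^2 on fold f, computed as y'y - 2 b'X'y + b'X'Xb."""
    p = len(b)
    quad = sum(b[a] * f["G"][a][m] * b[m] for a in range(p) for m in range(p))
    return f["yy"] - 2.0 * sum(b[a] * f["c"][a] for a in range(p)) + quad


def cv_lasso(rows, lambdas, k=5):
    """Return (best lambda, coefficients refit on all data, mean CV error per lambda)."""
    folds = accumulate(rows, k)                # the only pass over the data
    n_total = sum(f["n"] for f in folds)
    errors = {}
    for lam in lambdas:
        total = 0.0
        for held in range(k):
            G, c, n = combine(folds, held)
            total += sse(folds[held], lasso_cd(G, c, n, lam))
        errors[lam] = total / n_total
    best = min(errors, key=errors.get)
    G, c, n = combine(folds, None)
    return best, lasso_cd(G, c, n, best), errors
```

In a MapReduce setting, `accumulate` would correspond to the map and combine steps, since the per-fold statistics are plain sums, and the loop over penalty values would run on a single reducer. The cost is O(p^2) memory per fold, which is the main limitation of this approach for very wide data.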
