Thank you for getting back to me.
I will post the idea on arXiv.
On Sat, Jun 29, 2013 at 10:58 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> If practical, this could be very handy.
>
> For reference, penalized linear regression can be used to solve
> compressive sensing problems. It can also be used to accurately reverse
> engineer hashed vector representations.
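
As a toy illustration of the compressive-sensing connection (a minimal Python
sketch, assuming NumPy and scikit-learn are available; the problem sizes and
penalty value are arbitrary), an L1-penalized fit can recover a sparse signal
from far fewer measurements than features:

    # Recover a sparse 200-dimensional signal from 60 random measurements
    # using L1-penalized (Lasso) regression.
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.RandomState(0)
    n_features, n_measurements, n_nonzero = 200, 60, 5

    # Sparse ground-truth signal.
    w_true = np.zeros(n_features)
    support = rng.choice(n_features, n_nonzero, replace=False)
    w_true[support] = rng.randn(n_nonzero)

    # Random sensing matrix and slightly noisy measurements.
    X = rng.randn(n_measurements, n_features)
    y = X @ w_true + 0.01 * rng.randn(n_measurements)

    # The L1 penalty drives most coefficients to exactly zero, so the
    # underdetermined system still yields a sparse estimate.
    model = Lasso(alpha=0.1, fit_intercept=False).fit(X, y)
    print("nonzero coefficients in the estimate:", int(np.sum(model.coef_ != 0)))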
>
>
>
>
> On Sat, Jun 29, 2013 at 10:27 PM, Timothy Mann <mann.timothy@gmail.com> wrote:
>
>> Hi Michael,
>>
>> Your approach sounds useful and (in my opinion) fills an important gap in
>> existing OSS machine learning libraries. I, for one, would be interested in
>> an efficient, parallel implementation of regularized regression. I'm not a
>> contributor to Mahout, but the usual questions when someone wants to
>> contribute an implemented algorithm seem to be:
>>
>> 1. Will you be willing and able (or know of someone who is willing and
>> able) to maintain the code once it is integrated with Mahout? (Mahout
>> developers currently seem to be stretched a bit thin)
>> 2. What is the state of the code? Is it already integrated with Mahout?
>> What libraries does it depend on? Does it conform (or can it be fit)
>> nicely to Mahout interfaces? How much work will it be (approximately)?
>> 3. How has your implementation been tested? Do you know of a dataset that
>> can be used for unit testing the framework? Is there a particular use case
>> that is driving your implementation and development of this algorithm?
>>
>>
>> King Tim
>>
>>
>> On Jun 30, 2013 1:15 AM, "Michael Kun Yang" <kunyang@stanford.edu> wrote:
>>
>> > Hello,
>> >
>> > I recently implemented a single-pass algorithm for penalized linear
>> > regression with cross-validation while working at a big-data startup. I'd
>> > like to contribute this to Mahout.
>> >
>> > Penalized linear regression methods such as the Lasso and Elastic Net are
>> > widely used in machine learning, but there are no very efficient, scalable
>> > implementations on MapReduce.
>> >
>> > The published distributed algorithms for solving this problem are either
>> > iterative (which is not a good fit for MapReduce; see Stephen Boyd's paper)
>> > or approximate (what if we need exact solutions? see parallelized
>> > stochastic gradient descent). Another disadvantage of these algorithms is
>> > that they cannot do cross-validation during the training phase: the user
>> > must provide a penalty parameter in advance.
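
To make the cross-validation point concrete, the usual workflow with existing
solvers looks roughly like the Python sketch below (scikit-learn is used purely
for illustration): the penalty parameter is an input to every fit, so choosing
it by k-fold cross-validation costs one full pass over the training data per
candidate value per fold.

    # Drawback being described: the penalty must be supplied up front, so
    # tuning it by cross-validation means re-fitting once per (lambda, fold).
    import numpy as np
    from sklearn.linear_model import ElasticNet
    from sklearn.model_selection import KFold

    rng = np.random.RandomState(0)
    X = rng.randn(500, 20)
    y = X @ rng.randn(20) + 0.1 * rng.randn(500)

    lambdas = np.logspace(-3, 0, 10)       # candidate penalty parameters
    folds = KFold(n_splits=5)
    errors = np.zeros(len(lambdas))

    for i, lam in enumerate(lambdas):
        for train, test in folds.split(X):
            # Each fit is a separate scan of the training data:
            # 10 lambdas x 5 folds = 50 scans in this toy setup.
            m = ElasticNet(alpha=lam, l1_ratio=0.5).fit(X[train], y[train])
            errors[i] += np.mean((m.predict(X[test]) - y[test]) ** 2)

    print("best penalty by 5-fold CV:", lambdas[np.argmin(errors)])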
>> >
>> > My ideas can train the model with cross-validation in a single pass. They
>> > are based on some simple observations. I will post them on arXiv and then
>> > share the link in a follow-up email.
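
The observations themselves are not spelled out above, so the following is only
a hypothetical sketch of the general single-pass idea, not necessarily the
method being proposed here: for penalized least squares the data enter the
estimator only through the sufficient statistics X'X and X'y, which can be
accumulated blockwise in one pass (say, one block per mapper) and then reused
to solve for an entire grid of penalty values without touching the data again;
keeping per-fold sums would give cross-validation from the same pass. The
closed form below is for an L2 (ridge) penalty; an L1 term would instead be
handled locally, e.g. by coordinate descent on the same statistics.

    # Hypothetical single-pass sketch (illustration only, not necessarily the
    # method described in this thread): sum X'X and X'y over data blocks once,
    # then solve the penalized normal equations for every candidate penalty.
    import numpy as np

    def accumulate(blocks):
        """One pass over the data: sum up X'X (d x d) and X'y (d)."""
        d = blocks[0][0].shape[1]
        xtx, xty = np.zeros((d, d)), np.zeros(d)
        for Xb, yb in blocks:              # each block is read exactly once
            xtx += Xb.T @ Xb
            xty += Xb.T @ yb
        return xtx, xty

    def ridge_path(xtx, xty, lambdas):
        """Solve (X'X + lam*I) w = X'y per penalty; no further data passes."""
        d = xtx.shape[0]
        return {lam: np.linalg.solve(xtx + lam * np.eye(d), xty) for lam in lambdas}

    # Toy usage with two "mapper" blocks of 100 rows each.
    rng = np.random.RandomState(0)
    w = rng.randn(5)
    blocks = []
    for _ in range(2):
        Xb = rng.randn(100, 5)
        blocks.append((Xb, Xb @ w + 0.1 * rng.randn(100)))

    xtx, xty = accumulate(blocks)
    for lam, coef in ridge_path(xtx, xty, [0.1, 1.0, 10.0]).items():
        print(lam, np.round(coef, 2))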
>> >
>> > Any feedback would be helpful.
>> >
>> > Thanks
>> > Michael
>> >
>>
>
>
