mahout-dev mailing list archives

From Jeff Eastman <j...@windwardsolutions.com>
Subject Re: DistributedRowMatrix to 20.2?
Date Mon, 20 Sep 2010 23:24:01 GMT
  Ok, I just wanted to check. I've been thinking of writing an in-memory 
optimization for when the operand of times() will fit in memory. With most 
eigenvector matrices likely in that category, it would reserve the 
CompositeInputFormat for the really big use cases. I will likely rename 
the current times() to transposeTimes() too, to reduce confusion. But 
there are other things I can do, so no stress.
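
For illustration, the in-memory idea above might look roughly like the
following. This is a minimal sketch under assumptions, not Mahout code: it
assumes the current times() computes the transpose-times product (A^T * B),
as the proposed rename suggests, and it uses plain arrays and a hypothetical
class InMemoryTransposeTimes in place of Mahout's Vector and Matrix types.
With B held in memory, each row of A can be processed on its own, so no join
of the two inputs is needed.

```java
import java.util.Map;

// Hypothetical illustration only; the class and method names are not Mahout APIs.
class InMemoryTransposeTimes {

    /**
     * Computes A^T * B by streaming the rows of A while B sits in memory.
     * rowsOfA maps a row index i to row i of A (as a mapper would receive it);
     * b is the small operand, indexed by the same row index i.
     */
    static double[][] transposeTimes(Map<Integer, double[]> rowsOfA,
                                     double[][] b,
                                     int numColsA) {
        int numColsB = b[0].length;
        double[][] result = new double[numColsA][numColsB];
        for (Map.Entry<Integer, double[]> entry : rowsOfA.entrySet()) {
            double[] aRow = entry.getValue();
            double[] bRow = b[entry.getKey()]; // in-memory lookup replaces the join
            for (int j = 0; j < numColsA; j++) {
                double aij = aRow[j];
                if (aij == 0.0) {
                    continue; // skip zero entries, as a sparse row would
                }
                // Accumulate the outer product of row i of A with row i of B.
                for (int k = 0; k < numColsB; k++) {
                    result[j][k] += aij * bRow[k];
                }
            }
        }
        return result;
    }
}
```

In a real job the accumulation across rows would happen in combiners and
reducers rather than in one array; the single loop here only shows why the
second input no longer has to be joined on the map side.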

On 9/20/10 6:40 PM, Shannon Quinn wrote:
>  Hi Jeff,
>
> I am working on it, yes; I've been experimenting with both the Hadoop 
> commit (unofficial release) that upgrades the CompositeInputFormat to 
> 0.20.2, in addition to other ways around it to avoid the problem 
> entirely (such as an adaptation of what I did in my GSoC project); 
> however, in the last week I've had to put that on hold as my PhD work 
> has really picked up :P
>
> I was going to get back to this over the weekend, so feel free to work 
> on it as you'd like and I'll just merge whatever you do into what I 
> have (I've gotten pretty good at it, considering how often 
> EigenVerificationJob gets changed :P).
>
> Shannon
>
> On 9/20/2010 12:01 PM, Jeff Eastman wrote:
>>  Hi Shannon,
>>
>> Are you working on the DRM 20.2 upgrade? I don't want to step on work 
>> you may have in mid-flight but I would like to do some stuff there 
>> myself.
>>
>> Jeff
>>
>> On 9/5/10 2:15 PM, Shannon Quinn wrote:
>>>  This was precisely the issue I ran into toward the end of my GSoC 
>>> project; there's a commit from one or two months ago to the Hadoop 
>>> mapreduce package that has a 0.20.2-compatible CompositeInputFormat, 
>>> but it's not in the official release - in my case, I wrote a small 
>>> workaround that should last until the next release, but it's not 
>>> suitable for the DistributedRowMatrix...so I'm working on something 
>>> else :) Suggestions are certainly welcome in the meantime!
>>>
>>> On 9/5/2010 1:23 PM, Jeff Eastman wrote:
>>>>  CompositeInputFormat needs to be ported or an alternative 
>>>> developed using the 20.2 API. Perhaps a good sub-project?
>>>>
>>>> On 9/4/10 3:41 PM, Jake Mannix wrote:
>>>>> +1 for attempting this, but beware: DistributedRowMatrix uses map-side
>>>>> joins, and I'm not sure those are supported in the 0.20+ API.  In fact, I
>>>>> have specifically run into problems because of this when I tried it in
>>>>> the past.
>>>>>
>>>>> Now, some methods can just, well, get slower by applying two-pass
>>>>> approaches (reduce-side join plus a second pass) to one-pass solvable
>>>>> problems, but a second pass over the data is a pretty bitter pill to
>>>>> swallow.  Finding a way to do a map-side join in 0.20 would be nicer,
>>>>> if possible.
>>>>>
>>>>>    -jake
>>>>>
>>>>> On Sat, Sep 4, 2010 at 8:02 AM, Jeff Eastman <jdog@windwardsolutions.com> wrote:
>>>>>
>>>>>>   +1 A user mandate, a motivated developer, perfect. You have my
>>>>>> support Shannon, let me know if you run into problems.
>>>>>>
>>>>>>
>>>>>> On 9/3/10 12:17 PM, Shannon Quinn wrote:
>>>>>>
>>>>>>> Apologies for missing this; I was actually very interested in doing
>>>>>>> the DRM porting to 20.2, considering how much my GSoC project relies
>>>>>>> on it.
>>>>>>>
>>>>>>> Unless someone has already volunteered...in which case I'd love to
>>>>>>> help :)
>>>>>>>
>>>>>>> Shannon
>>>>>>>
>>>>>>> Apologies for the brevity, this was sent from my iPhone
>>>>>>>
>>>>>>> On Sep 3, 2010, at 15:11, Sebastian Schelter <ssc@apache.org> wrote:
>>>>>>>
>>>>>>>> I'd like to see it ported, so RowSimilarityJob can become a method of
>>>>>>>> DistributedRowMatrix.
>>>>>>>>
>>>>>>>> On 03.09.2010 20:48, Jeff Eastman wrote:
>>>>>>>>
>>>>>>>>> Is anybody working on this? Has anybody else looked at it? It
>>>>>>>>> seems to have a few unported dependencies like some of the
>>>>>>>>> classifiers.
>>>>>>>>>
>>>>
>>>
>>>
>>
>
>

