mahout-dev mailing list archives

From Shannon Quinn <squ...@gatech.edu>
Subject Re: GSOC proposals and mentors [was Call to action – Mahout needs your help]
Date Thu, 04 Apr 2013 18:49:02 GMT
According to the GSoC calendar, accepted organizations aren't posted
until April 8 (Monday), at which point (assuming Apache is accepted; I
can't imagine it wouldn't be) slots will be doled out internally. This
will probably take at least a day or two, so by the middle of next
week we should know how many slots Mahout has.

Speaking of which: how do the various subprojects negotiate for slots? 
Is there a central spreadsheet, or an IRC meeting to attend? Or did I 
miss the email detailing this?

On 4/4/13 2:43 PM, Dan Filimon wrote:
> Any news on this front? Did we get approved/assigned a slot/anything?
>
>
> On Fri, Mar 29, 2013 at 7:44 PM, Dan Filimon <dangeorge.filimon@gmail.com> wrote:
>
>> Ok, updated!
>>
>>
>> On Fri, Mar 29, 2013 at 7:36 PM, Andy Twigg <andy.twigg@gmail.com> wrote:
>>
>>> Dan,
>>>
>>> I think what you've written is fine (I wanted to edit to remove the
>>> '?' around random forests but couldn't).
>>>
>>> ok?
>>>
>>>
>>>
>>> On 29 March 2013 11:14, Dan Filimon <dangeorge.filimon@gmail.com> wrote:
>>>> I added Andy's first suggestion and Ted's suggestion as ideas.
>>>>
>>>> Andy, could you flesh out your second suggestion into a project and
>>>> make an issue please?
>>>>
>>>>
>>>> On Fri, Mar 29, 2013 at 3:53 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>>> It should be possible to view a Lucene index as a matrix.  This would
>>>>> require that we standardize on a way to convert documents to rows.  There
>>>>> are many choices, the discussion of which should be deferred to the actual
>>>>> work on the project, but there are a few obvious constraints:
>>>>>
>>>>> a) it should be possible to get the same result as dumping the term
>>>>> vectors for each document, one to a line, and converting that result
>>>>> using standard Mahout methods.
>>>>>
>>>>> b) numeric fields ought to work somehow.
>>>>>
>>>>> c) if there are multiple text fields, that ought to work sensibly as
>>>>> well.  Two options include dumping multiple matrices or converting the
>>>>> fields into a single row of a single matrix.
>>>>>
>>>>> d) it should be possible to refer back from a row of the matrix to find
>>>>> the correct document.  This might be because we remember the Lucene doc
>>>>> number or because a field is named as holding a unique id.
>>>>>
>>>>> e) named vectors and matrices should be used if plausible.
>>>>>
>>>>> On Thu, Mar 28, 2013 at 4:58 PM, Dan Filimon <dangeorge.filimon@gmail.com> wrote:
>>>>>> ...
>>>>>> Ted, could you explain a bit more what you mean by "simplify the
>>>>>> connection to Lucene for clustering and classification"? It's too
>>>>>> vague for an idea proposal.
>>>>>>
>>>
>>>
>>> --
>>> Dr Andy Twigg
>>> Junior Research Fellow, St Johns College, Oxford
>>> Room 351, Department of Computer Science
>>> http://www.cs.ox.ac.uk/people/andy.twigg/
>>> andy.twigg@cs.ox.ac.uk | +447799647538
>>>
>>
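
For anyone picking up the Lucene idea, constraints (a) through (e) in Ted's
quoted message can be illustrated with a rough sketch in plain Python. This is
not Mahout or Lucene code and uses neither API: the function name
index_to_matrix, the "field:term" column naming, the "#num" suffix, and the toy
document layout (term-frequency dicts standing in for term vectors) are all
assumptions made for illustration. It takes the "single row of a single matrix"
option from (c) and keeps a row-to-document lookup as in (d).

```python
def index_to_matrix(docs, text_fields, numeric_fields=(), id_field="id"):
    """Turn toy 'index' documents into sparse matrix rows.

    docs: list of dicts. Text field values are {term: frequency} maps
    (a stand-in for per-document term vectors); numeric field values
    are plain numbers. Every document must carry a unique id_field.
    Returns (columns, row_ids, rows), where rows[i] is a sparse row
    {column_index: value} and row_ids[i] is the id of the source document.
    """
    columns = {}   # column name -> column index (named columns, constraint e)
    row_ids = []   # row index -> unique document id (constraint d)
    rows = []

    def col(name):
        # Assign column indices in first-seen order.
        return columns.setdefault(name, len(columns))

    for doc in docs:
        row = {}
        for field in text_fields:
            for term, freq in doc.get(field, {}).items():
                # Prefix with the field name so several text fields can
                # share one row without colliding (constraint c).
                row[col(f"{field}:{term}")] = float(freq)
        for field in numeric_fields:
            if field in doc:
                # Numeric fields get one column each (constraint b).
                row[col(f"{field}#num")] = float(doc[field])
        row_ids.append(doc[id_field])
        rows.append(row)
    return columns, row_ids, rows
```

Calling it with text_fields=["title", "body"] and numeric_fields=["year"] gives
one sparse row per document, and row_ids[i] recovers the document behind row i.
Emitting one matrix per field instead, the other option in (c), would mean
keeping a separate columns dictionary for each field.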

