According to the GSoC calendar, accepted organizations aren't posted
until April 8 (Monday), at which point (assuming Apache is accepted...I
can't imagine it wouldn't be) slots will be doled out internally. This
will probably take at least a day or two, so we'll likely know how many
slots Mahout has by the middle of next week.
Speaking of which: how do the various subprojects negotiate for slots?
Is there a central spreadsheet, or an IRC meeting to attend? Or did I
miss the email detailing this?
On 4/4/13 2:43 PM, Dan Filimon wrote:
> Any news on this front? Did we get approved/assigned a slot/anything?
>
>
> On Fri, Mar 29, 2013 at 7:44 PM, Dan Filimon <dangeorge.filimon@gmail.com> wrote:
>
>> Ok, updated!
>>
>>
>> On Fri, Mar 29, 2013 at 7:36 PM, Andy Twigg <andy.twigg@gmail.com> wrote:
>>
>>> Dan,
>>>
>>> I think what you've written is fine (I wanted to edit to remove the
>>> '?' around random forests but couldn't).
>>>
>>> ok?
>>>
>>>
>>>
>>> On 29 March 2013 11:14, Dan Filimon <dangeorge.filimon@gmail.com> wrote:
>>>> I added Andy's first suggestion and Ted's suggestion as ideas.
>>>>
>>>> Andy, could you flesh out your second suggestion into a project and
>>>> make an issue, please?
>>>>
>>>>
>>>> On Fri, Mar 29, 2013 at 3:53 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>>> It should be possible to view a Lucene index as a matrix. This would
>>>>> require that we standardize on a way to convert documents to rows.
>>>>> There are many choices, the discussion of which should be deferred to
>>>>> the actual work on the project, but there are a few obvious constraints:
>>>>>
>>>>> a) it should be possible to get the same result as dumping the term
>>>>> vectors for each document, each to a line, and converting that result
>>>>> using standard Mahout methods.
>>>>>
>>>>> b) numeric fields ought to work somehow.
>>>>>
>>>>> c) if there are multiple text fields, that ought to work sensibly as
>>>>> well. Two options are dumping multiple matrices or converting the
>>>>> fields into a single row of a single matrix.
>>>>>
>>>>> d) it should be possible to refer back from a row of the matrix to
>>>>> find the correct document. This might be because we remember the
>>>>> Lucene doc number or because a field is named as holding a unique id.
>>>>>
>>>>> e) named vectors and matrices should be used if plausible.
>>>>>
>>>>> On Thu, Mar 28, 2013 at 4:58 PM, Dan Filimon <dangeorge.filimon@gmail.com> wrote:
>>>>>> ...
>>>>>> Ted, could you explain a bit more what you mean by "simplify the
>>>>>> connection to Lucene for clustering and classification"? It's too
>>>>>> vague for an idea proposal.
>>>>>>
>>>
>>>
>>> --
>>> Dr Andy Twigg
>>> Junior Research Fellow, St John's College, Oxford
>>> Room 351, Department of Computer Science
>>> http://www.cs.ox.ac.uk/people/andy.twigg/
>>> andy.twigg@cs.ox.ac.uk | +447799647538
>>>
>>