mahout-dev mailing list archives

From Xiaobo Gu <guxiaobo1...@gmail.com>
Subject Re: What about a universal input data handling mechanism for Mahout?
Date Mon, 25 Jul 2011 15:40:36 GMT
naivebayes and then bayes, thanks.

On Mon, Jul 25, 2011 at 11:37 PM, Sebastian Schelter <ssc@apache.org> wrote:
> Well, there is org.apache.mahout.bayes.* and org.apache.mahout.naivebayes.*.
> Which one do you plan to use? I can help answer questions regarding the
> latter, as I recently refactored it.
>
> --sebastian
>
> On 25.07.2011 17:27, XiaoboGu wrote:
>>
>> Can you show me any material describing the file format requirements of
>> Naïve Bayes, please?
>>
>>
>>> -----Original Message-----
>>> From: Ted Dunning [mailto:ted.dunning@gmail.com]
>>> Sent: Monday, July 25, 2011 11:16 PM
>>> To: user@mahout.apache.org
>>> Cc: dev@mahout.apache.org
>>> Subject: Re: What about a universal input data handling mechanism for
>>> Mahout?
>>>
>>> Good idea.
>>>
>>> Somebody should file a JIRA.  My guess is that the best first step would be
>>> to have the logistic regression handle the naive Bayes input format.
>>>
>>> 2011/7/25 Fernando Fernández<fernando.fernandez.gonzalez@gmail.com>
>>>
>>>> That would be very nice; actually, I haven't tested most of Mahout's
>>>> algorithms for that reason...
>>>>
>>>> 2011/7/25 Xiaobo Gu<guxiaobo1982@gmail.com>
>>>>
>>>>> Hi,
>>>>> Most of the time, Mahout algorithms take a Vector as the model training
>>>>> input but don't specify how the instance vectors are generated, so
>>>>> every algorithm does it in its own way, which ties the required input
>>>>> file format to a specific algorithm. That causes a lot of work for
>>>>> actual users, especially command-line users. For example, if we want
>>>>> to build a Logistic Regression model and a Naïve Bayes model on the
>>>>> same data, we must prepare the data in two formats. Hence this request:
>>>>> could you provide a universal mechanism for handling input data, such
>>>>> as CSV plus a CSV-to-Vector encoder? Then all algorithms could use it,
>>>>> and users would only have to prepare their data as CSV.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Xiaobo Gu
>>>>>
>>>>
>>
>
>
