mahout-user mailing list archives

From Boris Fersing <bo...@fersing.eu>
Subject Re: Updating a classifier model on the fly
Date Tue, 06 Mar 2012 16:32:08 GMT
Thanks Charles, I'll have a look at it.

cheers,
Boris

On Tue, Mar 6, 2012 at 11:25, Charles Earl <charlescearl@me.com> wrote:
> Boris,
> Have you looked at online decision trees and the like?
> http://www.cs.washington.edu/homes/pedrod/papers/kdd01b.pdf
> I think ultimately the concept boils down to Temese's observation of there being some
> measure (in the paper's case, concept drift)
> that triggers re-training of the entire set.
> C
> On Mar 6, 2012, at 11:17 AM, Boris Fersing wrote:
>
>> Hi Temese,
>>
>> thank you very much for this information.
>>
>> Boris
>>
>> On Tue, Mar 6, 2012 at 11:14, Temese Szalai <temeseszalai@gmail.com> wrote:
>>> Hi Boris -
>>>
>>> Unless Mahout has super-powers that I am not aware of, years of experience
>>> in text classification tell me that - yes, you will have to rebuild the
>>> classifier model regularly as new labeled data becomes available.
>>>
>>> If you are building a system that incorporates a user feedback loop as it
>>> sounds like you are (i.e., "yes, this message is spam"), one thing that
>>> might reduce the amount of classifier re-training would be to verify that the
>>> new incoming labeled document is not already in your data set, i.e., not a
>>> dupe. Additionally, you probably want to wait to retrain until you have
>>> some critical mass of newly labeled documents, or until you have a critical
>>> data point to include.
>>>
>>> If someone has the ability to say "no this is not spam", keeping that data
>>> as labeled data to add to your anti-content/negative content set would be
>>> valuable.
>>> Best,
>>> Temese
>>>
>>> On Tue, Mar 6, 2012 at 7:48 AM, Boris Fersing <boris@fersing.eu> wrote:
>>>
>>>> Hi all,
>>>>
>>>> is there a way to update a classifier model on the fly? Or do I need
>>>> to recompute everything each time I add a document to a category in
>>>> the training set?
>>>>
>>>> I would like to build something similar to some spam filters, where
>>>> you can confirm whether a message is spam or not, and thus train the
>>>> classifier.
>>>>
>>>> regards,
>>>> Boris
>>>> --
>>>> 42
>>>>
>>
>>
>>
>> --
>> 42
>



-- 
42
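
The retraining triggers discussed in this thread (skip duplicates, wait for a critical mass of new labels or a critical data point, and rebuild when some measure of drift is exceeded) could be sketched roughly as follows. This is only an illustration and not Mahout code: `RetrainBuffer`, `DriftMonitor`, `batch_size`, `window`, and `max_error_rate` are hypothetical names and thresholds, and the windowed error rate is a simple stand-in for a drift measure, not the algorithm from the linked paper.

```python
import hashlib
from collections import deque


class RetrainBuffer:
    """Collects newly labeled documents and signals when a rebuild is due."""

    def __init__(self, batch_size=100):
        self.batch_size = batch_size  # hypothetical "critical mass" threshold
        self.seen = set()             # fingerprints of every document accepted so far
        self.pending = []             # (text, label) pairs not yet in the model

    @staticmethod
    def fingerprint(text):
        # Normalize case and whitespace so trivial variants count as duplicates.
        normalized = " ".join(text.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def add(self, text, label, critical=False):
        """Store a labeled document; return True if the caller should retrain now."""
        key = self.fingerprint(text)
        if key in self.seen:
            # Duplicate text: ignored. Conflicting labels are out of scope here.
            return False
        self.seen.add(key)
        self.pending.append((text, label))
        return critical or len(self.pending) >= self.batch_size

    def drain(self):
        """Hand the pending batch to the trainer and start a new one."""
        batch, self.pending = self.pending, []
        return batch


class DriftMonitor:
    """Flags a rebuild when the recent error rate exceeds a threshold."""

    def __init__(self, window=200, max_error_rate=0.2):
        self.outcomes = deque(maxlen=window)  # True marks a misclassification
        self.max_error_rate = max_error_rate

    def record(self, predicted, actual):
        """Record one user-confirmed outcome; return True if drift is suspected."""
        self.outcomes.append(predicted != actual)
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and sum(self.outcomes) / len(self.outcomes) > self.max_error_rate
```

In a feedback loop, each user confirmation ("yes, this is spam" or "no, this is not spam") would go through `add()` and `record()`; when either returns True, the caller would rebuild the model from the existing training set plus `drain()`.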
