mahout-user mailing list archives

From Isabel Drost <>
Subject Re: Some questions for starters
Date Tue, 07 Oct 2008 12:19:08 GMT
On Monday 06 October 2008, werner mueller wrote:
> of course there is the option: let the user choose the limits. this has
> two drawbacks:
>  - where should the user know the limits from? and
>  - some user have to look at thousands of contracts.
> so i would prefer the system to work on its own (as much as possible).

To me, your problem looks something like this:

You have a number of clients. For each client you store the same kind of
datasets, but the exact values differ from client to client.

Your task is, given previous datasets, to classify new incoming data as
normal or unusual.

You have already identified several features that might help you classify the
incoming data. The only thing you are missing is a good combination of those
features for each client.

I see two ways of solving your problem:

There are algorithms, for instance in the intrusion detection community, that
deal with the problem of discovering unusual data in a stream of normal data.
You might find the algorithms you are looking for there. Maybe someone more
familiar with this area than I am can answer your questions on this list.
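
To make the first option a bit more concrete, here is a minimal sketch of
one very simple way to flag unusual values in a stream without any user-chosen
limits per client. It is not a specific algorithm from the intrusion detection
literature, just a running z-score per client. The class names, the single
numeric feature, the threshold of 3 standard deviations and the minimum
history of 5 values are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class RunningStats:
    """Running mean and variance of one feature (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Sample standard deviation; 0.0 until two values have been seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


class PerClientDetector:
    """Hypothetical helper: keeps separate statistics for every client."""

    def __init__(self, threshold=3.0, min_history=5):
        self.threshold = threshold      # assumed cut-off, in standard deviations
        self.min_history = min_history  # assumed warm-up length per client
        self.stats = defaultdict(RunningStats)

    def check(self, client_id, value):
        """Return True if value is unusual for this client.

        Values judged normal are added to the client's history; unusual
        values are not, so they do not shift the notion of "normal".
        """
        s = self.stats[client_id]
        unusual = False
        if s.n >= self.min_history:
            std = s.std()
            if std > 0:
                unusual = abs(value - s.mean) / std > self.threshold
            else:
                unusual = value != s.mean
        if not unusual:
            s.update(value)
        return unusual
```

Each client gets limits derived from its own history, so a value that is
normal for one client can still be flagged for another.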

The second possibility would be to manually label previous datasets as "ok"
and "strange", train a classifier on them, and apply it to new incoming data.
The only problem here: you need labeled data for each client, and you need to
retrain each time the data changes.
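
As a rough sketch of the second option, the following trains one small
classifier per client from labeled examples and applies it to new data. I
picked a nearest-centroid classifier only because it fits in a few lines; any
other classifier could take its place. The client names, the two numeric
features and all values are made up for illustration.

```python
from collections import defaultdict


def train_centroids(labeled_rows):
    """Compute one centroid per label from (features, label) pairs."""
    sums = {}
    counts = defaultdict(int)
    for features, label in labeled_rows:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, v in enumerate(features):
            sums[label][i] += v
        counts[label] += 1
    return {
        label: [v / counts[label] for v in total]
        for label, total in sums.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))


# Invented training data: the same values are "ok" for one client and
# "strange" for the other, which is why each client gets its own model.
training_data = {
    "client_a": [([100.0, 1.0], "ok"), ([110.0, 2.0], "ok"),
                 ([900.0, 9.0], "strange")],
    "client_b": [([900.0, 9.0], "ok"), ([950.0, 8.0], "ok"),
                 ([100.0, 1.0], "strange")],
}

# Retraining after new labels arrive means rebuilding this dictionary.
models = {client: train_centroids(rows)
          for client, rows in training_data.items()}
```

Calling classify(models["client_a"], [880.0, 9.0]) and
classify(models["client_b"], [880.0, 9.0]) then gives a different answer for
each client, at the cost of needing labeled data for both.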


Vail's Second Axiom: The amount of work to be done increases in proportion to
the amount of work already completed.
  |\      _,,,---,,_       Web:   <>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xmpp://>