commons-dev mailing list archives

From Mark Diggory <mdigg...@apache.org>
Subject Re: [math] JSR 247: Data Mining 2.0
Date Tue, 03 Jan 2006 03:20:20 GMT
Phil Steitz wrote:

>On 1/2/06, Mark Diggory <mdiggory@apache.org> wrote:
>
>>Phil,
>>
>>This is a great idea as a specification and standard. We currently have
>>a service in our project which does something similar, but it's mostly
>>implemented in Perl and R.
>>
>
>What project would that be?
>
My primary employment at the moment, at Harvard: the Virtual Data Center
project
[http://www.thedata.org] [http://www.sourceforge.net/projects/thedata]

>>I wonder, though, how much of it would be implemented at the database
>>level vs. in the application. For instance, in doing a transform that
>>returned a subset of a dataset from a db, it would be much more efficient
>>to do it at the db level (in the query) than in the application itself.
>>
>
>The spec being developed is focussed on the analytical / statistical
>side rather than OLAP and also aims to be implementation-independent
>(i.e., what is really being standardized is the API for vendors to
>implement and client apps to use).  That said, your point is valid -
>it may be difficult to optimize implementation of some functions when
>the db engine can / should do much of the work natively.
>
>>But I also like the idea of a standalone Java-based implementation
>>(maybe on HSQLDB), or perhaps there's a direction that could be taken
>>with Hibernate as well.
>>
>
>As noted above, the functional areas being considered are more
>analytical - regression, clustering, classification, feature
>extraction, etc.  The overlap with [math] is in the statistical stuff.
>
>Phil
>
Very true. We can explore implementations of the algorithms; I'm sure
they would be useful to the stat library. I point out HSQLDB because it
can call Java functions directly and use them in stored procedures, etc.
See:

http://hsqldb.org/doc/guide/ch09.html#stored-section
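
To make the idea concrete, here is a minimal sketch of the kind of public static method that could be exposed to SQL this way. The class name StatFunctions and the method zScore are hypothetical and are not part of Commons Math or HSQLDB. The SQL in the comments assumes the HSQLDB 1.8 CREATE ALIAS syntax, and the table and column names there are made up for illustration.

```java
// Hypothetical helper class: the name StatFunctions and its method are
// illustrative only and do not come from Commons Math or HSQLDB.
public final class StatFunctions {

    private StatFunctions() {
        // static utility class, not meant to be instantiated
    }

    /**
     * Standardizes a value: (x - mean) / stdDev.
     * Being public and static is what lets a database engine call it by name.
     */
    public static double zScore(double x, double mean, double stdDev) {
        if (stdDev <= 0.0) {
            throw new IllegalArgumentException("stdDev must be positive");
        }
        return (x - mean) / stdDev;
    }

    // Assumed registration and use from SQL (HSQLDB 1.8 style, with this
    // class on the database classpath; table and column are hypothetical):
    //
    //   CREATE ALIAS ZSCORE FOR "StatFunctions.zScore";
    //   SELECT ZSCORE(score, 10.0, 2.0) FROM results;

    public static void main(String[] args) {
        // Plain Java call, independent of any database.
        System.out.println(zScore(12.0, 10.0, 2.0)); // prints 1.0
    }
}
```

A function like this is stateless and works one row at a time, which is why it maps cleanly onto a static method. An aggregate has to carry state across rows, so it would need a different hook.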

I could see Commons Math libraries being very effective in this setting
if done right. In HSQLDB, though, I'm still learning whether the same
can be done with updating aggregate functions, the way one can with
static methods.

-Mark
