Message-ID: <43B9ED74.5080603@apache.org>
Date: Mon, 02 Jan 2006 22:20:20 -0500
From: Mark Diggory
To: Phil Steitz
CC: Jakarta Commons Developers List
Subject: Re: [math] JSR 247: Data Mining 2.0
References: <8a81b4af0601011257q3e4e2170ub85560164e60c1b3@mail.gmail.com> <43B97298.7050203@apache.org>
 <8a81b4af0601021553r1724f549oa0bb9f5cc7460675@mail.gmail.com>
In-Reply-To: <8a81b4af0601021553r1724f549oa0bb9f5cc7460675@mail.gmail.com>

Phil Steitz wrote:

>On 1/2/06, Mark Diggory wrote:
>
>>Phil,
>>
>>This is a great idea as a specification and standard. We currently have
>>a service in our project which does something similar, but it's mostly
>>implemented in Perl and R.
>
>What project would that be?

My primary employment at the moment, at Harvard: the Virtual Data Center
project [http://www.thedata.org] [http://www.sourceforge.net/projects/thedata]

>>I wonder though, how much of it would be implemented at the database
>>level vs. in the application. For instance, in doing a transform that
>>returned a subset of a dataset from a db, it would be much more efficient
>>to do it at the db level (in the query) than in the application itself.
>
>The spec being developed is focussed on the analytical / statistical
>side rather than OLAP and also aims to be implementation-independent
>(i.e., what is really being standardized is the API for vendors to
>implement and client apps to use). That said, your point is valid -
>it may be difficult to optimize implementation of some functions when
>the db engine can / should do much of the work natively.
>
>>But I like as well the idea of a standalone Java-based implementation
>>too (maybe on HSQLDB), or perhaps there's a direction that could be taken
>>with Hibernate as well.
>
>As noted above, the functional areas being considered are more
>analytical - regression, clustering, classification, feature
>extraction, etc. The overlap with [math] is in the statistical stuff.
>
>Phil

Very true, we can explore implementations of the algorithms; I'm sure
they would be useful to the stat library.

I point out HSQLDB because it has the capability to call Java functions
directly and use them in stored procedures etc. See:

http://hsqldb.org/doc/guide/ch09.html#stored-section

I could see the placement of Commons Math libraries within this
situation being very effective if done right. Though in HSQLDB I'm still
learning whether the same can be done with updating aggregate functions
the way one can with static methods.

-Mark