From: Grant Ingersoll
To: mahout-user@lucene.apache.org
Subject: Re: Lucene search result clustering
Date: Mon, 20 Apr 2009 08:03:38 -0400
Message-Id: <29FBD1E0-0AE6-4573-88B3-F57B290C9257@apache.org>
In-Reply-To: <120c2efce69.max7501@virgilio.it>

Hi Max,

Great question! Wish I had a better answer, but unfortunately the step from Lucene to Mahout doesn't exist just yet. I have been slowly but surely working on integrating it into Solr (https://issues.apache.org/jira/browse/SOLR-769 will eventually have Mahout integration). I also know we have some people on this list working on it: http://www.lucidimagination.com/search/p:mahout?q=Document+clustering but it isn't where it needs to be just yet.

With some luck, there will be a solution soon. Are you in the position to help?

-Grant

On Apr 20, 2009, at 5:51 AM, Max wrote:

> Hi list,
> I would like to do some Lucene Documents clustering.
> I have a Lucene index and I run my search on the index.
> The search result is composed of a list of documents.
> How can I translate my list of document in a format suitable with
> Mahout format?
> I have seen this library contains some clustering algorithms, but
> they don't provide (at least I haven't found) any translation from
> a document to a point.
> Do I have to implement this by myself, or does it already exist?
> Thanks in advance.

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search