Date: Tue, 11 Nov 2003 21:32:25 +0100
From: maurits van wijland
To: Lucene Users List, marc@bioseeker.bioinfocg.com
Subject: Re: Document Clustering
Message-id: <013301c3a892$eb31b1e0$0200a8c0@whale>
References: <002601c3a86f$11e21010$0400a8c0@betty>

Hi All and Marc,

There is the Carrot project:
http://www.cs.put.poznan.pl/dweiss/carrot/

The Carrot system consists of web services that can easily be fed by a
Lucene result list. You simply have to create a JSP that creates this XML
file, and create a custom process and input component.

The input component for Lucene could look like:

The c2.jsp file simply has to translate a result list into an XML file
such as:

  ...
  1.0  http://...  sum 1  snip 2
  ...
  1.0  http://...  sum 2  snip 2

Feed this into the Carrot system, and you will get a nicely clustered
result list. The amazing part of this clustering mechanism is that the
cluster labels are incredible; they're great!

Then there is an open source project called Classifier4J that can be used
for classification, the opposite of clustering.

These other open source projects are a great addition to the Lucene
system.

I hope this helps... Marc, what are you building? Maybe we can help!

Kind regards,

Maurits

----- Original Message -----
From: "marc"
To: "Lucene Users List"
Sent: Tuesday, November 11, 2003 5:15 PM
Subject: Document Clustering

Hi,

does anyone have any sample code/documentation available for doing
document based clustering using lucene?

Thanks,
Marc