lucene-java-user mailing list archives

From maurits van wijland <>
Subject Re: Document Clustering
Date Tue, 11 Nov 2003 20:32:25 GMT
Hi All and Marc,

There is the Carrot project:

The Carrot system consists of web services that can easily be fed with a Lucene
result list. You simply have to write a JSP that renders the result list as XML,
and create a custom process and an input component. The input component
for Lucene could look like:

<?xml version="1.0" encoding="UTF-8"?>
<service xmlns      =
"" framework  =
    <component id               = "carrot2.input.lucene"
               type             = "input"
               serviceURL       = "http://localhost/weblucene/c2.jsp"
               infoURL          = "http://localhost/weblucene/"

The c2.jsp file simply has to translate a result list into an XML file such
as:

    <document id="1">
        <summary>sum 1</summary>
        <snippet>snip 1</snippet>
    </document>
    <document id="2">
        <summary>sum 2</summary>
        <snippet>snip 2</snippet>
    </document>
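
As a rough illustration of that translation step, here is a minimal Java sketch
that renders a list of hits as <document> elements like the ones in the sample.
The names ResultListXml, Hit, escape and toXml are hypothetical and not part of
Lucene or Carrot. The sketch takes plain strings rather than Lucene objects, so
pulling the summary and snippet out of your search results is left to you, and
any wrapping root element Carrot may expect is not shown because the sample does
not show one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Lucene or Carrot.
class ResultListXml {

    /** One search hit; the fields mirror the <summary> and <snippet> elements. */
    static final class Hit {
        final String summary;
        final String snippet;

        Hit(String summary, String snippet) {
            this.summary = summary;
            this.snippet = snippet;
        }
    }

    /** Escapes &, < and > so that hit text cannot break the markup. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    /** Renders the hits as <document> elements, numbering ids from 1. */
    static String toXml(List<Hit> hits) {
        StringBuilder xml = new StringBuilder();
        int id = 1;
        for (Hit hit : hits) {
            xml.append("<document id=\"").append(id++).append("\">\n");
            xml.append("  <summary>").append(escape(hit.summary)).append("</summary>\n");
            xml.append("  <snippet>").append(escape(hit.snippet)).append("</snippet>\n");
            xml.append("</document>\n");
        }
        return xml.toString();
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("sum 1", "snip 1"));
        hits.add(new Hit("sum 2", "snip 2 & more"));
        System.out.print(toXml(hits));
    }
}
```

In a JSP you would write the returned string to the response instead of
System.out. Escaping matters here because snippets taken from indexed text often
contain & or < characters.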

Feed this into the Carrot system, and you will get a nicely clustered
result list. The amazing part of this clustering mechanism is that
the cluster labels are incredibly good!

Then there is an open source project called Classifier4J that can
be used for classification, the opposite of clustering. These other
open source projects are a great addition to the Lucene system.

I hope this helps...

Marc, what are you building?? Maybe we can help!

Kind regards,


----- Original Message ----- 
From: "marc" <>
To: "Lucene Users List" <>
Sent: Tuesday, November 11, 2003 5:15 PM
Subject: Document Clustering


does anyone have any sample code/documentation available for doing document
based clustering using lucene?

