lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From maurits van wijland <m.vanwijl...@quicknet.nl>
Subject Re: Document Clustering
Date Tue, 11 Nov 2003 20:32:25 GMT
Hi All and Marc,

There is the carrot project :
http://www.cs.put.poznan.pl/dweiss/carrot/

The carrot system consists of webservices that can easily be fed by a lucene
resultlist. You simply have to create a JSP that creates this XML file and
create a custom process and input component. The input component
for lucene could look like:

<?xml version="1.0" encoding="UTF-8"?>
<service xmlns      =
"http://www.dawidweiss.com/projects/carrot/componentDescriptor" framework  =
"Carrot2">
    <component id               = "carrot2.input.lucene"
               type             = "input"
               serviceURL       = "http://localhost/weblucene/c2.jsp"
               infoURL          = "http://localhost/weblucene/"
    />
</service>

The c2.jsp file simply has to translate a resultlist into an XLM file such
as:
<searchresult>
    <document id="1">
 <title>...</title>
 <weight>1.0</weight>
 <url>http://...</url>
 <summary>sum 1</summary>
 <snippet>snip 2</snippet>
    </document>
    <document id="2">
 <title>...</title>
 <weight>1.0</weight>
 <url>http://...</url>
 <summary>sum 2</summary>
 <snippet>snip 2</snippet>
    </document>
</searchresult>

Feed this into the carrot system, and you will get a nice clustered
result list. The amazing part is of this clustering mechanism is that
the cluster labels are incredible, their great!

Then there is a open source project called Classifier4J that can
be used for classification, the oposite of clustering. These other
open source projects are a great addition to the Lucene system.

I hope this helps...

Marc, what are you building?? Maybe we can help!

Kind regards,

Maurits


----- Original Message ----- 
From: "marc" <marc@bioseeker.bioinfocg.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Tuesday, November 11, 2003 5:15 PM
Subject: Document Clustering


Hi,

does anyone have any sample code/documentation available for doing document
based clustering using lucene?

Thanks,
Marc



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message