From: "William W"
To: lucene-user@jakarta.apache.org
Subject: Re: Clustering lucene's results
Date: Thu, 07 Oct 2004 14:02:56 +0000

Thanks Dawid !!!!!
:)

>From: Dawid Weiss
>Reply-To: "Lucene Users List"
>To: Lucene Users List
>Subject: Re: Clustering lucene's results
>Date: Thu, 07 Oct 2004 10:39:26 +0200
>
>
>Hi William,
>
>Ok, here is some demo code I've put together that shows how you can achieve
>clustering of Lucene's results. I hope this will get you started on your
>projects. If you have questions, please don't hesitate to ask -- cross
>posts to carrot2-developers would be a good idea too.
>
>The code (plus the binaries so that you don't have to check out all of
>Carrot2 ;) is at:
>http://www.cs.put.poznan.pl/dweiss/tmp/carrot2-lucene.zip
>
>Take a look at Demo.java -- it is the main link between Lucene and Carrot2.
>Play with the parameters; I used 100 as the number of search results to be
>clustered. Adjust it to your needs.
>
>   int start = 0;
>   int requiredHits = 100;
>
>I hope the code will be self-explanatory.
>
>Good luck,
>Dawid
>
>From the readme file:
>
>An example of using Carrot2 components to cluster search
>results from Lucene.
>===========================================================
>
>
>Prerequisites
>-------------
>
>You must have an index created with Lucene and containing
>documents with the following fields: url, title, summary.
>
>The Lucene demo works with exactly these fields -- I just indexed
>all of Lucene's source code and documentation using the following lines:
>
>mkdir index
>java -Djava.ext.dirs=build org.apache.lucene.demo.IndexHTML -create -index index .
>
>The index is now in the 'index' folder.
>
>Remember that the quality of snippets and titles heavily influences the
>output of the clustering; in fact, the above example index of Lucene's API
>is not too good because most queries will return nonsensical cluster labels
>(see below).
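The prerequisites above (every document carries url, title and summary) and the start/requiredHits parameters can be pictured with a small plain-Java sketch. This is not the code in Demo.java and it calls neither Lucene nor Carrot2: the class HitWindow, its methods, and the use of a Map for a stored document are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Lucene, Carrot2, or Demo.java. */
final class HitWindow {
    /** The three fields the readme says every indexed document must carry. */
    static final List<String> REQUIRED_FIELDS = List.of("url", "title", "summary");

    /** Exclusive end of the window: at most requiredHits hits, starting at start. */
    static int windowEnd(int start, int requiredHits, int totalHits) {
        return Math.min(totalHits, start + requiredHits);
    }

    /** Returns the documents inside the window that carry all required fields. */
    static List<Map<String, String>> select(List<Map<String, String>> hits,
                                            int start, int requiredHits) {
        List<Map<String, String>> selected = new ArrayList<>();
        int end = windowEnd(start, requiredHits, hits.size());
        for (int i = start; i < end; i++) {
            Map<String, String> doc = hits.get(i);
            // A document without url, title, or summary gives the clusterer nothing to label.
            if (doc.keySet().containsAll(REQUIRED_FIELDS)) {
                selected.add(doc);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Sample data only; the second document lacks a summary and is skipped.
        List<Map<String, String>> hits = List.of(
            Map.of("url", "docs/index.html", "title", "Overview", "summary", "Jakarta Lucene overview"),
            Map.of("url", "docs/whoweare.html", "title", "Who We Are"));
        System.out.println(select(hits, 0, 100).size()); // prints 1
    }
}
```

With start = 0 and requiredHits = 100, as in the message, the window simply covers the first 100 hits, or all of them when the query matches fewer.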
>
>Building Carrot2-Lucene demo
>----------------------------
>
>Basically you should have all of the Carrot2 source code checked out and
>issue the build command:
>
>ant -Dcopy.dependencies=true
>
>All of the required libraries and Carrot2 components will end up
>in the 'tmp/dist/deps-carrot2-lucene-example-jar' folder.
>
>You can also spare yourself some time and download the precompiled binaries
>I've put at:
>
>http://www.cs.put.poznan.pl/dweiss/tmp/carrot2-lucene.zip
>
>Now, once you have the compiled binaries, issue the following command
>(all on one line of course):
>
>java -Djava.ext.dirs=tmp\dist;tmp\dist\deps-carrot2-lucene-example-jar \
>   com.dawidweiss.carrot.lucene.Demo index query
>
>The first argument is the location of the Lucene index created earlier.
>The second argument is a query. In the output you should have clusters and
>max. three documents from every cluster:
>
>Results for: query
>Timings: index opened in: 0,181s, search: 0,13s, clustering: 0,721s
> :> Search Lucene Rc1 Dev API
>   - F:/Repositories/cvs.apache.org/jakarta-lucene/build/docs/api/org/apache/lucene/search/class-use/Query.html
>       Uses of Class org.apache.lucene.search.Query (Lucene 1.5-rc1-dev API)
>   - F:/Repositories/cvs.apache.org/jakarta-lucene/build/docs/api/org/apache/lucene/search/package-summary.html
>       org.apache.lucene.search (Lucene 1.5-rc1-dev API)
>   - F:/Repositories/cvs.apache.org/jakarta-lucene/build/docs/api/org/apache/lucene/search/package-use.html
>       Uses of Package org.apache.lucene.search (Lucene 1.5-rc1-dev API)
>      (and 19 more)
>
> :> Jakarta Lucene
>   - F:/Repositories/cvs.apache.org/jakarta-lucene/src/java/overview.html
>       Jakarta Lucene API
>   - F:/Repositories/cvs.apache.org/jakarta-lucene/docs/whoweare.html
>       Jakarta Lucene - Who We Are - Jakarta Lucene
>   - F:/Repositories/cvs.apache.org/jakarta-lucene/docs/index.html
>       Jakarta Lucene - Overview - Jakarta Lucene
>      (and 12 more)
>
>If you look at the source code of Demo.java, there are plenty of things apt for
>customization -- number of results from each cluster, number of displayed
>clusters (I would cut it to some reasonable number, say 10 or 15 -- the
>further a cluster is from the "top", the less likely it is to be important).
>Also keep in mind that some Carrot2 components produce hierarchical
>clusters. This demonstration works with the "flat" version of the Lingo
>algorithm, so you don't need to worry about it.
>
>Hope this gets you started with using Carrot2 and Lucene.
>Please let me know about any successes or failures.
>
>Dawid
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
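
The display customization Dawid mentions (cap the number of clusters shown, show at most three documents per cluster, then an "(and N more)" line) might look roughly like the sketch below. It assumes flat clusters only. The ClusterDisplay and Cluster types are hypothetical stand-ins, not Carrot2 classes, and the output layout only approximates the sample output quoted in the message.

```java
import java.util.List;

/** Hypothetical display helper; these types are illustrative, not Carrot2 classes. */
final class ClusterDisplay {
    /** A flat cluster: a label plus the titles (or URLs) of its documents. */
    static final class Cluster {
        final String label;
        final List<String> documents;

        Cluster(String label, List<String> documents) {
            this.label = label;
            this.documents = documents;
        }
    }

    /** Renders at most maxClusters clusters and at most maxDocs documents per cluster. */
    static String render(List<Cluster> clusters, int maxClusters, int maxDocs) {
        StringBuilder out = new StringBuilder();
        // Clusters are assumed to arrive ordered from most to least important.
        int shownClusters = Math.min(maxClusters, clusters.size());
        for (Cluster cluster : clusters.subList(0, shownClusters)) {
            out.append(":> ").append(cluster.label).append('\n');
            int shownDocs = Math.min(maxDocs, cluster.documents.size());
            for (String doc : cluster.documents.subList(0, shownDocs)) {
                out.append("  - ").append(doc).append('\n');
            }
            int hidden = cluster.documents.size() - shownDocs;
            if (hidden > 0) {
                out.append("  (and ").append(hidden).append(" more)\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample data only; the fourth title is a placeholder.
        List<Cluster> clusters = List.of(
            new Cluster("Jakarta Lucene", List.of(
                "Jakarta Lucene API",
                "Jakarta Lucene - Who We Are",
                "Jakarta Lucene - Overview",
                "(placeholder fourth title)")),
            new Cluster("Search", List.of("Uses of Class org.apache.lucene.search.Query")));
        System.out.print(render(clusters, 10, 3));
    }
}
```

Passing 10 or 15 as maxClusters follows the suggestion in the message; hierarchical clusters would need a recursive variant, which this sketch does not attempt.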