Message-ID: <41530200.4040403@getopt.org>
Date: Thu, 23 Sep 2004 19:04:00 +0200
From: Andrzej Bialecki
To: Lucene Users List
Subject: Re: Clustering lucene's results
In-Reply-To: <4152D0EF.1050404@cs.put.poznan.pl>

Dawid Weiss wrote:
> Hi William,
>
> No, I don't have examples because I never used Lucene directly.
> If you provide me with a sample index and an API that executes a query on
> this index (I need document titles, summaries, or snippets and an anchor
> (identifier), can be an URL).

Hi Dawid :-)

I believe the approach to this component should be that you first
initialize it by reading a mapping from Lucene index field names to
"logical" names (metadata) such as title, url, body, etc. The reason is
that each index uses its own metadata schema, i.e., in Lucene-speak, its
own field names.

Moreover, when you execute a query you get back just a document id plus
its score. It's up to you to build a snippet. There is code in the
jakarta-lucene-sandbox CVS repository (the highlighter) that creates
snippets from the query and the hit list; take a look at it.

> Send me such a snippet and I'll try to write the integration code with
> Lucene. It is only a matter of writing a simple InputComponent instance
> and this is really trivial (see Nutch's plugin code).

The basic usage scenario is that you open an IndexReader (using either a
directory name as a String or a Directory instance), then create a Query
instance, usually with QueryParser, and finally search using
IndexSearcher. You get back a Hits list, from which you can read the
scores and the contents of the documents. Take a look at the IndexFiles
and SearchFiles classes in the org.apache.lucene.demo package (under
/src/demo).

-- 
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)
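
A minimal sketch of the field-name mapping suggested above, in plain Java with no Lucene dependency. The class name FieldMapping, its methods, and the field names (doc_title, path, contents) are all hypothetical and only illustrate the idea. A Map stands in for a Lucene Document here; against a real index the lookup would presumably go through Document.get(fieldName) instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Maps "logical" names (title, url, body) to the field names that a
 * particular index happens to use. Hypothetical helper, not part of Lucene.
 */
class FieldMapping {
    // logical name -> index field name, kept in registration order
    private final Map<String, String> logicalToField = new LinkedHashMap<>();

    /** Registers that logicalName is stored in the index under fieldName. */
    FieldMapping map(String logicalName, String fieldName) {
        logicalToField.put(logicalName, fieldName);
        return this;
    }

    /** Returns the index field name for a logical name, or null if unmapped. */
    String fieldFor(String logicalName) {
        return logicalToField.get(logicalName);
    }

    /**
     * Translates the stored fields of one hit into logical names.
     * Fields that are unmapped or missing from the hit are skipped.
     */
    Map<String, String> toLogical(Map<String, String> storedFields) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : logicalToField.entrySet()) {
            String value = storedFields.get(e.getValue());
            if (value != null) {
                result.put(e.getKey(), value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed schema of one particular index.
        FieldMapping mapping = new FieldMapping()
                .map("title", "doc_title")
                .map("url", "path")
                .map("body", "contents");

        // Stand-in for the stored fields of a single search hit.
        Map<String, String> hit = new LinkedHashMap<>();
        hit.put("doc_title", "Clustering search results");
        hit.put("path", "http://example.org/doc1");

        System.out.println(mapping.toLogical(hit));
    }
}
```

With a mapping like this, the clustering input component only ever asks for title, url, and body, and each index supplies its own mapping at initialization time.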