From: Grant Ingersoll
To: java-dev@lucene.apache.org
Subject: Re: Span queries, API and difficulties
Date: Sat, 22 Sep 2007 08:28:21 -0400
Message-Id: <5D233471-A4CD-4DBB-977B-C6A2FA7721E7@gmail.com>
In-Reply-To: <12835063.post@talk.nabble.com>
References: <12835063.post@talk.nabble.com>

Hi Cedric,

Thanks for the detailed response. My suggestion would be to write up a
set of patches that demonstrate what you want for the SpanQuery stuff,
and the BooleanQuery stuff, preferably as separate patches. The
SpanQuery stuff makes the most sense to me and since I am slowly, but
surely, working on it, I could try to incorporate it.

As for the HitCollector, I am not exactly sure what you are trying to
get at there. What Object is going to be passed in? Is it the Match
object? What would it mean for other implementations that aren't using
a Match object? How would it be incorporated into Lucene for a general
case? Again, a patch here may make it obvious.

-Grant

On Sep 22, 2007, at 5:45 AM, melix wrote:

>
> Hi all,
>
> Sorry for the late response, I've been quite busy (working on my
> Lucene tweak, and still not finished ;)).
> Basically, I need to be able to find out what matched, on a
> per-document basis, for a complex query. For example, in an OR
> clause, I need to know which of the subclauses matched and, going
> deeper in the query tree, find out what matched within each
> subclause. The goal is to score documents using semantic reasoning.
>
> As I want to limit breaking Lucene compatibility, I've decided to
> subclass Lucene classes as much as possible. This is where it starts
> to be difficult. So I've subclassed (most of) the span query classes
> so that the getSpans() method returns my own spans interface:
>
>   public interface IExtendedSpans extends Spans, IMatcher {
>   }
>
>   public interface IMatcher {
>       Match match();
>   }
>
> The reason I have a separate IMatcher interface is that span queries
> are not the only queries which may "return" matches. We'll see this
> later. So I implemented my own SpanNearQuery, which inherits from the
> Lucene SNQ, so that when a span is found, I can return the
> corresponding match. A match is a collection of submatches, and I've
> decided to subclass the Match class for each query type (this makes
> algorithms more readable and easier to write). For a span near
> query, the match() method will basically return a SpanNearMatch, and
> so on.
>
> Problem: the members of the Lucene span queries are private -not
> protected-, so subclasses cannot use them. For example, my subclass
> needs access to the clauses, and I have to use the getter where I
> could otherwise use the member directly (performance implication).
> Next, the spans subclasses are private static classes, and I have to
> rewrite them to return *my* spans. This point is really annoying
> because I have to copy the exact inner classes (if not anonymous...)
> just to add my match() method, and by doing this I'm potentially
> breaking compatibility with future releases of Lucene.
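
For readers following the thread, the match hierarchy described in the
quoted message above can be sketched roughly as follows. This is a
minimal, Lucene-free illustration and not Cedric's code: only IMatcher,
Match, match() and SpanNearMatch are names taken from the message;
TermMatch, the fields, the constructors and MatchDemo are assumptions
made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Named in the message: anything that can report what it matched.
interface IMatcher {
    Match match();
}

// "A match is a collection of submatches" (from the message).
// The list field and accessor are assumptions for this sketch.
class Match {
    private final List<Match> subMatches = new ArrayList<>();

    void add(Match sub) {
        subMatches.add(sub);
    }

    List<Match> subMatches() {
        return Collections.unmodifiableList(subMatches);
    }
}

// Hypothetical leaf match: one term occurrence at one position.
class TermMatch extends Match {
    final String term;
    final int position;

    TermMatch(String term, int position) {
        this.term = term;
        this.position = position;
    }
}

// One Match subclass per query type, as the message describes.
// The start/end fields are assumed, not taken from the message.
class SpanNearMatch extends Match {
    final int start;
    final int end;

    SpanNearMatch(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

class MatchDemo {
    // Walks the match tree and counts the term-level matches in it.
    static int countTerms(Match m) {
        int n = (m instanceof TermMatch) ? 1 : 0;
        for (Match sub : m.subMatches()) {
            n += countTerms(sub);
        }
        return n;
    }

    public static void main(String[] args) {
        SpanNearMatch near = new SpanNearMatch(3, 6);
        near.add(new TermMatch("span", 3));
        near.add(new TermMatch("query", 5));

        // Stand-in for a spans object that implements IMatcher.
        IMatcher matcher = () -> near;
        System.out.println(countTerms(matcher.match())); // prints 2
    }
}
```

A scorer that reasons over matches would walk this tree per document,
much as countTerms() does here, instead of consuming a single float.
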
>
> The problem was even harder when I had to add the match() method to
> BooleanQuery: this class is so complex, and uses so many protected or
> inner classes (for optimization purposes, I assume) that I would have
> to copy a lot of the original source code just to add my method. As
> documentation on how it works is really hard to find, I decided it
> would be simpler to write my own boolean queries (which is what I've
> done now). I know they must be much less performant, but this makes
> the task much easier.
>
> By the way, it would be really great if you could extract an
> interface from the Query class. As all my queries implement an
> interface (to be sure that you don't mix queries which support the
> match feature with ones that don't), it would avoid many casts (the
> other solution would be that I extract the interface myself and make
> my IMatchAwareQuery interface declare those methods, but I'm sure it
> would be cleaner if this was directly in Lucene).
>
> Last but not least, it would be great if the HitCollector class had a
> collect() method with an Object parameter: the scoring I'm using
> cannot work on a collection of floats alone. It requires the matches,
> so I'm passing a DocMatchesHolder instance to my HitCollector so that
> it can work on it.
> This leads to the following (and not really clean) code, copied into
> each of my top-level Scorer implementations:
>
>   public void score(HitCollector aHitCollector) throws IOException {
>       if (aHitCollector instanceof SearchingContext) {
>           SearchingContext ctx = (SearchingContext) aHitCollector;
>           while (next()) {
>               final DocMatchesHolder doc = docMatches();
>               final float score = score();
>               ctx.addHit(doc, score);
>               ctx.collect(doc(), score);
>           }
>       } else {
>           super.score(aHitCollector);
>       }
>   }
>
> Thanks for reading ;)
>
> Cedric
> --
> View this message in context:
> http://www.nabble.com/Span-queries%2C-API-and-difficulties-tf4500460.html#a12835063
> Sent from the Lucene - Java Developer mailing list archive at
> Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>

------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://lucene.grantingersoll.com
http://www.paperoftheweek.com/
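
One way to make the HitCollector question in this thread concrete
(what is passed in, and what happens to collectors that do not use a
Match object) is the sketch below. It is only an illustration under
stated assumptions, not a Lucene API and not a proposed patch:
SimpleCollector is a stand-in for Lucene's HitCollector, which declares
collect(int doc, float score); the three-argument overload and every
class name here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for HitCollector (assumption: not the real Lucene class).
abstract class SimpleCollector {
    abstract void collect(int doc, float score);

    // Hypothetical overload with an Object parameter. Collectors that
    // ignore the payload inherit this default and behave as before.
    void collect(int doc, float score, Object payload) {
        collect(doc, score);
    }
}

// Payload-unaware collector: overrides only the two-argument method.
class CountingCollector extends SimpleCollector {
    int hits;

    @Override
    void collect(int doc, float score) {
        hits++;
    }
}

// Payload-aware collector: keeps whatever object the scorer hands over.
class MatchRecordingCollector extends SimpleCollector {
    final List<Object> payloads = new ArrayList<>();
    int hits;

    @Override
    void collect(int doc, float score) {
        hits++;
    }

    @Override
    void collect(int doc, float score, Object payload) {
        payloads.add(payload);
        collect(doc, score);
    }
}

class CollectorDemo {
    // Hypothetical scorer loop: always passes the payload, so no
    // instanceof check or cast on the collector is needed.
    static void scoreAll(SimpleCollector collector, String[] docMatches) {
        for (int doc = 0; doc < docMatches.length; doc++) {
            collector.collect(doc, 1.0f, docMatches[doc]);
        }
    }

    public static void main(String[] args) {
        String[] matches = {"m0", "m1", "m2"};
        CountingCollector plain = new CountingCollector();
        MatchRecordingCollector aware = new MatchRecordingCollector();
        scoreAll(plain, matches);
        scoreAll(aware, matches);
        System.out.println(plain.hits + " " + aware.payloads.size()); // prints 3 3
    }
}
```

The trade-off in this shape is that the payload is untyped, so a
payload-aware collector still has to cast it; a patch against the real
classes would show whether that is acceptable for the general case.
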