Message-ID: <49486FF7.3050704@aconex.com>
Date: Wed, 17 Dec 2008 14:20:23 +1100
From: Paul Cowan
To: java-dev@lucene.apache.org
Subject: Re: Searching in same position across multiple fields

Hi Hoss,

Thanks for the reply. I've created a JIRA issue to track this:
https://issues.apache.org/jira/browse/LUCENE-1494

> the initial thought was that just removing the
> term1.field=term2.field assertion would allow something like this to work,
> but i don't think anyone ever tried creating a patch w/tests to verify
> it.
>
> I think it would be a great idea.

Great. I've implemented this in the first patch attached to the JIRA issue, including a test case. Rather than removing the assertion, I've brought in a specialized (very lightweight) subclass of SpanNearQuery -- I think the Javadoc should make it clear why (supporting multiple fields does screw with the semantics a little).

> couldn't this be solved by an Analyzer that counts the token per fieldname
> and implements getPositionIncrementGap as..
>
>   int result = SOME_BIG_NUM - tokensSeenMap.get(fieldname);
>   tokensSeenMap.put(fieldname, 0);
>   return result;

It could, and we could always fall back to this. I've taken my approach and put that, also, as a patch against LUCENE-1494. If you're not happy with the implementation (it's quite lightweight, and shouldn't break Analyzer implementors) then we can do this in our analyzer, as you suggest above. The question is, though (I can't find any Javadoc etc. on this) -- is there an implicit assumption that, once set up, Analyzers are (or should be) thread-safe?
Your suggestion would be hard to do in a threadsafe fashion without ThreadLocal maps or some such fun. Most Analyzers seem to be 'semi-threadsafe' or better -- i.e. Analyzer itself uses a ThreadLocal for the tokenStreams, KeywordAnalyzer keeps no state, StandardAnalyzer has state but once it's set up it stays static (though there are no publication guarantees around it... hmm), etc. Bringing that level of state into an Analyzer seems risky.

Anyway, please do check out the JIRA issue and let me know what you think. I think both issues are addressed relatively cleanly.

Cheers,

Paul
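P.S. To make the thread-safety concern concrete, here is a rough sketch of the per-field counting idea from the quoted suggestion, using a ThreadLocal map so that concurrent indexing threads do not share counts. It is plain Java and deliberately does not touch any Lucene classes: FieldGapCounter, tokenSeen, nextGap and stride are hypothetical names made up for illustration (stride plays the role of SOME_BIG_NUM), and an analyzer would still have to call them from its token stream and from getPositionIncrementGap. It also assumes each field value has fewer tokens than stride.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Lucene. */
final class FieldGapCounter {
    // Plays the role of SOME_BIG_NUM in the quoted suggestion.
    private final int stride;

    // One map per thread, so concurrent threads never see each other's counts.
    private final ThreadLocal<Map<String, Integer>> tokensSeen =
            ThreadLocal.withInitial(HashMap::new);

    FieldGapCounter(int stride) {
        this.stride = stride;
    }

    /** Record one token emitted for the given field on the current thread. */
    void tokenSeen(String fieldName) {
        tokensSeen.get().merge(fieldName, 1, Integer::sum);
    }

    /**
     * Gap to report before the next value of the field: stride minus the
     * tokens counted so far. The count for that field is reset to zero.
     * Assumes fewer than stride tokens per value; otherwise the result
     * would be zero or negative.
     */
    int nextGap(String fieldName) {
        Map<String, Integer> counts = tokensSeen.get();
        int seen = counts.getOrDefault(fieldName, 0);
        counts.put(fieldName, 0);
        return stride - seen;
    }

    public static void main(String[] args) {
        FieldGapCounter counter = new FieldGapCounter(1000);
        counter.tokenSeen("body");
        counter.tokenSeen("body");
        counter.tokenSeen("body");
        System.out.println(counter.nextGap("body")); // prints 997
        System.out.println(counter.nextGap("body")); // prints 1000 (count was reset)
    }
}
```

Even in this small form the state is per thread and per field, which is the kind of bookkeeping I'd be wary of hiding inside an Analyzer.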