From: Renaud Delbru
Date: Wed, 10 Feb 2010 12:45:03 +0000
To: java-user@lucene.apache.org
Subject: Re: Flex & Docs/AndPositionsEnum
Message-ID: <4B72AA4F.70104@deri.org>
In-Reply-To: <9ac0c6aa1002091247j36033b9bt65c787d8703d371e@mail.gmail.com>

Hi Michael,

On 09/02/10 20:47, Michael McCandless wrote:
> But, then, it's very convenient when you need it and don't care about
> performance. EG in Renaud's usage, a test case that is trying to
> assert that all indexed docs look right, why should you be forced to
> operate per segment? He shouldn't have to bother with the details of
> which field/term/doc was indexed into which segment.
>
> Or, I guess we could argue that this test really should create a
> TermQuery and walk the matching docs... instead of using the low level
> flex enum APIs. Because searching impl already knows how to step
> through the segments.
>
In fact, I care about performance, but I was using the
IndexReader.termPositionsEnum to mimic the implementation of the
different query scorers (e.g., TermScorer). I have already reimplemented
many of the original Lucene Scorers to use my particular index structure.
From what I have seen, the main low-level scorers (e.g., TermScorer,
PhraseScorer) use the DocsEnum interface, not a segment-level enum. As
far as I understand, these scorers are not aware of whether they are
using a segment-level enum or a Multi*Enum. So, is there a loss of
performance in this case? Or am I missing something?

I'll try to clarify my usage of the Flex API; maybe it will highlight
certain aspects for you. In an ideal world, what I would like to do is
the following:
1) write my own codec,
2) register my codec in the IndexWriter, and tell it to use this codec
   for one or more fields (similar to the PerFieldCodecWrapper),
3) write query operators that are compatible with my codec,
4) at search time, use these query operators with the fields that use my
   codec. If, by mistake, I use query operators that are not compatible
   with a field (and its related codec), an exception is thrown telling
   me that I cannot use these query operators with this field.

So, in my current use case, I don't think it is necessary to be aware of
the fact that I am manipulating multiple segments or only one segment. I
think this should be hidden.

But what you were suggesting is to create my own "MultiReader" that is
optimised for my codec. Is that right? A MultiReader that just iterates
over the subreaders, checks whether they are using my codec (and
therefore the associated fields), and uses them to iterate over my own
postings?
-- 
Renaud Delbru
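
The four-step workflow described in the message above (per-field codec registration, plus query operators that refuse to run against an incompatible field) can be illustrated with a minimal sketch. Everything here is hypothetical: FieldCodecRegistry, CodecAwareOperator, and the codec and field names are stand-ins invented for illustration and are not part of the Lucene flex API, which was still changing at the time of this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-field codec mapping (NOT a Lucene class).
final class FieldCodecRegistry {
    private final Map<String, String> codecByField = new HashMap<>();
    private final String defaultCodec;

    FieldCodecRegistry(String defaultCodec) {
        this.defaultCodec = defaultCodec;
    }

    // Step 2: tell the writer side which codec a given field uses.
    void register(String field, String codecName) {
        codecByField.put(field, codecName);
    }

    // Fields without an explicit registration fall back to the default codec.
    String codecFor(String field) {
        return codecByField.getOrDefault(field, defaultCodec);
    }
}

// Hypothetical stand-in for a query operator tied to one codec (step 3).
final class CodecAwareOperator {
    private final String requiredCodec;

    CodecAwareOperator(String requiredCodec) {
        this.requiredCodec = requiredCodec;
    }

    // Step 4: fail fast when the field was written with a different codec.
    void checkCompatible(FieldCodecRegistry registry, String field) {
        String actual = registry.codecFor(field);
        if (!requiredCodec.equals(actual)) {
            throw new IllegalStateException(
                "Operator requires codec '" + requiredCodec + "' but field '"
                + field + "' uses codec '" + actual + "'");
        }
    }
}
```

In this sketch the compatibility check runs once per field before any postings are read, so the operator itself never needs to know how many segments sit behind the reader.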