From: Paul Elschot
To: java-user@lucene.apache.org
Subject: Re: Using Lucene for searching tokens, not storing them.
Date: Mon, 17 Apr 2006 01:03:30 +0200

On Sunday 16 April 2006 19:18, karl wettin wrote:
>
> On 15 Apr 2006 at 21:32, Paul Elschot wrote:
> >>
> >> implements TermPositions {
> >> public int nextPosition() throws IOException {
> >
> > This enumerates all positions of the Term in the document
> > as returned by the Tokenizer used by the Analyzer
>
> Aha. And I didn't see the TermPositionVector until now.
>
> This leads me to a new question. How is multiple fields with the same
> name treated? Are the positions concated or in a "z-axis"? I see
> SpanQuery-troubles with both.
>
> Concated renders SpanFirst unusable on fields n > 0
> [hello,0] [world,1] [foo,2] [bar,3]
>
> "Z-axis" mess up SpanNear, as "hello bar" is correct.
> [hello,0] [world,1]
> [foo,0] [bar,1]
>
> Hmm.. (with double semantics, as this would mean I can't use the term
> positions to train my hidden markov models).

Sorry, no new dimension. The token position simply keeps increasing
across each new field with the same name, so the positions are
concatenated. But multiple stored fields with the same field name can
be retrieved iirc.

It is possible to index a larger position gap between two such fields
to avoid query distance matching over the gaps.
Extra dimensions can be had by indexing term tags (as terms)
at the same positions as their corresponding terms.
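To make the position numbering concrete, here is a rough sketch in plain
Java. It is a toy model and does not call Lucene at all: the class
PositionGapSketch, the method assignPositions and the "_value<n>" tag
terms are all made up for illustration. It assumes whitespace
tokenization and a position increment of 1 per token. In Lucene itself
the gap between field instances is, as far as I know, supplied by
Analyzer.getPositionIncrementGap(String fieldName).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of position assignment; this is NOT the Lucene API.
final class PositionGapSketch {

    /** A term together with the position it would occupy in the field. */
    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return "[" + term + "," + position + "]";
        }
    }

    /**
     * Assigns positions to the whitespace-separated terms of several values
     * that share one field name. Positions keep increasing from one value to
     * the next; "gap" extra positions are skipped between values. When
     * addValueTags is true, a hypothetical tag term "_value<n>" is emitted at
     * the same position as each term, giving the extra "dimension".
     */
    static List<Token> assignPositions(List<String> values, int gap, boolean addValueTags) {
        List<Token> tokens = new ArrayList<>();
        int next = 0; // position the next token will receive
        for (int i = 0; i < values.size(); i++) {
            if (next > 0) {
                next += gap; // skip positions before a repeated field value
            }
            for (String term : values.get(i).trim().split("\\s+")) {
                if (term.isEmpty()) {
                    continue; // an empty value contributes no tokens
                }
                tokens.add(new Token(term, next));
                if (addValueTags) {
                    // same position as the term, i.e. position increment 0
                    tokens.add(new Token("_value" + i, next));
                }
                next++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("hello world", "foo bar");
        // prints [[hello,0], [world,1], [foo,2], [bar,3]]
        System.out.println(assignPositions(values, 0, false));
        // prints [[hello,0], [world,1], [foo,102], [bar,103]]
        System.out.println(assignPositions(values, 100, false));
        // each term is followed by its tag at the same position
        System.out.println(assignPositions(values, 100, true));
    }
}
```

With a gap of 0, "world" and "foo" are adjacent, so a near query could
match across the two values. With a gap of 100 they are 101 positions
apart, which any reasonable slop will not bridge. The tag terms allow a
query to check which value a term came from by requiring the tag at the
same position.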
Regards,
Paul Elschot