From: Grant Ingersoll
Subject: Re: Best Practices for getting Strings from a position range
Date: Wed, 15 Aug 2007 11:10:06 -0400
To: java-dev@lucene.apache.org

On Aug 15, 2007, at 10:46 AM, Peter Keegan wrote:

> Grant,
>
> I built an index as described here:
> http://www.nabble.com/SpanQuery-and-database-join-tf4262902.html
>
> Many documents have only 1 or 2 rows, some have dozens.
> Here is a typical query without spans:
>
> +((+contents:quaker +contents:cereal) (+boost50:quaker
> +boost50:cereal))
> +literals:co$us), sort= RoundingScoreDocComparator@8c169d05>,"dateactiveR"!
>
> Here is a typical query with spans:
>
> +spanNear([adliterals:jb$1, adliterals:co$us], 8, false)
> +(+((+contents:quaker +contents:cereal) (+boost50:quaker
> +boost50:cereal))
> +literals:co$us), sort= RoundingScoreDocComparator@8c169d05>,"dateactiveR"!
>
> The addition of the spanNear clause caused the 10X decrease in throughput. I
> could probably change the way rows are indexed and use ordered terms, which
> seems to be a bit faster (only 5X decrease)

Looking at the code, it makes sense that an ordered SpanNearQuery would be faster. I am still trying to dig into the logistics of the unordered SpanNearQuery, as it is the only thing hanging me up on adding payload access to Spans. I need to step through and debug. As your stack trace showed, a lot of work goes into managing the priority queue that is created. I don't yet understand the relation between the SpanCells, the "ordered" List and the PriorityQueue "queue". It seems the SpanCells make a linked list, the "ordered" list is for getting the spans from the subqueries, and the queue seems to rearrange the ordered list.

If anyone wants to chip in with pseudocode explaining what is going on in NearSpansUnordered.java, it would be helpful.
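
As a starting point for that pseudocode, here is a minimal sketch of how an unordered near match can be found with a priority queue. It is not the code in NearSpansUnordered.java, and it may not match that class step for step. The names UnorderedNearSketch, Cell and matches are hypothetical, and the sketch assumes a single document in which every sub-span is one term position, so there is no linked list and no separate "ordered" list. The idea: keep one cursor per clause in a queue sorted by start position, remember the largest end seen so far, test whether the window from the smallest start to the largest end fits within the slop, then advance the cursor with the smallest start.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class UnorderedNearSketch {

    /** Cursor over one clause's sorted positions in a single document (hypothetical stand-in for a sub-span). */
    static final class Cell {
        final int[] positions;
        int index = 0;

        Cell(int[] positions) { this.positions = positions; }

        int start() { return positions[index]; }
        int end() { return positions[index] + 1; } // assumption: a term covers exactly one position
        boolean next() { return ++index < positions.length; }
    }

    /** Returns {start, end} windows in which every clause occurs within the slop, in any order. */
    static List<int[]> matches(int[][] termPositions, int slop) {
        List<int[]> result = new ArrayList<>();
        if (termPositions.length == 0) return result;

        // The queue always hands back the cursor with the smallest start position.
        PriorityQueue<Cell> queue =
                new PriorityQueue<>((a, b) -> Integer.compare(a.start(), b.start()));
        int maxEnd = Integer.MIN_VALUE;
        for (int[] positions : termPositions) {
            if (positions.length == 0) return result; // a clause with no positions can never match
            Cell cell = new Cell(positions);
            queue.add(cell);
            maxEnd = Math.max(maxEnd, cell.end());
        }

        int totalLength = termPositions.length; // sum of sub-span lengths, each being 1 here
        while (true) {
            Cell min = queue.poll();
            // Positions in the window not covered by the clauses themselves must fit in the slop.
            if (maxEnd - min.start() - totalLength <= slop) {
                result.add(new int[] {min.start(), maxEnd});
            }
            if (!min.next()) return result;      // one clause is exhausted: no further matches
            maxEnd = Math.max(maxEnd, min.end()); // cursors only move forward, so maxEnd never shrinks
            queue.add(min);                      // re-insert so the queue re-sorts by the new start
        }
    }
}
```

For example, calling matches(new int[][] {{1, 10}, {3, 20}}, 1) considers the windows starting at positions 1, 3 and 10 in turn. Every step of the loop removes and re-inserts a queue entry, which is consistent with the queue management showing up in the stack trace, while an ordered match can skip the re-sorting.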