Return-Path: Delivered-To: apmail-lucene-dev-archive@www.apache.org Received: (qmail 26861 invoked from network); 3 Sep 2010 18:34:24 -0000 Received: from unknown (HELO mail.apache.org) (140.211.11.3) by 140.211.11.9 with SMTP; 3 Sep 2010 18:34:24 -0000 Received: (qmail 28761 invoked by uid 500); 3 Sep 2010 18:34:23 -0000 Delivered-To: apmail-lucene-dev-archive@lucene.apache.org Received: (qmail 28660 invoked by uid 500); 3 Sep 2010 18:34:22 -0000 Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@lucene.apache.org Delivered-To: mailing list dev@lucene.apache.org Received: (qmail 28653 invoked by uid 99); 3 Sep 2010 18:34:22 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 03 Sep 2010 18:34:22 +0000 X-ASF-Spam-Status: No, hits=0.0 required=10.0 tests=FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_PASS,T_TO_NO_BRKTS_FREEMAIL X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of yseeley@gmail.com designates 74.125.82.48 as permitted sender) Received: from [74.125.82.48] (HELO mail-ww0-f48.google.com) (74.125.82.48) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 03 Sep 2010 18:34:16 +0000 Received: by wwb39 with SMTP id 39so1364413wwb.5 for ; Fri, 03 Sep 2010 11:33:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:sender:reply-to:received :in-reply-to:references:date:x-google-sender-auth:message-id:subject :from:to:content-type:content-transfer-encoding; bh=n9V+eYelM/237zSCpj7ChcDAX5kMEA6NyHdU3PXV6TM=; b=CVMbnAHbN/upalNSxGTq5SMta0eIrLxJVfla5hgCEyDS/yUAQ2zgDHN8YI9DvvUqYA PKZHrWSIo6Nx4CjyezOnywbYDyp72QhntTBmoW52d/bmKHL5T9ys6jAK+qcK9J0xodT1 ca4e8KOmrlrDdMoBgGm6FhWh1YPJWWT7ZNshw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:reply-to:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:content-type :content-transfer-encoding; b=nnt+SZ40GyBEnEU/4NcnaBcX2xOfvbenXAejpMztaY6evp4UhBZHQnpkI3rL9iemV8 DNAuT77gFGIMBO6L5lVWPw36VX+5S8ovmnZAH2CdfXkchA3Pvk6gaCi6wu0N0adiczjn ihW5Uky3OEGph6JzqsgtiipWfP0mYEtKa2ZU4= MIME-Version: 1.0 Received: by 10.216.93.9 with SMTP id k9mr767706wef.89.1283538834180; Fri, 03 Sep 2010 11:33:54 -0700 (PDT) Sender: yseeley@gmail.com Reply-To: yonik@lucidimagination.com Received: by 10.216.48.11 with HTTP; Fri, 3 Sep 2010 11:33:54 -0700 (PDT) In-Reply-To: References: <20100903171226.8E4C8238897D@eris.apache.org> Date: Fri, 3 Sep 2010 14:33:54 -0400 X-Google-Sender-Auth: -qXR0ktKjklVHGrH6PfJXSN_u44 Message-ID: Subject: Re: svn commit: r992382 - in /lucene/dev/trunk/solr: ./ src/java/org/apache/solr/request/ src/java/org/apache/solr/util/ src/test/org/apache/solr/util/ From: Yonik Seeley To: dev@lucene.apache.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable On Fri, Sep 3, 2010 at 2:24 PM, Robert Muir wrote: > Does SimpleFacetsTest fail for you? > I svn up'ed and i see a few failures Ack - it does! I'll fix. I'm wondering how that happened... I think I may have previously done a ant test -Dtestcase=3DTestSimpleFacets (notice the misspelling) -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8 > On Fri, Sep 3, 2010 at 1:12 PM, wrote: >> >> Author: yonik >> Date: Fri Sep =A03 17:12:26 2010 >> New Revision: 992382 >> >> URL: http://svn.apache.org/viewvc?rev=3D992382&view=3Drev >> Log: >> SOLR-2092: use native long PQ to order facet results >> >> Added: >> >> =A0lucene/dev/trunk/solr/src/java/org/apache/solr/util/LongPriorityQueue= .java >> =A0 (with props) >> Modified: >> =A0 =A0lucene/dev/trunk/solr/CHANGES.txt >> >> =A0lucene/dev/trunk/solr/src/java/org/apache/solr/request/SimpleFacets.j= ava >> >> =A0lucene/dev/trunk/solr/src/java/org/apache/solr/request/UnInvertedFiel= d.java >> =A0 =A0lucene/dev/trunk/solr/src/test/org/apache/solr/util/PrimUtilsTest= .java >> >> Modified: lucene/dev/trunk/solr/CHANGES.txt >> URL: >> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/CHANGES.txt?rev=3D992= 382&r1=3D992381&r2=3D992382&view=3Ddiff >> >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D >> --- lucene/dev/trunk/solr/CHANGES.txt (original) >> +++ lucene/dev/trunk/solr/CHANGES.txt Fri Sep =A03 17:12:26 2010 >> @@ -288,6 +288,12 @@ Optimizations >> =A0* SOLR-2046: Simplify legacy replication scripts by adding common >> functions >> =A0 to scripts-util. (koji) >> >> +* SOLR-2092: Speed up single-valued and multi-valued "fc" faceting. >> Typical >> + =A0improvement is 5%, but can be much greater (up to 10x faster) when >> facet.offset >> + =A0is very large (deep paging). (yonik) >> + >> + >> + >> =A0Bug Fixes >> =A0---------------------- >> >> >> Modified: >> lucene/dev/trunk/solr/src/java/org/apache/solr/request/SimpleFacets.java >> URL: >> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/src/java/org/apache/s= olr/request/SimpleFacets.java?rev=3D992382&r1=3D992381&r2=3D992382&view=3Dd= iff >> >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D >> --- >> lucene/dev/trunk/solr/src/java/org/apache/solr/request/SimpleFacets.java >> (original) >> +++ >> lucene/dev/trunk/solr/src/java/org/apache/solr/request/SimpleFacets.java= Fri >> Sep =A03 17:12:26 2010 >> @@ -44,6 +44,7 @@ import org.apache.solr.util.BoundedTreeS >> =A0import org.apache.solr.util.ByteUtils; >> =A0import org.apache.solr.util.DateMathParser; >> =A0import org.apache.solr.handler.component.ResponseBuilder; >> +import org.apache.solr.util.LongPriorityQueue; >> >> =A0import java.io.IOException; >> =A0import java.util.*; >> @@ -416,9 +417,9 @@ public class SimpleFacets { >> =A0 =A0 } >> >> =A0 =A0 final int nTerms=3DendTermIndex-startTermIndex; >> + =A0 =A0int missingCount =3D -1; >> >> =A0 =A0 CharArr spare =3D new CharArr(); >> - >> =A0 =A0 if (nTerms>0 && docs.size() >=3D mincount) { >> >> =A0 =A0 =A0 // count collection array only needs to be as big as the num= ber of >> terms we are >> @@ -475,6 +476,10 @@ public class SimpleFacets { >> =A0 =A0 =A0 =A0 } >> =A0 =A0 =A0 } >> >> + =A0 =A0 =A0if (startTermIndex =3D=3D 0) { >> + =A0 =A0 =A0 =A0missingCount =3D counts[0]; >> + =A0 =A0 =A0} >> + >> =A0 =A0 =A0 // IDEA: we could also maintain a count of "other"... everyt= hing >> that fell outside >> =A0 =A0 =A0 // of the top 'N' >> >> @@ -484,7 +489,8 @@ public class SimpleFacets { >> =A0 =A0 =A0 if (sort.equals(FacetParams.FACET_SORT_COUNT) || >> sort.equals(FacetParams.FACET_SORT_COUNT_LEGACY)) { >> =A0 =A0 =A0 =A0 int maxsize =3D limit>0 ? offset+limit : Integer.MAX_VAL= UE-1; >> =A0 =A0 =A0 =A0 maxsize =3D Math.min(maxsize, nTerms); >> - =A0 =A0 =A0 =A0final BoundedTreeSet> queue= =3D new >> BoundedTreeSet>(maxsize); >> + =A0 =A0 =A0 =A0LongPriorityQueue queue =3D new >> LongPriorityQueue(Math.min(maxsize,1000), maxsize, Long.MIN_VALUE); >> + >> =A0 =A0 =A0 =A0 int min=3Dmincount-1; =A0// the smallest value in the to= p 'N' values >> =A0 =A0 =A0 =A0 for (int i=3D(startTermIndex=3D=3D0)?1:0; i> =A0 =A0 =A0 =A0 =A0 int c =3D counts[i]; >> @@ -492,18 +498,33 @@ public class SimpleFacets { >> =A0 =A0 =A0 =A0 =A0 =A0 // NOTE: we use c>min rather than c>=3Dmin as an= optimization >> because we are going in >> =A0 =A0 =A0 =A0 =A0 =A0 // index order, so we already know that the keys= are ordered. >> =A0This can be very >> =A0 =A0 =A0 =A0 =A0 =A0 // important if a lot of the counts are repeated= (like zero >> counts would be). >> - =A0 =A0 =A0 =A0 =A0 =A0queue.add(new >> CountPair(si.lookup(startTermIndex+i, new BytesRef()), >> c)); >> - =A0 =A0 =A0 =A0 =A0 =A0if (queue.size()>=3Dmaxsize) min=3Dqueue.last()= .val; >> + >> + =A0 =A0 =A0 =A0 =A0 =A0// smaller term numbers sort higher, so subtrac= t the term >> number instead >> + =A0 =A0 =A0 =A0 =A0 =A0long pair =3D (((long)c)<<32) + (Integer.MAX_VA= LUE - i); >> + =A0 =A0 =A0 =A0 =A0 =A0boolean displaced =3D queue.insert(pair); >> + =A0 =A0 =A0 =A0 =A0 =A0if (displaced) min=3D(int)(queue.top() >>> 32); >> =A0 =A0 =A0 =A0 =A0 } >> =A0 =A0 =A0 =A0 } >> - =A0 =A0 =A0 =A0// now select the right page from the results >> - =A0 =A0 =A0 =A0for (CountPair p : queue) { >> - =A0 =A0 =A0 =A0 =A0if (--off>=3D0) continue; >> - =A0 =A0 =A0 =A0 =A0if (--lim<0) break; >> + >> + =A0 =A0 =A0 =A0// if we are deep paging, we don't have to order the hi= ghest >> "offset" counts. >> + =A0 =A0 =A0 =A0int collectCount =3D Math.max(0, queue.size() - off); >> + =A0 =A0 =A0 =A0assert collectCount < lim; >> + >> + =A0 =A0 =A0 =A0// the start and end indexes of our list "sorted" (star= ting with >> the highest value) >> + =A0 =A0 =A0 =A0int sortedIdxStart =3D queue.size() - (collectCount - 1= ); >> + =A0 =A0 =A0 =A0int sortedIdxEnd =3D queue.size() + 1; >> + =A0 =A0 =A0 =A0final long[] sorted =3D queue.sort(collectCount); >> + >> + =A0 =A0 =A0 =A0for (int i=3DsortedIdxStart; i> + =A0 =A0 =A0 =A0 =A0long pair =3D sorted[i]; >> + =A0 =A0 =A0 =A0 =A0int c =3D (int)(pair >>> 32); >> + =A0 =A0 =A0 =A0 =A0int tnum =3D Integer.MAX_VALUE - (int)pair; >> + >> =A0 =A0 =A0 =A0 =A0 spare.reset(); >> - =A0 =A0 =A0 =A0 =A0ft.indexedToReadable(p.key, spare); >> - =A0 =A0 =A0 =A0 =A0res.add(spare.toString(), p.val); >> + =A0 =A0 =A0 =A0 =A0ft.indexedToReadable(si.lookup(startTermIndex+tnum,= br), >> spare); >> + =A0 =A0 =A0 =A0 =A0res.add(spare.toString(), c); >> =A0 =A0 =A0 =A0 } >> + >> =A0 =A0 =A0 } else { >> =A0 =A0 =A0 =A0 // add results in index order >> =A0 =A0 =A0 =A0 int i=3D(startTermIndex=3D=3D0)?1:0; >> @@ -526,7 +547,10 @@ public class SimpleFacets { >> =A0 =A0 } >> >> =A0 =A0 if (missing) { >> - =A0 =A0 =A0res.add(null, getFieldMissingCount(searcher,docs,fieldName)= ); >> + =A0 =A0 =A0if (missingCount < 0) { >> + =A0 =A0 =A0 =A0missingCount =3D getFieldMissingCount(searcher,docs,fie= ldName); >> + =A0 =A0 =A0} >> + =A0 =A0 =A0res.add(null, missingCount); >> =A0 =A0 } >> >> =A0 =A0 return res; >> >> Modified: >> lucene/dev/trunk/solr/src/java/org/apache/solr/request/UnInvertedField.j= ava >> URL: >> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/src/java/org/apache/s= olr/request/UnInvertedField.java?rev=3D992382&r1=3D992381&r2=3D992382&view= =3Ddiff >> >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D >> --- >> lucene/dev/trunk/solr/src/java/org/apache/solr/request/UnInvertedField.j= ava >> (original) >> +++ >> lucene/dev/trunk/solr/src/java/org/apache/solr/request/UnInvertedField.j= ava >> Fri Sep =A03 17:12:26 2010 >> @@ -37,6 +37,7 @@ import org.apache.solr.core.SolrCore; >> =A0import org.apache.solr.schema.FieldType; >> =A0import org.apache.solr.schema.TrieField; >> =A0import org.apache.solr.search.*; >> +import org.apache.solr.util.LongPriorityQueue; >> =A0import org.apache.solr.util.PrimUtils; >> =A0import org.apache.solr.util.BoundedTreeSet; >> =A0import org.apache.solr.handler.component.StatsValues; >> @@ -470,7 +471,9 @@ public class UnInvertedField { >> =A0 =A0 if (baseSize >=3D mincount) { >> >> =A0 =A0 =A0 final int[] index =3D this.index; >> - =A0 =A0 =A0final int[] counts =3D new int[numTermsInField]; >> + =A0 =A0 =A0// tricky: we add more more element than we need because we= will >> reuse this array later >> + =A0 =A0 =A0// for ordering term ords before converting to term labels. >> + =A0 =A0 =A0final int[] counts =3D new int[numTermsInField + 1]; >> >> =A0 =A0 =A0 // >> =A0 =A0 =A0 // If there is prefix, find it's start and end term numbers >> @@ -575,7 +578,8 @@ public class UnInvertedField { >> =A0 =A0 =A0 if (sort.equals(FacetParams.FACET_SORT_COUNT) || >> sort.equals(FacetParams.FACET_SORT_COUNT_LEGACY)) { >> =A0 =A0 =A0 =A0 int maxsize =3D limit>0 ? offset+limit : Integer.MAX_VAL= UE-1; >> =A0 =A0 =A0 =A0 maxsize =3D Math.min(maxsize, numTermsInField); >> - =A0 =A0 =A0 =A0final BoundedTreeSet queue =3D new >> BoundedTreeSet(maxsize); >> + =A0 =A0 =A0 =A0LongPriorityQueue queue =3D new >> LongPriorityQueue(Math.min(maxsize,1000), maxsize, Long.MIN_VALUE); >> + >> =A0 =A0 =A0 =A0 int min=3Dmincount-1; =A0// the smallest value in the to= p 'N' values >> =A0 =A0 =A0 =A0 for (int i=3DstartTerm; i> =A0 =A0 =A0 =A0 =A0 int c =3D doNegative ? maxTermCounts[i] - counts[i] = : counts[i]; >> @@ -584,55 +588,63 @@ public class UnInvertedField { >> =A0 =A0 =A0 =A0 =A0 =A0 // index order, so we already know that the keys= are ordered. >> =A0This can be very >> =A0 =A0 =A0 =A0 =A0 =A0 // important if a lot of the counts are repeated= (like zero >> counts would be). >> >> - =A0 =A0 =A0 =A0 =A0 =A0// minimize object creation and speed compariso= n by creating >> a long that >> - =A0 =A0 =A0 =A0 =A0 =A0// encompasses both count and term number. >> - =A0 =A0 =A0 =A0 =A0 =A0// Since smaller values are kept in the TreeSet= , make higher >> counts smaller. >> - =A0 =A0 =A0 =A0 =A0 =A0// >> - =A0 =A0 =A0 =A0 =A0 =A0// =A0 for equal counts, lower term numbers >> - =A0 =A0 =A0 =A0 =A0 =A0// should come first and hence be "greater" >> - >> - =A0 =A0 =A0 =A0 =A0 =A0//long pair =3D (((long)c)<<32) | (0x7fffffff-i= ) ; =A0 // use if >> priority queue >> - =A0 =A0 =A0 =A0 =A0 =A0long pair =3D (((long)-c)<<32) | i; >> - =A0 =A0 =A0 =A0 =A0 =A0queue.add(new Long(pair)); >> - =A0 =A0 =A0 =A0 =A0 =A0if (queue.size()>=3Dmaxsize) >> min=3D-(int)(queue.last().longValue() >>> 32); >> + =A0 =A0 =A0 =A0 =A0 =A0// smaller term numbers sort higher, so subtrac= t the term >> number instead >> + =A0 =A0 =A0 =A0 =A0 =A0long pair =3D (((long)c)<<32) + (Integer.MAX_VA= LUE - i); >> + =A0 =A0 =A0 =A0 =A0 =A0boolean displaced =3D queue.insert(pair); >> + =A0 =A0 =A0 =A0 =A0 =A0if (displaced) min=3D(int)(queue.top() >>> 32); >> =A0 =A0 =A0 =A0 =A0 } >> =A0 =A0 =A0 =A0 } >> + >> =A0 =A0 =A0 =A0 // now select the right page from the results >> >> + =A0 =A0 =A0 =A0// if we are deep paging, we don't have to order the hi= ghest >> "offset" counts. >> + =A0 =A0 =A0 =A0int collectCount =3D Math.max(0, queue.size() - off); >> + =A0 =A0 =A0 =A0assert collectCount < lim; >> + >> + =A0 =A0 =A0 =A0// the start and end indexes of our list "sorted" (star= ting with >> the highest value) >> + =A0 =A0 =A0 =A0int sortedIdxStart =3D queue.size() - (collectCount - 1= ); >> + =A0 =A0 =A0 =A0int sortedIdxEnd =3D queue.size() + 1; >> + =A0 =A0 =A0 =A0final long[] sorted =3D queue.sort(collectCount); >> >> - =A0 =A0 =A0 =A0final int[] tnums =3D new int[Math.min(Math.max(0, >> queue.size()-off), lim)]; >> =A0 =A0 =A0 =A0 final int[] indirect =3D counts; =A0// reuse the counts = array for the >> index into the tnums array >> - =A0 =A0 =A0 =A0assert indirect.length >=3D tnums.length; >> - >> - =A0 =A0 =A0 =A0int tnumCount =3D 0; >> + =A0 =A0 =A0 =A0assert indirect.length >=3D sortedIdxEnd; >> + >> + =A0 =A0 =A0 =A0for (int i=3DsortedIdxStart; i> + =A0 =A0 =A0 =A0 =A0long pair =3D sorted[i]; >> + =A0 =A0 =A0 =A0 =A0int c =3D (int)(pair >>> 32); >> + =A0 =A0 =A0 =A0 =A0int tnum =3D Integer.MAX_VALUE - (int)pair; >> + >> + =A0 =A0 =A0 =A0 =A0indirect[i] =3D i; =A0 // store the index for indir= ect sorting >> + =A0 =A0 =A0 =A0 =A0sorted[i] =3D tnum; =A0// reuse the "sorted" array = to store the >> term numbers for indirect sorting >> >> - =A0 =A0 =A0 =A0for (Long p : queue) { >> - =A0 =A0 =A0 =A0 =A0if (--off>=3D0) continue; >> - =A0 =A0 =A0 =A0 =A0if (--lim<0) break; >> - =A0 =A0 =A0 =A0 =A0int c =3D -(int)(p.longValue() >>> 32); >> - =A0 =A0 =A0 =A0 =A0//int tnum =3D 0x7fffffff - (int)p.longValue(); =A0= // use if >> priority queue >> - =A0 =A0 =A0 =A0 =A0int tnum =3D (int)p.longValue(); >> - =A0 =A0 =A0 =A0 =A0indirect[tnumCount] =3D tnumCount; >> - =A0 =A0 =A0 =A0 =A0tnums[tnumCount++] =3D tnum; >> - =A0 =A0 =A0 =A0 =A0// String label =3D ft.indexedToReadable(getTermTex= t(te, tnum)); >> =A0 =A0 =A0 =A0 =A0 // add a null label for now... we'll fill it in late= r. >> =A0 =A0 =A0 =A0 =A0 res.add(null, c); >> =A0 =A0 =A0 =A0 } >> >> =A0 =A0 =A0 =A0 // now sort the indexes by the term numbers >> - =A0 =A0 =A0 =A0PrimUtils.sort(0, tnumCount, indirect, new >> PrimUtils.IntComparator() { >> + =A0 =A0 =A0 =A0PrimUtils.sort(sortedIdxStart, sortedIdxEnd, indirect, = new >> PrimUtils.IntComparator() { >> =A0 =A0 =A0 =A0 =A0 @Override >> =A0 =A0 =A0 =A0 =A0 public int compare(int a, int b) { >> - =A0 =A0 =A0 =A0 =A0 =A0return tnums[a] - tnums[b]; >> + =A0 =A0 =A0 =A0 =A0 =A0return (int)sorted[a] - (int)sorted[b]; >> + =A0 =A0 =A0 =A0 =A0} >> + >> + =A0 =A0 =A0 =A0 =A0@Override >> + =A0 =A0 =A0 =A0 =A0public boolean lessThan(int a, int b) { >> + =A0 =A0 =A0 =A0 =A0 =A0return sorted[a] < sorted[b]; >> + =A0 =A0 =A0 =A0 =A0} >> + >> + =A0 =A0 =A0 =A0 =A0@Override >> + =A0 =A0 =A0 =A0 =A0public boolean equals(int a, int b) { >> + =A0 =A0 =A0 =A0 =A0 =A0return sorted[a] =3D=3D sorted[b]; >> =A0 =A0 =A0 =A0 =A0 } >> =A0 =A0 =A0 =A0 }); >> >> =A0 =A0 =A0 =A0 // convert the term numbers to term values and set as th= e label >> - =A0 =A0 =A0 =A0for (int i=3D0; i> + =A0 =A0 =A0 =A0for (int i=3DsortedIdxStart; i> =A0 =A0 =A0 =A0 =A0 int idx =3D indirect[i]; >> - =A0 =A0 =A0 =A0 =A0int tnum =3D tnums[idx]; >> - =A0 =A0 =A0 =A0 =A0String label =3D getReadableValue(getTermValue(te, = tnum), ft, >> spare); >> - =A0 =A0 =A0 =A0 =A0res.setName(idx, label); >> + =A0 =A0 =A0 =A0 =A0int tnum =3D (int)sorted[idx]; >> + =A0 =A0 =A0 =A0 =A0String label =3D getReadableValue(getTermValue(te, = tnum), ft, >> spare); >> + =A0 =A0 =A0 =A0 =A0res.setName(idx - sortedIdxStart, label); >> =A0 =A0 =A0 =A0 } >> >> =A0 =A0 =A0 } else { >> >> Added: >> lucene/dev/trunk/solr/src/java/org/apache/solr/util/LongPriorityQueue.ja= va >> URL: >> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/src/java/org/apache/s= olr/util/LongPriorityQueue.java?rev=3D992382&view=3Dauto >> >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D >> --- >> lucene/dev/trunk/solr/src/java/org/apache/solr/util/LongPriorityQueue.ja= va >> (added) >> +++ >> lucene/dev/trunk/solr/src/java/org/apache/solr/util/LongPriorityQueue.ja= va >> Fri Sep =A03 17:12:26 2010 >> @@ -0,0 +1,235 @@ >> +package org.apache.solr.util; >> + >> +import java.util.Arrays; >> + >> +/** >> + * Licensed to the Apache Software Foundation (ASF) under one or more >> + * contributor license agreements. =A0See the NOTICE file distributed w= ith >> + * this work for additional information regarding copyright ownership. >> + * The ASF licenses this file to You under the Apache License, Version >> 2.0 >> + * (the "License"); you may not use this file except in compliance with >> + * the License. =A0You may obtain a copy of the License at >> + * >> + * =A0 =A0 http://www.apache.org/licenses/LICENSE-2.0 >> + * >> + * Unless required by applicable law or agreed to in writing, software >> + * distributed under the License is distributed on an "AS IS" BASIS, >> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or >> implied. >> + * See the License for the specific language governing permissions and >> + * limitations under the License. >> + */ >> + >> +/** A native long priority queue. >> + * >> + * @lucene.internal >> +*/ >> +public class LongPriorityQueue { >> + =A0protected int size; =A0 =A0 =A0 =A0 =A0 =A0 // number of elements c= urrently in the >> queue >> + =A0protected int currentCapacity; =A0// number of elements the queue c= an >> hold w/o expanding >> + =A0protected int maxSize; =A0 =A0 =A0 =A0 =A0// max number of elements= allowed in >> the queue >> + =A0protected long[] heap; >> + =A0protected final long sentinel; =A0 // represents a null return valu= e >> + >> + =A0public LongPriorityQueue(int initialSize, int maxSize, long sentine= l) { >> + =A0 =A0this.maxSize =3D maxSize; >> + =A0 =A0this.sentinel =3D sentinel; >> + =A0 =A0initialize(initialSize); >> + =A0} >> + >> + >> + =A0protected void initialize(int sz) { >> + =A0 =A0int heapSize; >> + =A0 =A0if (0 =3D=3D sz) >> + =A0 =A0 =A0// We allocate 1 extra to avoid if statement in top() >> + =A0 =A0 =A0heapSize =3D 2; >> + =A0 =A0else { >> + =A0 =A0 =A0// NOTE: we add +1 because all access to heap is >> + =A0 =A0 =A0// 1-based not 0-based. =A0heap[0] is unused. >> + =A0 =A0 =A0heapSize =3D Math.max(sz, sz + 1); // handle overflow >> + =A0 =A0} >> + =A0 =A0heap =3D new long[heapSize]; >> + =A0 =A0currentCapacity =3D sz; >> + =A0} >> + >> + =A0public int getCurrentCapacity() { >> + =A0 =A0return currentCapacity; >> + =A0} >> + >> + =A0public void resize(int sz) { >> + =A0 =A0int heapSize; >> + =A0 =A0if (sz > maxSize) { >> + =A0 =A0 =A0maxSize =3D sz; >> + =A0 =A0} >> + =A0 =A0if (0 =3D=3D sz) >> + =A0 =A0 =A0// We allocate 1 extra to avoid if statement in top() >> + =A0 =A0 =A0heapSize =3D 2; >> + =A0 =A0else { >> + =A0 =A0 =A0heapSize =3D Math.max(sz, sz + 1); // handle overflow >> + =A0 =A0} >> + =A0 =A0heap =3D Arrays.copyOf(heap, heapSize); >> + =A0 =A0currentCapacity =3D sz; >> + =A0} >> + >> + =A0/** >> + =A0 * Adds an object to a PriorityQueue in log(size) time. If one trie= s to >> add >> + =A0 * more objects than maxSize from initialize an >> + =A0 * {@link ArrayIndexOutOfBoundsException} is thrown. >> + =A0 * >> + =A0 * @return the new 'top' element in the queue. >> + =A0 */ >> + =A0public long add(long element) { >> + =A0 =A0if (size >=3D currentCapacity) { >> + =A0 =A0 =A0int newSize =3D Math.min(currentCapacity <<1, maxSize); >> + =A0 =A0 =A0if (newSize < currentCapacity) newSize =3D Integer.MAX_VALU= E; =A0// >> handle overflow >> + =A0 =A0 =A0resize(newSize); >> + =A0 =A0} >> + =A0 =A0size++; >> + =A0 =A0heap[size] =3D element; >> + =A0 =A0upHeap(); >> + =A0 =A0return heap[1]; >> + =A0} >> + >> + /** >> + =A0 * Adds an object to a PriorityQueue in log(size) time. If one trie= s to >> add >> + =A0 * more objects than the current capacity, an >> + =A0 * {@link ArrayIndexOutOfBoundsException} is thrown. >> + =A0 */ >> + =A0public void addNoCheck(long element) { >> + =A0 =A0++size; >> + =A0 =A0heap[size] =3D element; >> + =A0 =A0upHeap(); >> + =A0} >> + >> + =A0/** >> + =A0 * Adds an object to a PriorityQueue in log(size) time. >> + =A0 * It returns the smallest object (if any) that was >> + =A0 * dropped off the heap because it was full, or >> + =A0 * the sentinel value. >> + =A0 * >> + =A0 * =A0This can be >> + =A0 * the given parameter (in case it is smaller than the >> + =A0 * full heap's minimum, and couldn't be added), or another >> + =A0 * object that was previously the smallest value in the >> + =A0 * heap and now has been replaced by a larger one, or null >> + =A0 * if the queue wasn't yet full with maxSize elements. >> + =A0 */ >> + =A0public long insertWithOverflow(long element) { >> + =A0 =A0if (size < maxSize) { >> + =A0 =A0 =A0add(element); >> + =A0 =A0 =A0return sentinel; >> + =A0 =A0} else if (element > heap[1]) { >> + =A0 =A0 =A0long ret =3D heap[1]; >> + =A0 =A0 =A0heap[1] =3D element; >> + =A0 =A0 =A0updateTop(); >> + =A0 =A0 =A0return ret; >> + =A0 =A0} else { >> + =A0 =A0 =A0return element; >> + =A0 =A0} >> + =A0} >> + >> + =A0/** inserts the element and returns true if this element caused ano= ther >> element >> + =A0 * to be dropped from the queue. */ >> + =A0public boolean insert(long element) { >> + =A0 =A0if (size < maxSize) { >> + =A0 =A0 =A0add(element); >> + =A0 =A0 =A0return false; >> + =A0 =A0} else if (element > heap[1]) { >> + =A0 =A0 =A0// long ret =3D heap[1]; >> + =A0 =A0 =A0heap[1] =3D element; >> + =A0 =A0 =A0updateTop(); >> + =A0 =A0 =A0return true; >> + =A0 =A0} else { >> + =A0 =A0 =A0return false; >> + =A0 =A0} >> + =A0} >> + >> + =A0/** Returns the least element of the PriorityQueue in constant time= . */ >> + =A0public long top() { >> + =A0 =A0return heap[1]; >> + =A0} >> + >> + =A0/** Removes and returns the least element of the PriorityQueue in >> log(size) >> + =A0 =A0time. =A0Only valid if size() > 0. >> + =A0 */ >> + =A0public long pop() { >> + =A0 =A0long result =3D heap[1]; =A0 =A0 =A0 =A0 =A0 =A0 =A0 // save fi= rst value >> + =A0 =A0heap[1] =3D heap[size]; =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0// move = last to first >> + =A0 =A0size--; >> + =A0 =A0downHeap(); =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 =A0// adjust heap >> + =A0 =A0return result; >> + =A0} >> + >> + =A0/** >> + =A0 * Should be called when the Object at top changes values. >> + =A0 * @return the new 'top' element. >> + =A0 */ >> + =A0public long updateTop() { >> + =A0 =A0downHeap(); >> + =A0 =A0return heap[1]; >> + =A0} >> + >> + =A0/** Returns the number of elements currently stored in the >> PriorityQueue. */ >> + =A0public int size() { >> + =A0 =A0return size; >> + =A0} >> + >> + =A0/** Returns the array used to hold the heap, with the smallest item= at >> array[1] >> + =A0 * =A0and the last (but not necessarily largest) at array[size()]. = =A0This >> is *not* >> + =A0 * =A0fully sorted. >> + =A0 */ >> + =A0public long[] getInternalArray() { >> + =A0 =A0return heap; >> + =A0} >> + >> + =A0/** Pops the smallest n items from the heap, placing them in the >> internal array at >> + =A0 * =A0arr[size] through arr[size-(n-1)] with the smallest (first el= ement >> popped) >> + =A0 * =A0being at arr[size]. =A0The internal array is returned. >> + =A0 */ >> + =A0public long[] sort(int n) { >> + =A0 =A0while (--n >=3D 0) { >> + =A0 =A0 =A0long result =3D heap[1]; =A0 =A0 =A0 =A0 =A0 =A0 // save fi= rst value >> + =A0 =A0 =A0heap[1] =3D heap[size]; =A0 =A0 =A0 =A0 =A0 =A0 =A0// move = last to first >> + =A0 =A0 =A0heap[size] =3D result; =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0/= / place it last >> + =A0 =A0 =A0size--; >> + =A0 =A0 =A0downHeap(); =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 =A0// adjust heap >> + =A0 =A0} >> + =A0 =A0return heap; >> + =A0} >> + >> + =A0/** Removes all entries from the PriorityQueue. */ >> + =A0public void clear() { >> + =A0 =A0size =3D 0; >> + =A0} >> + >> + =A0private void upHeap() { >> + =A0 =A0int i =3D size; >> + =A0 =A0long node =3D heap[i]; =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 // save bottom node >> + =A0 =A0int j =3D i >>> 1; >> + =A0 =A0while (j > 0 && node < heap[j]) { >> + =A0 =A0 =A0heap[i] =3D heap[j]; =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 // shift parents down >> + =A0 =A0 =A0i =3D j; >> + =A0 =A0 =A0j =3D j >>> 1; >> + =A0 =A0} >> + =A0 =A0heap[i] =3D node; =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0// install saved node >> + =A0} >> + >> + =A0private void downHeap() { >> + =A0 =A0int i =3D 1; >> + =A0 =A0long node =3D heap[i]; =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 // save top node >> + =A0 =A0int j =3D i << 1; =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0// find smaller child >> + =A0 =A0int k =3D j + 1; >> + =A0 =A0if (k <=3D size && heap[k] < heap[j]) { >> + =A0 =A0 =A0j =3D k; >> + =A0 =A0} >> + =A0 =A0while (j <=3D size && heap[j] < node) { >> + =A0 =A0 =A0heap[i] =3D heap[j]; =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 // shift up child >> + =A0 =A0 =A0i =3D j; >> + =A0 =A0 =A0j =3D i << 1; >> + =A0 =A0 =A0k =3D j + 1; >> + =A0 =A0 =A0if (k <=3D size && heap[k] < heap[j]) { >> + =A0 =A0 =A0 =A0j =3D k; >> + =A0 =A0 =A0} >> + =A0 =A0} >> + =A0 =A0heap[i] =3D node; =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0// install saved node >> + =A0} >> +} >> >> Propchange: >> lucene/dev/trunk/solr/src/java/org/apache/solr/util/LongPriorityQueue.ja= va >> >> ------------------------------------------------------------------------= ------ >> =A0 =A0svn:eol-style =3D native >> >> Propchange: >> lucene/dev/trunk/solr/src/java/org/apache/solr/util/LongPriorityQueue.ja= va >> >> ------------------------------------------------------------------------= ------ >> =A0 =A0svn:executable =3D * >> >> Propchange: >> lucene/dev/trunk/solr/src/java/org/apache/solr/util/LongPriorityQueue.ja= va >> >> ------------------------------------------------------------------------= ------ >> =A0 =A0svn:keywords =3D Date Author Id Revision HeadURL >> >> Modified: >> lucene/dev/trunk/solr/src/test/org/apache/solr/util/PrimUtilsTest.java >> URL: >> http://svn.apache.org/viewvc/lucene/dev/trunk/solr/src/test/org/apache/s= olr/util/PrimUtilsTest.java?rev=3D992382&r1=3D992381&r2=3D992382&view=3Ddif= f >> >> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D >> --- lucene/dev/trunk/solr/src/test/org/apache/solr/util/PrimUtilsTest.ja= va >> (original) >> +++ lucene/dev/trunk/solr/src/test/org/apache/solr/util/PrimUtilsTest.ja= va >> Fri Sep =A03 17:12:26 2010 >> @@ -51,4 +51,44 @@ public class PrimUtilsTest extends Lucen >> =A0 =A0 } >> =A0 } >> >> + =A0public void testLongPriorityQueue() { >> + =A0 =A0int maxSize =3D 100; >> + =A0 =A0long[] a =3D new long[maxSize]; >> + =A0 =A0long[] discards =3D new long[maxSize]; >> + >> + =A0 =A0for (int iter=3D0; iter<100; iter++) { >> + =A0 =A0 =A0int discardCount =3D 0; >> + =A0 =A0 =A0int startSize =3D r.nextInt(maxSize) + 1; >> + =A0 =A0 =A0int endSize =3D startSize=3D=3DmaxSize ? maxSize : startSiz= e + >> r.nextInt(maxSize-startSize); >> + =A0 =A0 =A0int adds =3D r.nextInt(maxSize+1); >> + =A0 =A0 =A0// System.out.println("startSize=3D" + startSize + " endSiz= e=3D" + >> endSize + " adds=3D"+adds); >> + =A0 =A0 =A0LongPriorityQueue pq =3D new LongPriorityQueue(startSize, e= ndSize, >> Long.MIN_VALUE); >> + >> + =A0 =A0 =A0for (int i=3D0; i> + =A0 =A0 =A0 =A0long v =3D r.nextLong(); >> + =A0 =A0 =A0 =A0a[i] =3D v; >> + =A0 =A0 =A0 =A0long out =3D pq.insertWithOverflow(v); >> + =A0 =A0 =A0 =A0if (i < endSize) { >> + =A0 =A0 =A0 =A0 =A0assertEquals(out, Long.MIN_VALUE); >> + =A0 =A0 =A0 =A0} else { >> + =A0 =A0 =A0 =A0 =A0discards[discardCount++] =3D out; >> + =A0 =A0 =A0 =A0} >> + =A0 =A0 =A0} >> + =A0 =A0 =A0assertEquals(Math.min(adds,endSize), pq.size()); >> + =A0 =A0 =A0assertEquals(adds, pq.size() + discardCount); >> + >> + =A0 =A0 =A0Arrays.sort(a, 0, adds); >> + =A0 =A0 =A0Arrays.sort(discards, 0, discardCount); >> + =A0 =A0 =A0for (int i=3D0; i> + =A0 =A0 =A0 =A0assertEquals(a[i], discards[i]); >> + =A0 =A0 =A0} >> + >> + =A0 =A0 =A0for (int i=3DdiscardCount; i> + =A0 =A0 =A0 =A0assertEquals(a[i], pq.pop()); >> + =A0 =A0 =A0} >> + >> + =A0 =A0 =A0assertEquals(0, pq.size()); >> + =A0 =A0} >> + =A0} >> + >> =A0} >> \ No newline at end of file >> >> > > > > -- > Robert Muir > rcmuir@gmail.com > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For additional commands, e-mail: dev-help@lucene.apache.org