> Also, I really do not like the large perf hits at highish topN sizes.
> Admittedly it's unusual (though I suspect not THAT rare) to use such
> a large topN, but, I esepically don't like that it doesn't degrade
> gracefully – it's surprising. Likewise I'd like to see with highish
> number of segments how things degrade.
Well the degradation should not be surprising  it's coded into the
subleading complexity: redoing the complexity argument carefully
for the average case, probabilistically, the flop count for single PQ,
for top T docs out of S segments of H hits in each, is^(footnote1)
O(S*H + T*log(T)*log(S*H))
(note that it is a *sum* of the two terms)
while inserting in S different queues is^(footnote2)
O(S*H + T*log(T)*log(H)*S )
This means that multiPQ has a subleading term with
S*log(H) / log(S*H) worse performance, as a function of T.
To see whether this is "bad", we have to consider the full
range of S = numSegments, H = numHitsPerSegment
and T = topNumHitsToReturn.
If you fix S and H (which fixes the dominating factor as
constant also), then as T increases, as the tests show,
multiPQ diverges, getting worse as T goes up.
On the other hand, if you fix T, and instead let S and H
vary, then the flopcount ratio is the following:
flops_multiPQ / flops_singlePQ =
S*H + T*log(T)*log(S*H)
 =
S*H + T*log(T)*log(H)*S
1 + T*log(T)*log(S*H)/S*H
 =
1 + T*log(T)*log(H)*S/S*H
under the always true (for us) case that S*H is the
dominating factor, and using 1/(1+x) =~ 1x, for small
x, we have:
(1 + T*log(T)*log(S*H)/S*H ) * (1  T*log(T)*log(H)/H )
using (1+x)*(1y) = 1+xy for small x, y, and pulling out
the common factor of T*log(T)/H, we have:
1  ( T*log(T)/H ) * ( log(H)  (log(S*H) / S) )
What's the end result? If you fix the queue size (T),
then as the number of hits and the number of hits
per segment grows, the percentage performance difference
should go to *zero* proportional to 1/H =
1/numHitsPerSegment.
This to me is pretty important to note, and should be
pretty easy to see in a test run: doing it on a run with
5 million documents hit by the query should work out
much better for the multiPQ than singlePQ, whereas
*decreasing* the hit count, to say 10,000 hits, should
penalize multiPQ versus singlePQ.
The average user I would imagine doesn't have *tons*
of variation in queuesize or number of segments
(meaning I doubt either grows often above 100, and
very rarely up to 1000), but the variation in total
number of hits is extremely variable, and as even
recent discussions on javauser show, people even
look at multiTB indexes, so performance as a function
of index size is probably the most important.
I can run these numbers on my mac desktop, because
this analysis is independent of hardware or even
java version, since it's a scaling argument, but we should
really see what the results are like on reasonable
hardware for both the small hitcount case (where the
performance might be really bad for singlePQ, and
rule it out altogether), and the very high hitcount case
(where the difference should be much more in favor
of multiPQ, and in fact could rule out singlePQ if the
constant factors (like the cost of the more string
compares that singlePQ has to do) become more
important than the complexity, because the asymptotics
are the same in this regime).
jake

(1) To see this scaling, if you have a queue of size
T, and you have currently looked at X hits, then the
probability in the average case of the next hit beating
the bottom element of the queue is T/X. Then the
total number of insertions in the queue you have to
do in a set of H hits is sum_X=1...H [ T/X ]. Replacing
the sum with an integral, it's just T log(H). The
cost of an insertion into a PQ of size T is log(T), giving
the flop count based on insertions of log(T) * T * log(H)
for insertions while walking H hits, and log(T)*T*log(S*H)
for insertions while walking S*H hits.
(2) we're neglecting the final merge for the multiPQ case,
because its subsubleading in complexity: merging
S sorted lists of T elements is O(T log(S)) on average,
which is smaller than the next bigger term in this counting
by a multiplicative factor of log(T) * log(H).
