lucene-dev mailing list archives

From Yonik Seeley <ysee...@gmail.com>
Subject Re: wildcards for /export
Date Fri, 18 Nov 2016 01:49:22 GMT
On Thu, Nov 17, 2016 at 8:12 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> Yonik:
>
> Hmmm, we may be closer to that than it might appear. I happened to
> need to do some verification yesterday to determine whether I could
> limit the number of rows returned with TupleStream variants. /export
> of course doesn't do that, the close on a TupleStream waits until the
> entire stream is exhausted and throws the bits on the floor.
>
> Anyway, I was playing around with returning 10M rows with the /query
> and /export handlers and found out that I could indeed use /query and
> limit the rows. Fine so far.
>
> Then just for yucks I decided to try to use the /query handler with
> rows=100M and... the total processing time was virtually identical to
> /export. These weren't very sophisticated tests mind you; they did
> lend evidence that your idea is probably the way to go though.

When I did some ad-hoc tests a long time ago, /select was inexplicably
much slower (even when retrieving all docvalues and discounting
sorting time).
Some of the issue was probably a bug, fixed recently in
SolrIndexSearcher.decorateSomethingOrOther, that was creating a
top-level DV view.

Some other changes off the top of my head:
- if the number of docs being retrieved is very large (or all via
rows=-1), and if no other components (like highlighting) need the
top-N docs (needDocList), then defer sorting of the matches until
later.
- keep track of the DocSet on the ResponseBuilder (this is already
done when we facet via needDocSet?)
- if sorting was deferred, then sort in the most efficient way we know
how (i.e. don't always use a priority queue), or we can just do it
like export writer currently does.
- Invert the logic that writes DV fields so that we figure out the
fields once, look up the docvalues once, and then efficiently write
them out per document (Noble's addition of PushWriter is the right
direction here).
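
To make the list above concrete, here is a minimal Java sketch of two of the
ideas: sorting the collected matches after the fact (a full sort when most
rows are wanted, a bounded heap only for a small top-N), and resolving the
output columns once before writing documents. Every name in it
(DeferredSortSketch, Column, sortDeferred, writeDocs) is hypothetical and
invented for illustration; none of this is Solr or Lucene API, and it is not
how the export writer is actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.IntFunction;

/** Hypothetical sketch; none of these names are Solr or Lucene APIs. */
final class DeferredSortSketch {

    /** Stand-in for one docvalues-backed field, looked up once per request. */
    interface Column {
        String name();
        Object value(int doc);
    }

    /** Convenience factory for a Column backed by a per-doc lookup function. */
    public static Column column(String name, IntFunction<Object> lookup) {
        return new Column() {
            public String name() { return name; }
            public Object value(int doc) { return lookup.apply(doc); }
        };
    }

    /**
     * Sorts doc ids that were collected unsorted. rows < 0 means "all".
     * A full sort is used when more than half the matches are wanted;
     * otherwise a bounded heap keeps only the best `rows` candidates.
     */
    public static int[] sortDeferred(int[] matches, int rows, Comparator<Integer> order) {
        int limit = (rows < 0) ? matches.length : Math.min(rows, matches.length);
        if (limit == 0) {
            return new int[0];
        }
        if (limit > matches.length / 2) {
            Integer[] boxed = Arrays.stream(matches).boxed().toArray(Integer[]::new);
            Arrays.sort(boxed, order);
            return Arrays.stream(boxed).limit(limit).mapToInt(Integer::intValue).toArray();
        }
        // Reversed order puts the worst retained doc at the head of the heap.
        PriorityQueue<Integer> heap = new PriorityQueue<>(limit, order.reversed());
        for (int doc : matches) {
            if (heap.size() < limit) {
                heap.add(doc);
            } else if (order.compare(doc, heap.peek()) < 0) {
                heap.poll();
                heap.add(doc);
            }
        }
        // The heap yields worst-first, so fill the result from the back.
        int[] out = new int[limit];
        for (int i = limit - 1; i >= 0; i--) {
            out[i] = heap.poll();
        }
        return out;
    }

    /**
     * Writes one line per document. The caller resolves `columns` once;
     * the per-document loop only reads values from them.
     */
    public static List<String> writeDocs(int[] sortedDocs, List<Column> columns) {
        List<String> lines = new ArrayList<>();
        for (int doc : sortedDocs) {
            StringBuilder sb = new StringBuilder();
            for (Column c : columns) {
                if (sb.length() > 0) {
                    sb.append(',');
                }
                sb.append(c.name()).append('=').append(c.value(doc));
            }
            lines.add(sb.toString());
        }
        return lines;
    }
}
```

The half-the-matches threshold is an arbitrary assumption for the sketch; a
real implementation would pick the cutoff by measurement.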

In the long run, this should be simpler to deal with from both a "fl"
point of view, as well as augmenters, pseudo-fields, and security.

But again, feel free to add whatever to /export in the meantime... I'm
just laying out a bigger picture in case anyone wants to work
toward that as well.

-Yonik
