lucene-java-user mailing list archives

From Atri Sharma
Subject Re: Sampled Queries -- Use Cases and Feedback
Date Mon, 10 Jun 2019 06:53:38 GMT
Any thoughts on this? I am envisioning applications to machine
learning systems, where the training dataset might be a small sample
of the entire dataset, and the user wants scoring to be done only on
samples of the dataset.

On Fri, Jun 7, 2019 at 5:45 PM Atri Sharma wrote:
> Hi All,
> While working on a new Query type, I thought of a couple of use
> cases where the documents being scored need not be the whole data
> set, only a sample of it. This can be useful for very large
> datasets, where a query only needs to get a "feel" for the data,
> and for queries over data that is aggregated over time, where a
> wide enough sample is good enough for the user and buys improved
> performance. Faceting already has sampling mechanisms, so there
> are ideas to borrow from that part.
> I have some ideas on introducing a new query type and associated
> semantics to support this functionality from the ground up.
> Specifically, a query type which wraps another query and "feeds"
> offsets to the inner query, along with a limit on the number of
> hits collected.
> I can go into more detail, but wanted to get some thoughts and
> feedback before delving deeper.
> Atri
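
To make the wrapping idea quoted above concrete, here is a minimal sketch in plain Java. It does not use the Lucene API and is not the proposed implementation. Everything in it is an assumption for illustration: the class name `SampledQuerySketch`, the method `sampledHits`, the use of an `IntPredicate` as a stand-in for the inner query, and the reading of "offsets" as starting doc IDs of fixed-size windows. The wrapper walks each window, asks the inner query whether a doc matches, and stops once the hit limit is reached.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical sketch only: none of these names exist in Lucene.
class SampledQuerySketch {

    /**
     * Collects matches of the inner query from sampled windows of the doc ID space.
     *
     * @param innerQuery stand-in for the wrapped query (true if the doc matches)
     * @param maxDoc     exclusive upper bound on doc IDs
     * @param offsets    assumed starting doc IDs of the windows to sample
     * @param windowSize number of consecutive doc IDs examined per offset
     * @param limit      maximum number of hits to collect overall
     */
    static List<Integer> sampledHits(IntPredicate innerQuery, int maxDoc,
                                     int[] offsets, int windowSize, int limit) {
        List<Integer> hits = new ArrayList<>();
        for (int offset : offsets) {
            int start = Math.max(0, offset);
            int end = Math.min(maxDoc, start + windowSize);
            for (int doc = start; doc < end; doc++) {
                if (hits.size() >= limit) {
                    return hits; // hit limit reached, stop early
                }
                if (innerQuery.test(doc)) {
                    hits.add(doc);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Inner "query" matches even doc IDs; sample three windows of 10 docs each.
        List<Integer> hits = sampledHits(doc -> doc % 2 == 0, 1000,
                                         new int[] {0, 500, 900}, 10, 12);
        System.out.println(hits);
    }
}
```

Only the docs inside the sampled windows are ever tested, so the cost scales with the sample rather than with the full data set. How the offsets are chosen (fixed stride, random, per segment) is left open here, as it is in the proposal.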

