lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-5463) Provide cursor/token based "searchAfter" support that works with arbitrary sorting (ie: "deep paging")
Date Mon, 09 Dec 2013 23:22:08 GMT

     [ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-5463:
---------------------------

    Attachment: SOLR-5463__straw_man.patch

OK, I've updated the patch to make the change in user semantics I mentioned wanting to try last week.
 The best way to explain it is with a walk-through of a simple example (note: if you try the current
strawman code, the "numFound" and "start" values returned in the docList don't match what
I've pasted in the examples below -- these examples show what the final results should look
like in the finished solution).

Initial requests using searchAfter should always start with a totem value of "{{\*}}"

{code:title=http://localhost:8983/solr/deep?q=*:*&rows=20&sort=id+desc&searchAfter=*}
{
  "responseHeader":{
    "status":0,
    "QTime":2},
  "response":{"numFound":32,"start":-1,"docs":[
      // ...20 docs here...
    ]
  },
  "nextSearchAfter":"AoEjTk9L"}
{code}

The {{nextSearchAfter}} token returned by this request tells us what to use in the second
request...

{code:title=http://localhost:8983/solr/deep?q=*:*&rows=20&sort=id+desc&searchAfter=AoEjTk9L}
{
  "responseHeader":{
    "status":0,
    "QTime":7},
  "response":{"numFound":32,"start":-1,"docs":[
      // ...12 docs here...
    ]
  },
  "nextSearchAfter":"AoEoMDU3OUIwMDI="}
{code}

Since this result block contains fewer rows than were requested, the client could automatically
stop, but the {{nextSearchAfter}} is still returned, and it's still safe to request a subsequent
page (this is the fundamental difference from the previous patches, where {{nextSearchAfter}} was
set to {{null}} any time the code could tell there were no more results) ...

{code:title=http://localhost:8983/solr/deep?q=*:*&wt=json&indent=true&rows=20&fl=id,price&sort=id+desc&searchAfter=AoEoMDU3OUIwMDI=}
{
  "responseHeader":{
    "status":0,
    "QTime":1},
  "response":{"numFound":32,"start":-1,"docs":[]
  },
  "nextSearchAfter":"AoEoMDU3OUIwMDI="}
{code}

Note that in this case, with no docs included in the response, the {{nextSearchAfter}} totem
is the same as the input.
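
A client loop over these semantics might look roughly like the sketch below. It is only an illustration, not code from the patch: {{walk_all}} and {{make_fake_fetch}} are hypothetical names, and the fake fetcher stands in for an HTTP call to Solr by using the last id seen as its token (the real tokens are opaque and should not be interpreted by clients). The only things taken from the examples above are the {{searchAfter}} / {{nextSearchAfter}} hand-off, the {{\*}} starting value, and the "empty page means stop" rule.

```python
from typing import Callable, Dict, Iterator, List


def walk_all(fetch_page: Callable[[str], Dict], start_token: str = "*") -> Iterator[Dict]:
    """Yield every doc, following nextSearchAfter until a page comes back empty."""
    token = start_token
    while True:
        page = fetch_page(token)
        docs = page["response"]["docs"]
        if not docs:
            # Empty page: nothing (more) to read right now, so stop.
            return
        yield from docs
        token = page["nextSearchAfter"]


def make_fake_fetch(ids: List[str], rows: int) -> Callable[[str], Dict]:
    """Hypothetical stand-in for the HTTP request; mimics sort=id desc."""
    ordered = sorted(ids, reverse=True)

    def fetch_page(token: str) -> Dict:
        # ASSUMPTION: this fake uses the last id seen as its token.
        remaining = ordered if token == "*" else [i for i in ordered if i < token]
        docs = [{"id": i} for i in remaining[:rows]]
        # With no docs in the page, the token is echoed back unchanged.
        next_token = docs[-1]["id"] if docs else token
        return {
            "response": {"numFound": len(ordered), "start": -1, "docs": docs},
            "nextSearchAfter": next_token,
        }

    return fetch_page


# 32 matching docs requested 20 at a time, as in the walk-through above.
fetch = make_fake_fetch([f"doc{i:02d}" for i in range(32)], rows=20)
all_docs = list(walk_all(fetch))
print(len(all_docs))
```

Passing the fetcher in as a callable keeps the loop independent of any particular HTTP library; a real client would swap in a function that issues the request and parses the JSON response.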

For some sorts this makes it possible for clients to "resume" a full walk of all documents
matching a query -- picking up where they left off if more documents that match are added to
the index (for example: an ascending sort on a numeric uniqueKey field that always
increases as new docs are added, an ascending sort on a timestamp field indicating when
documents were crawled, etc...)
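
To make the "resume" idea concrete, here is a small in-memory simulation. It is a sketch under stated assumptions, not the patch's behavior verbatim: {{FakeIndex}} and {{drain}} are hypothetical names, the index is sorted ascending on an ever-increasing numeric id, and the token is simply that id as a string (real tokens are opaque).

```python
from typing import Dict, List, Tuple


class FakeIndex:
    """Hypothetical in-memory index, sorted ascending by an ever-increasing numeric id."""

    def __init__(self) -> None:
        self.ids: List[int] = []

    def add(self, doc_id: int) -> None:
        self.ids.append(doc_id)

    def search(self, search_after: str, rows: int) -> Dict:
        ordered = sorted(self.ids)
        if search_after != "*":
            # ASSUMPTION: the token is the last id seen, encoded as a string.
            last_seen = int(search_after)
            ordered = [i for i in ordered if i > last_seen]
        docs = [{"id": i} for i in ordered[:rows]]
        # An empty page echoes the incoming token back unchanged.
        next_token = str(docs[-1]["id"]) if docs else search_after
        return {
            "response": {"numFound": len(self.ids), "start": -1, "docs": docs},
            "nextSearchAfter": next_token,
        }


def drain(index: FakeIndex, token: str, rows: int = 20) -> Tuple[List[Dict], str]:
    """Read pages until one is empty; return the docs seen and the token to resume from."""
    seen: List[Dict] = []
    while True:
        page = index.search(token, rows)
        docs = page["response"]["docs"]
        if not docs:
            return seen, page["nextSearchAfter"]
        seen.extend(docs)
        token = page["nextSearchAfter"]


index = FakeIndex()
for doc_id in range(1, 6):
    index.add(doc_id)

first_pass, token = drain(index, "*", rows=2)   # reads ids 1..5
index.add(6)
index.add(7)
second_pass, token = drain(index, token, rows=2)  # picks up only ids 6 and 7
```

Because the empty page hands back a usable token instead of {{null}}, the client can hold on to it and simply ask again later.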

This also works as you would expect for searches that don't match any documents...

{code:title=http://localhost:8983/solr/deep?q=text:bogus&rows=20&sort=id+desc&searchAfter=*}
{
  "responseHeader":{
    "status":0,
    "QTime":21},
  "response":{"numFound":0,"start":-1,"docs":[]
  },
  "nextSearchAfter":"*"}
{code}


> Provide cursor/token based "searchAfter" support that works with arbitrary sorting (ie:
"deep paging")
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-5463
>                 URL: https://issues.apache.org/jira/browse/SOLR-5463
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>         Attachments: SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch,
SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch,
SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch
>
>
> I'd like to revisit a solution to the problem of "deep paging" in Solr, leveraging an
HTTP based API similar to how IndexSearcher.searchAfter works at the Lucene level: require
the clients to provide back a token indicating the sort values of the last document seen on
the previous "page".  This is similar to the "cursor" model I've seen in several other REST
APIs that support "pagination" over large sets of results (notably the Twitter API and its
"since_id" param) except that we'll want something that works with arbitrary multi-level sort
criteria that can be either ascending or descending.
> SOLR-1726 laid some initial groundwork here and was committed quite a while ago, but
the key bit of argument parsing to leverage it was commented out due to some problems (see
comments in that issue).  It's also somewhat out of date at this point: at the time it was
committed, IndexSearcher only supported searchAfter for simple scores, not arbitrary field
sorts; and the params added in SOLR-1726 suffer from this limitation as well.
> ---
> I think it would make sense to start fresh with a new issue with a focus on ensuring
that we have deep paging which:
> * supports arbitrary field sorts in addition to sorting by score
> * works in distributed mode
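
As a rough illustration of the "sort values of the last document seen" idea with a multi-level sort, the sketch below pages through an in-memory list sorted by price descending, then id ascending. It is not how Lucene or the patch implements searchAfter: {{sort_key}} and {{page_after}} are hypothetical names, and the sketch assumes a unique id as the final tie-breaker so that "strictly after the last document" is unambiguous.

```python
from typing import Dict, List, Optional, Tuple

SortKey = Tuple[float, str]


def sort_key(doc: Dict) -> SortKey:
    # Sort by price desc, then id asc. Negating the price flips its direction,
    # so a plain ascending tuple comparison gives the mixed-direction order.
    return (-doc["price"], doc["id"])


def page_after(
    docs: List[Dict], after: Optional[SortKey], rows: int
) -> Tuple[List[Dict], Optional[SortKey]]:
    """Return up to `rows` docs that sort strictly after `after`, plus the next cursor."""
    ordered = sorted(docs, key=sort_key)
    if after is not None:
        # ASSUMPTION: ids are unique, so each doc has a distinct sort key.
        ordered = [d for d in ordered if sort_key(d) > after]
    page = ordered[:rows]
    # The cursor is the sort values of the last doc returned; an empty page keeps the old one.
    next_after = sort_key(page[-1]) if page else after
    return page, next_after


docs = [
    {"id": "a", "price": 5.0},
    {"id": "b", "price": 9.0},
    {"id": "c", "price": 5.0},
    {"id": "d", "price": 1.0},
]

page1, cursor = page_after(docs, None, rows=2)    # ids: b, a
page2, cursor = page_after(docs, cursor, rows=2)  # ids: c, d
```

Unlike a {{start}} offset, the cost of fetching a page here does not depend on how many pages came before it, which is the point of the cursor approach for deep paging.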




