lucene-dev mailing list archives

From "Alexander S. (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-12363) Duplicates with random search, cursors, and fixed seed
Date Wed, 16 May 2018 12:08:00 GMT

     [ https://issues.apache.org/jira/browse/SOLR-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander S. updated SOLR-12363:
--------------------------------
    Description: 
We have a SolrCloud cluster and recently updated one of our views to use cursors with
random ordering. Our goal is an infinite scroll over randomly ordered results that are reshuffled
once every 24 hours.

To do this, we store the seed used for the random ordering in a cookie that expires after 24 hours.
This did not work as expected:
 # Results are reshuffled on every request (that is, every time we pass the initial cursor value "*"
together with the same seed we already used).
 # Results sometimes contain duplicates. Not many, but they do appear from time to time.

In our *schema.xml* we have:
{code:xml}
<fieldType name="rand" class="solr.RandomSortField" omitNorms="true"/>
<dynamicField name="random_*" stored="false" type="rand" multiValued="false" indexed="true"/>
{code}
In our search requests, we order by *random_123 asc, id asc*, where *123* is the seed read from
the cookie.
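
For reference, a minimal sketch of how the parameters for one page of such a request might be assembled. The function name `build_cursor_query`, the `*:*` query, and the page size are assumptions for illustration; only the sort clause and the initial cursor value "*" come from the description above.

```python
from urllib.parse import urlencode


def build_cursor_query(seed, cursor_mark="*", rows=20, query="*:*"):
    """Return the URL query string for one page of a seeded, cursor-based search.

    `query` and `rows` are illustrative defaults, not values from the report.
    """
    params = {
        "q": query,
        "rows": rows,
        # The seed travels in the dynamic field name (random_<seed>);
        # id is the tie-breaker, as in the sort described above.
        "sort": f"random_{seed} asc,id asc",
        # "*" starts a new cursor walk; later pages pass the value
        # returned by the previous response instead.
        "cursorMark": cursor_mark,
    }
    return urlencode(params)


if __name__ == "__main__":
    print(build_cursor_query(123))
```

Keeping the seed and the cursor value as the only varying inputs makes it easier to confirm that two requests which should be identical really are sent with identical parameters.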

Here is the page [https://awards.wegohealth.com/nominees]

-Even when I take the "next page" URL from the Google Chrome developer console and open
it in separate tabs, it yields different results: [https://awards.wegohealth.com/nominees?cursor=AoJYmYbyATRBd2FyZDo6Tm9taW5lZSAxMzI0Mg%3D%3D]-

So it seems that either the seed we pass is ignored or each shard interprets it differently;
I am not sure which.

In the attached screenshots, you can see that the URL is the same but the results differ.
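
Both symptoms can also be checked without screenshots. The sketch below is only an illustration: the helper names and the sample ids are hypothetical, and it assumes the document ids of each fetched page have already been collected into lists.

```python
from collections import Counter


def find_duplicate_ids(pages):
    """Return the ids that occur more than once across all fetched pages."""
    counts = Counter(doc_id for page in pages for doc_id in page)
    return sorted(doc_id for doc_id, n in counts.items() if n > 1)


def same_order(run_a, run_b):
    """Return True if two walks with the same seed produced the same sequence."""
    return list(run_a) == list(run_b)


if __name__ == "__main__":
    # Hypothetical ids, standing in for two pages of one cursor walk.
    pages = [["n1", "n2", "n3"], ["n4", "n2", "n5"]]
    print(find_duplicate_ids(pages))
    # Hypothetical first pages of two walks that used the same seed.
    print(same_order(["n1", "n2", "n3"], ["n3", "n1", "n2"]))
```

A non-empty result from `find_duplicate_ids` corresponds to the second symptom, and `same_order` returning False for two walks with the same seed corresponds to the first.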


> Duplicates with random search, cursors, and fixed seed
> ------------------------------------------------------
>
>                 Key: SOLR-12363
>                 URL: https://issues.apache.org/jira/browse/SOLR-12363
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>    Affects Versions: 5.3.1
>            Reporter: Alexander S.
>            Priority: Major
>         Attachments: Screen shot 2018-05-16 at 14.51.19.png, Screen shot 2018-05-16 at 14.51.23.png, Screen shot 2018-05-16 at 14.51.26.png
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

