lucene-dev mailing list archives

From "Hoss Man (JIRA)" <>
Subject [jira] [Updated] (SOLR-4455) Stored value of "NOW" differs between replicas
Date Thu, 14 Feb 2013 22:51:14 GMT


Hoss Man updated SOLR-4455:

    Attachment: SOLR-4455.patch

Attaching a patch that adds the logic i was thinking of to DistributedUpdateProcessor.

At first i was confused why none of the existing distributed query tests were already failing,
since the test config includes a "timestamp" field -- and then i realized it's because the
"handler" for comparing multiple responses for identical queries is configured to "SKIPVAL"
the timestamp field in most tests.

I updated a lot of the test scaffolding to explicitly set a consistent NOW when talking to
both the controlClient and a distributedClient.
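
As a rough illustration of that scaffolding idea (this is not the code in the patch), the test picks one NOW value per request and sends the same value to both clients. The sketch below uses plain maps in place of real request params; the class and method names are hypothetical, and it assumes NOW travels as epoch milliseconds in a request param named "NOW".

```java
import java.util.HashMap;
import java.util.Map;

public class ConsistentNowSketch {

    /**
     * Returns a copy of the params with a fixed NOW added.
     * A NOW value already set by the caller is left untouched.
     */
    static Map<String, String> withFixedNow(Map<String, String> params, long nowMillis) {
        Map<String, String> copy = new HashMap<>(params);
        copy.putIfAbsent("NOW", Long.toString(nowMillis));
        return copy;
    }

    public static void main(String[] args) {
        // Choose the value once, then reuse it for both requests.
        long now = System.currentTimeMillis();

        Map<String, String> base = new HashMap<>();
        base.put("commit", "true");

        Map<String, String> controlParams = withFixedNow(base, now);
        Map<String, String> distribParams = withFixedNow(base, now);

        // Both requests carry the same NOW, so a default="NOW" field
        // can be compared across the two responses.
        System.out.println(controlParams.get("NOW").equals(distribParams.get("NOW")));
    }
}
```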

In the attached patch, TestDistributedSearch and BasicDistributedZkTest have both been updated
to no longer SKIPVAL the timestamp, and they pass, demonstrating that the basics of these test
scaffolding changes and the changes to DistributedUpdateProcessor seem to work.

BasicDistributedZk2Test, on the other hand, fails very early and consistently with these changes
and the timestamp SKIPVAL disabled ... with the "nocommit" in place to always force a NOW
value in the year 2038, you can see from the logs that somehow the cloud copy of doc id=1
is still getting a timestamp of the current time, even though the control solr instance gets
the expected value...  i'm not really sure why/how this is happening, because you can see
the NOW value specified in the logs for all the /update requests related to id=1 (even when
forwarded from the leader).


One thing that should be noted is that while typing up these notes, it occurred to me that
these changes still might not guarantee consistency in the case of a recovery situation that
results in replaying the transaction log -- in which case the _documents_ are recorded, but
not all of the update request params like NOW.
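
A toy model of that concern, under loudly stated assumptions: the log keeps only the document as submitted (without the resolved default), and the default is filled from the NOW param when present, else from the local clock. All names here are hypothetical and none of this is Solr's actual update or replay code.

```java
import java.util.HashMap;
import java.util.Map;

public class ReplaySketch {

    /**
     * Fills the "timestamp" default from the NOW param if present,
     * otherwise from the local clock value passed in.
     */
    static Map<String, Object> applyUpdate(Map<String, Object> doc,
                                           Map<String, String> params,
                                           long localClockMillis) {
        Map<String, Object> stored = new HashMap<>(doc);
        if (!stored.containsKey("timestamp")) {
            String now = params.get("NOW");
            stored.put("timestamp", now != null ? Long.parseLong(now) : localClockMillis);
        }
        return stored;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "1");

        Map<String, String> params = new HashMap<>();
        params.put("NOW", "1000");

        // Live update: the forwarded NOW param wins over the local clock.
        Map<String, Object> live = applyUpdate(doc, params, 5000L);

        // Replay: only the document was logged, so the params are gone
        // and the default falls back to the clock at replay time.
        Map<String, Object> replayed = applyUpdate(doc, new HashMap<>(), 9000L);

        System.out.println(live.get("timestamp") + " vs " + replayed.get("timestamp"));
    }
}
```

In this model the two copies of id=1 end up with different timestamps even though the original request carried an explicit NOW.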

I'm not certain if this is causing the BasicDistributedZk2Test failures mentioned above --
but it's certainly possible. (i do see mentions in the logs of "Log replay finished. recoveryInfo=RecoveryInfo{adds=1
...", but it's not clear to me why any recovery would be happening ... nothing jumps out at
me in this test to suggest that anything is aborting nodes to force recovery.)

> Stored value of "NOW" differs between replicas
> ----------------------------------------------
>                 Key: SOLR-4455
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>    Affects Versions: 4.1
>            Reporter: Colin Bartolome
>            Assignee: Hoss Man
>            Priority: Minor
>         Attachments: SOLR-4455.patch
> I have a field in {{schema.xml}} defined like this:
> {code:xml}
> <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" />
> {code}
> When I perform a query that's load-balanced across the servers in my cloud, the value
> stored in that field differs slightly between each replica for the same returned document.
> I haven't seen this field differ by more than a tenth of a second and I'm not running
> queries against it, but I can picture a situation where somebody has one replica returning
> 23:59:59.990 and another returning 00:00:00.010 and a query starts behaving oddly.
> It seems like the leader should evaluate what "NOW" means and the replicas should copy
> that value.
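
To make the midnight scenario from the report concrete: two stored values only 20 ms apart can land on different calendar days, so a day-granularity filter could match the document on one replica and not the other. The date below is arbitrary and the class name is hypothetical; the sketch uses only java.time.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

public class MidnightSkewSketch {

    /** Truncates an instant to its UTC calendar day. */
    static LocalDate utcDay(Instant t) {
        return t.atZone(ZoneOffset.UTC).toLocalDate();
    }

    public static void main(String[] args) {
        // One replica stored 23:59:59.990, the other 20 ms later at 00:00:00.010.
        Instant replicaA = Instant.parse("2013-02-14T23:59:59.990Z");
        Instant replicaB = replicaA.plusMillis(20);

        // Same document, different day depending on which replica answers.
        System.out.println(utcDay(replicaA) + " vs " + utcDay(replicaB));
    }
}
```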
