lucene-solr-user mailing list archives

From Amrit Sarkar <sarkaramr...@gmail.com>
Subject Re: Solr 7.2.0 CDCR Issue with TLOG collections
Date Wed, 07 Mar 2018 16:28:49 GMT
Webster,

I updated the JIRA SOLR-12057
(https://issues.apache.org/jira/browse/SOLR-12057): CdcrUpdateProcessor
has a hack that enables PEER_SYNC to bypass the leader logic in
DistributedUpdateProcessor.versionAdd, which eventually results in
segments not being created.
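To make that failure mode concrete, here is a toy Python sketch of the short-circuit (illustrative only; the real logic lives in Solr's Java DistributedUpdateProcessor, and the flag value and class below are invented for illustration):

```python
# Toy model of the bypass described above: an update carrying the
# PEER_SYNC flag (as CdcrUpdateProcessor sets it) skips the leader-side
# indexing path, so it lands in the transaction log but never reaches
# a segment. Flag value and class are hypothetical, not Solr's own.

PEER_SYNC = 0x4  # hypothetical flag constant, not Solr's actual value

class ToyUpdateLog:
    def __init__(self):
        self.tlog = []      # transaction-log entries
        self.segments = []  # documents actually written to the index

    def version_add(self, doc, flags=0):
        self.tlog.append(doc)        # every update is logged
        if flags & PEER_SYNC:
            return                   # leader logic bypassed: nothing indexed
        self.segments.append(doc)    # normal path: doc reaches a segment

log = ToyUpdateLog()
log.version_add({"id": "1"})                   # ordinary update
log.version_add({"id": "2"}, flags=PEER_SYNC)  # CDCR-forwarded update
print(len(log.tlog), len(log.segments))        # tlog grows, index does not
```

This mirrors the symptom reported below: tlog files arrive on the targets, but no documents show up in searches.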

I wrote a very rough patch that fixes the problem, with basic tests to
prove it works. I will try to polish and finish it as soon as possible.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Tue, Mar 6, 2018 at 10:07 PM, Webster Homer <webster.homer@sial.com>
wrote:

> It seems that this is a bug in Solr:
> https://issues.apache.org/jira/browse/SOLR-12057
>
> Hopefully it can be addressed soon!
>
> On Mon, Mar 5, 2018 at 4:14 PM, Webster Homer <webster.homer@sial.com>
> wrote:
>
> > I noticed that the cdcr action=queues call returns different results
> > for the target clouds. One target says that the updateLogSynchronizer
> > is stopped; the other says started. Why? What does that mean? We don't
> > explicitly set that anywhere.
> >
> >
> > {"responseHeader": {"status": 0,"QTime": 0},"queues": [],"tlogTotalSize":
> > 0,"tlogTotalCount": 0,"updateLogSynchronizer": "stopped"}
> >
> > and the other
> >
> > {"responseHeader": {"status": 0,"QTime": 0},"queues": [],"tlogTotalSize":
> > 22254206389,"tlogTotalCount": 2,"updateLogSynchronizer": "started"}
> >
> > The source is as follows:
> >
> > {
> >   "responseHeader": {
> >     "status": 0,
> >     "QTime": 5
> >   },
> >   "queues": [
> >     "xxx-mzk01.sial.com:2181,xxx-mzk02.sial.com:2181,xxx-mzk03.sial.com:2181/solr",
> >     [
> >       "b2b-catalog-material-180124T",
> >       [
> >         "queueSize", 0,
> >         "lastTimestamp", "2018-02-28T18:34:39.704Z"
> >       ]
> >     ],
> >     "yyy-mzk01.sial.com:2181,yyy-mzk02.sial.com:2181,yyy-mzk03.sial.com:2181/solr",
> >     [
> >       "b2b-catalog-material-180124T",
> >       [
> >         "queueSize", 0,
> >         "lastTimestamp", "2018-02-28T18:34:39.704Z"
> >       ]
> >     ]
> >   ],
> >   "tlogTotalSize": 1970848,
> >   "tlogTotalCount": 1,
> >   "updateLogSynchronizer": "stopped"
> > }
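As an aside for anyone reading these responses: the "queues" value is Solr's NamedList serialization, a flat JSON array of alternating keys and values. A small helper (hypothetical, not part of Solr) can turn it into nested dicts for easier inspection:

```python
def named_list_to_dict(nl):
    """Convert a flattened Solr NamedList (alternating key/value array)
    into a nested dict, recursing into sub-arrays."""
    out = {}
    for i in range(0, len(nl), 2):
        key, val = nl[i], nl[i + 1]
        out[key] = named_list_to_dict(val) if isinstance(val, list) else val
    return out

# The "queues" array from the source response above (one target shown):
zk = "xxx-mzk01.sial.com:2181,xxx-mzk02.sial.com:2181,xxx-mzk03.sial.com:2181/solr"
queues = [
    zk,
    ["b2b-catalog-material-180124T",
     ["queueSize", 0, "lastTimestamp", "2018-02-28T18:34:39.704Z"]],
]

parsed = named_list_to_dict(queues)
print(parsed[zk]["b2b-catalog-material-180124T"])
# {'queueSize': 0, 'lastTimestamp': '2018-02-28T18:34:39.704Z'}
```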
> >
> >
> > On Fri, Mar 2, 2018 at 5:05 PM, Webster Homer <webster.homer@sial.com>
> > wrote:
> >
> >> It looks like the data is getting to the target servers. I see tlog
> >> files with the right timestamps. Looking at the timestamps on the
> >> documents in the collection, none of the data appears to have been
> >> loaded. In solr.log I see lots of /cdcr messages:
> >> action=LASTPROCESSEDVERSION, action=COLLECTIONCHECKPOINT, and
> >> action=SHARDCHECKPOINT
> >>
> >> no errors
> >>
> >> autoCommit is set to 60000. I tried sending a commit explicitly; no
> >> difference. CDCR is uploading data, but no new data appears in the
> >> collection.
> >>
> >> On Fri, Mar 2, 2018 at 1:39 PM, Webster Homer <webster.homer@sial.com>
> >> wrote:
> >>
> >>> We have been having strange behavior with CDCR on Solr 7.2.0.
> >>>
> >>> We have a number of replicas which have identical schemas. We found
> >>> that TLOG replicas give much more consistent search results.
> >>>
> >>> We created a collection using TLOG replicas in our QA clouds.
> >>> We have a locally hosted SolrCloud with 2 nodes; all our collections
> >>> have 2 shards. We use CDCR to replicate the collections from this
> >>> environment to 2 data centers hosted in Google Cloud. This seems to
> >>> work fairly well for our collections with NRT replicas. However, the
> >>> new TLOG collection has problems.
> >>>
> >>> The Google Cloud Solr clusters have 4 nodes each (3 separate
> >>> ZooKeepers), 2 shards per collection with 2 replicas per shard.
> >>>
> >>> We never see data show up in the cloud collections, but we do see
> >>> tlog files show up on the cloud servers. I can see that all of the
> >>> servers have CDCR started and buffers are disabled.
> >>> The CDCR source configuration is:
> >>>
> >>> "requestHandler":{"/cdcr":{
> >>>       "name":"/cdcr",
> >>>       "class":"solr.CdcrRequestHandler",
> >>>       "replica":[
> >>>         {
> >>>           "zkHost":"xxx-mzk01.sial.com:2181,xxx-mzk02.sial.com:2181,xx
> >>> x-mzk03.sial.com:2181/solr",
> >>>           "source":"b2b-catalog-material-180124T",
> >>>           "target":"b2b-catalog-material-180124T"},
> >>>         {
> >>>           "zkHost":"yyyy-mzk01.sial.com:2181,yyyy-mzk02.sial.com:2181,
> >>> yyyy-mzk03.sial.com:2181/solr",
> >>>           "source":"b2b-catalog-material-180124T",
> >>>           "target":"b2b-catalog-material-180124T"}],
> >>>       "replicator":{
> >>>         "threadPoolSize":4,
> >>>         "schedule":500,
> >>>         "batchSize":250},
> >>>       "updateLogSynchronizer":{"schedule":60000}}}}
> >>>
> >>> The target configurations in the 2 clouds are the same:
> >>> "requestHandler":{"/cdcr":{ "name":"/cdcr", "class":
> >>> "solr.CdcrRequestHandler", "buffer":{"defaultState":"disabled"}}}
> >>>
> >>> All of our collections have a timestamp field, index_date. In the
> >>> source collection all the records have a date of 2/28/2018, but the
> >>> target collections have a latest date of 1/26/2018.
> >>>
> >>> I don't see CDCR errors in the logs, but we use logstash to search
> >>> them, and we're still perfecting that.
> >>>
> >>> We have a number of similar collections that behave correctly. This
> >>> is the only collection that is a TLOG collection. It appears that
> >>> CDCR doesn't support TLOG collections.
> >>>
> >>> This begins to look like a bug.
> >>>
> >>>
> >>
> >
>
>
