lucene-issues mailing list archives

From "Erick Erickson (Jira)" <>
Subject [jira] [Updated] (SOLR-13913) CDCR should limit TLOG growth
Date Sat, 09 Nov 2019 15:35:00 GMT


Erick Erickson updated SOLR-13913:
    Component/s: CDCR

> CDCR should limit TLOG growth
> -----------------------------
>                 Key: SOLR-13913
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: CDCR
>            Reporter: Erick Erickson
>            Priority: Major
> CDCR uses TLOGs as a queueing mechanism. If the connection between DCs goes down for
any reason and the outage goes unnoticed, the tlogs will grow without bound, which can fill
the disk, with all that entails.
> Aside from that problem, it's not clear that reprocessing a zillion updates is faster
than a full replication anyway.
> Since full-index replication was added, we can avoid runaway tlogs by somehow noticing
that we haven't been connected to the remote DC for a long time, purging the tlogs (keeping
just enough for peer sync, of course) and doing a full index replication the next time we connect.
> This is pretty vague. I don't have a good idea of whether tlog size is the right metric,
or some sort of time since last successful transmission, or the queue size, or some combination
of these and others. The point is simply that once some threshold is crossed, we reset to a
zero state and avoid the pitfalls of continuing to accumulate updates.
> I'd suggest these thresholds be tunable parameters defined in solrconfig.xml, since I can
imagine that terabyte-scale indexes should fall back to full-index replication more rarely than
megabyte-scale indexes.
> This idea came up in discussions and I wanted to preserve it in case someone wants
to pursue it.
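
As a rough illustration of the threshold idea in the description above, the check could look something like the following Java sketch. Everything in it is an assumption made for illustration: the class TlogResetPolicy, its three limits, and the method shouldReset are hypothetical names and are not part of Solr or CDCR. The issue leaves open which metric is the right one, so the sketch simply treats crossing any one of the three candidates (tlog size, time since last successful transmission, queue size) as the trigger.

```java
import java.time.Duration;

/**
 * Hypothetical policy object, NOT an existing Solr/CDCR class.
 * It only decides whether accumulated updates should be abandoned
 * in favour of a full index replication on the next connection.
 */
final class TlogResetPolicy {
    // Assumed tunables; in the issue's proposal these would come from solrconfig.xml.
    private final long maxTlogBytes;
    private final Duration maxDisconnected;
    private final long maxQueuedUpdates;

    TlogResetPolicy(long maxTlogBytes, Duration maxDisconnected, long maxQueuedUpdates) {
        this.maxTlogBytes = maxTlogBytes;
        this.maxDisconnected = maxDisconnected;
        this.maxQueuedUpdates = maxQueuedUpdates;
    }

    /** Returns true once any single limit is strictly exceeded. */
    boolean shouldReset(long tlogBytes, Duration sinceLastSuccess, long queuedUpdates) {
        return tlogBytes > maxTlogBytes
                || sinceLastSuccess.compareTo(maxDisconnected) > 0
                || queuedUpdates > maxQueuedUpdates;
    }
}
```

A caller would purge the tlogs (keeping enough for peer sync) and schedule a full index replication whenever shouldReset returns true; those steps are not shown here. Larger indexes would presumably be configured with higher limits so that they fall back to full replication less often.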
