accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Replication Latency
Date Thu, 12 Jan 2017 01:02:23 GMT
Did you look at the accumulo-gc log to actually correlate how often the 
class I sent is being executed?
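
One rough way to do that correlation is to pull the timestamps of the GC log lines that mention the class and look at the gaps between them. The sketch below is only an illustration: it assumes each log line starts with a `yyyy-MM-dd HH:mm:ss,SSS` timestamp and that the class name appears in the line at your configured log level. Neither assumption is confirmed in this thread, and `GcLogGaps` and `gapsBetweenMentions` are hypothetical names, so adjust to what your accumulo-gc log actually contains.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Accumulo. */
final class GcLogGaps {
  // Assumed timestamp layout at the start of each line; adjust to your log pattern.
  private static final DateTimeFormatter TS =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");
  private static final int TS_LENGTH = 23;

  /** Returns the time between consecutive log lines that contain {@code needle}. */
  static List<Duration> gapsBetweenMentions(List<String> lines, String needle) {
    List<Duration> gaps = new ArrayList<>();
    LocalDateTime previous = null;
    for (String line : lines) {
      if (line.length() < TS_LENGTH || !line.contains(needle)) {
        continue; // too short to carry a timestamp, or not about the class
      }
      LocalDateTime current;
      try {
        current = LocalDateTime.parse(line.substring(0, TS_LENGTH), TS);
      } catch (DateTimeParseException e) {
        continue; // line does not start with the assumed timestamp format
      }
      if (previous != null) {
        gaps.add(Duration.between(previous, current));
      }
      previous = current;
    }
    return gaps;
  }
}
```

It could be fed with something like `Files.readAllLines(...)` on the GC log and the string `"CloseWriteAheadLogReferences"`. If the gaps turn out to be on the order of the delay described below, that would suggest (but not prove) that the GC's cadence accounts for it.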

Noe Detore wrote:
> To be fair, after writing the post I grepped the logs and found my WALs
> rolling over on size before the max.age threshold was hit. That is
> the reason I did not see an improvement in latency from reducing
> max.age.
>
> There is still an x factor between when a WAL is no longer written to by
> the tserver and when it actually gets replicated that I need to figure
> out. For example, my WALs appear to be done being written to (new WAL
> created on tserver) after 3m, but replication is taking about 12 to 15 min
> to complete. Even though the WAL is not being written to after 3m, I am not
> seeing it ready for replication (closed: true) until after 13m.
>
>
> On Wed, Jan 11, 2017 at 5:44 PM, Josh Elser <josh.elser@gmail.com> wrote:
>
>     See org.apache.accumulo.gc.replication.CloseWriteAheadLogReferences
>     for where WALs are currently marked as "closed".
>
>     I don't recall the details, but I think there was some issue with
>     trying to close them in TabletServerLogger.
>
>     Yes to your last question: if it were done in TabletServerLogger, it
>     would be closed more quickly than done by the GC. The issue is
>     whether or not it's actually safe to mark them as closed there. I
>     just don't remember the internal WAL lifecycle well enough.
>
>
>     Noe Detore wrote:
>
>         Hello,
>
>         I am trying to influence replication latency with
>         tserver.walog.max.age, but I notice no difference when setting
>         the value low. Looking in the code of
>         org.apache.accumulo.tserver.log.TabletServerLogger:
>
>           protected void closeForReplication(Collection<CommitSession> sessions) {
>             // TODO We can close the WAL here for replication purposes
>           }
>
>         This TODO is called by:
>             testLockAndRun(logSetLock, new TestCallWithWriteLock() {
>               @Override
>               boolean test() {
>                 return (logSizeEstimate.get() > maxSize)
>                     || ((System.currentTimeMillis() - createTime) > maxAge);
>               }
>
>               @Override
>               void withWriteLock() throws IOException {
>                 close();
>                 closeForReplication(sessions);
>               }
>             });
>             return seq;
>           }
>
>         I am still trying to understand what is happening here, but
>         could this TODO be the reason replication status records are
>         not being updated with 'closed: true' sooner?
>
>         Thank you
>         Noe
>
>

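To make the roll-over condition from the quoted TabletServerLogger snippet concrete: the WAL is rolled when either the size estimate exceeds maxSize or its age exceeds maxAge, so lowering max.age changes nothing if the size limit is reached first. Below is a minimal standalone sketch of that check. The class, enum, and method names are hypothetical and this is not Accumulo code; it only mirrors the boolean expression quoted above and reports which side of the `||` fired.

```java
/** Hypothetical illustration of the quoted roll-over test; not Accumulo code. */
final class WalRollCheck {
  enum Trigger { NONE, SIZE, AGE }

  /**
   * Mirrors: (logSizeEstimate > maxSize) || ((now - createTime) > maxAge).
   * Sizes are in bytes, times in milliseconds.
   */
  static Trigger rollTrigger(long logSizeEstimate, long maxSize,
                             long createTime, long maxAge, long now) {
    if (logSizeEstimate > maxSize) {
      return Trigger.SIZE; // size is checked first, as in the quoted expression
    }
    if (now - createTime > maxAge) {
      return Trigger.AGE;
    }
    return Trigger.NONE;
  }
}
```

With made-up numbers, a WAL that passes its size limit three minutes after creation reports `SIZE` no matter how max.age is set, as long as max.age is longer than three minutes. That matches the observation above that the WALs were rolling on size before the age threshold was hit.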