asterixdb-dev mailing list archives

From Chen Luo <>
Subject Re: Question about LSM Disk Component LSN
Date Sat, 20 May 2017 06:46:23 GMT
Hi Akshay,

Thanks a lot for the information. What I want to do is assign a monotonically
increasing ID to each flushed disk component, which can then be used in the
correlated merge policy. Since we already set the LSN (actually it should
be FLUSH_LSN, as you described) for the flushed disk component, I currently
just take the LSN as the component ID.

Based on your description, I think the FLUSH_LSN should be monotonically
increasing, since:
1. Flush operations are always executed serially for an index (this is
guaranteed by the scheduler)
2. Since operation LSNs are monotonically increasing, if d1 is flushed later
than d2, then we should have d1.FLUSH_LSN > d2.FLUSH_LSN.

However, based on my testing, the above property does not always hold. So
it seems there may be a bug when setting the FLUSH_LSN for disk components?
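
The check described in the original question quoted below (record the previous LSN and raise an exception when it is larger than the current one) can be sketched roughly as follows. This is a simplified Python model, not AsterixDB code: `FlushLsnChecker`, `on_flush`, and `first_violation` are hypothetical names, and the LSN values in the usage note are made up for illustration.

```python
class FlushLsnChecker:
    """Tracks the LSN of the most recently flushed component of one index.

    Hypothetical helper for illustration; not part of AsterixDB.
    """

    def __init__(self):
        self.previous_lsn = None  # no component flushed yet

    def on_flush(self, lsn):
        """Record the LSN of a newly flushed component.

        Raises ValueError if the previously recorded LSN is larger
        than the LSN of the component that was just flushed.
        """
        if self.previous_lsn is not None and self.previous_lsn > lsn:
            raise ValueError(
                f"non-monotonic flush LSN: previous={self.previous_lsn}, current={lsn}"
            )
        self.previous_lsn = lsn


def first_violation(lsns):
    """Return the position of the first flush whose LSN is smaller than
    the one before it, or None if the sequence never decreases."""
    checker = FlushLsnChecker()
    for position, lsn in enumerate(lsns):
        try:
            checker.on_flush(lsn)
        except ValueError:
            return position
    return None
```

With made-up values, `first_violation([100, 250, 180])` reports position 2, while a sequence such as `[100, 250, 300]` yields `None`.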

Best regards,
Chen Luo

On Fri, May 19, 2017 at 11:12 PM, Akshay Manchale Sridhar <> wrote:

> Hi Chen,
>
> The LSN only needs to indicate the last operation that was performed on the
> flushed disk component, so that we know where to begin recovery for that
> index. The flush operation is triggered only after the flush log record hits
> the disk. While we wait for that to happen, the state of the mutable index
> component is READABLE_UNWRITABLE, which prevents any new writes, whose LSN
> would be greater than the FLUSH_LSN (the LSN when the flush was requested,
> not completed), from going into the mutable component.
>
> When the flush log record is persisted to disk, we can start the flush
> operation for the component, but we also switch to its shadow buffer and
> make that the current mutable component so that the ongoing flush operation
> does not block incoming writes. All new incoming writes with LSN > FLUSH_LSN
> go into the second buffer and not the one that is flushing. When the flush
> operation is complete, we write the LSN corresponding to when the flush log
> record was created (I'm not sure how this information is pushed down to
> LSMBTreeIOOperationCallback).
>
> Regardless of when a set of index flushes hits the disk, the LSN stored in
> each disk component will be that of the last operation that modified it, not
> the LSN at the time the flush completed. Between FLUSH_LSN and the LSN that
> is current when the component hits the disk, no operations are performed on
> the flushed component. If we set the LSN to the point at which the component
> is flushed to disk (let's call this FLUSH_COMPLETE_LSN), we would miss the
> transaction log records generated between FLUSH_LSN and FLUSH_COMPLETE_LSN
> during recovery.
>
> So even though a set of flush operations may finish writing to disk in an
> order different from the order in which they were requested, the timeline
> for what went into the disk components with respect to the transaction log
> will be consistent. We will not risk losing data between FLUSH_LSN and
> FLUSH_COMPLETE_LSN after a failure, which is what could happen if recovery
> started scanning at FLUSH_COMPLETE_LSN instead of FLUSH_LSN. I don't know
> if the LSN in the disk component has any other uses, besides determining
> where recovery needs to start, that would break this assumption. Hope that
> helps!
>
> On Fri, May 19, 2017 at 10:29 PM, Chen Luo <> wrote:
> > Hi Devs,
> >
> > Recently I was using the LSN to set a component ID for every newly flushed
> > disk component. From LSMBTreeIOOperationCallback (as well as other index
> > callbacks), I saw that after every flush operation, the LSN is placed in
> > the newly flushed disk component's metadata. I was expecting the LSN to
> > be increasing for every newly flushed disk component. That is, if a disk
> > component d1 is flushed later than another disk component d2, we should
> > have d1.LSN > d2.LSN. (Please correct me if I'm wrong.)
> >
> > However, based on my testing, this condition does not always hold. It is
> > possible for a later-flushed disk component to have a smaller LSN than
> > previously flushed disk components (I found this by recording the previous
> > LSN and throwing an exception when it is larger than the current LSN). Is
> > this behavior expected? Or do we have no guarantee that the LSNs placed in
> > flushed disk components are monotonically increasing?
> >
> > Any help is appreciated.
> >
> > Best regards,
> > Chen Luo
> >
