asterixdb-dev mailing list archives

From Mike Carey <dtab...@gmail.com>
Subject Re: About the system behavior when the checkpoint is corrupted
Date Wed, 29 Nov 2017 22:23:28 GMT
+1 for refusing to proceed, rather than simply removing the data, in this
(ideally unreachable) state...


On 11/29/17 2:15 PM, Ian Maxon wrote:
> I too have seen this issue, but I couldn't reproduce it or work out how it
> might happen just from inspecting the code. How did it appear for you?
> However, I would disagree that a missing checkpoint file is a small thing.
> It is more or less the most important artifact for recovery, and it should
> never have an issue like this.
>
> On Wed, Nov 29, 2017 at 1:54 PM, Chen Luo <cluo8@uci.edu> wrote:
>> Hi devs,
>>
>> Recently I ran into a very annoying issue with recovery. The checkpoint
>> file of my dataset was somehow corrupted (I don't know why). When I
>> restarted AsterixDB, it failed to read the checkpoint file and started
>> recovering from a clean state. This is highly undesirable because it
>> silently cleaned up all of my experiment datasets, roughly 100 GB, and it
>> will take me days to re-ingest the data to resume my experiments.
>>
>> I think cleaning up all data when one small thing goes wrong is
>> undesirable and dangerous. When AsterixDB fails to read the checkpoint on
>> restart and finds the data directory non-empty, I think it should notify
>> the user and let the user make the decision. For example, it could refuse
>> to start, and the user could then clean up the directory manually, try a
>> backup checkpoint file, or pass a flag to force the restart. Either way,
>> blindly cleaning up all files seems like a dangerous solution.
>>
>> Any thoughts on this?
>>
>> Best regards,
>> Chen Luo
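The startup policy proposed in this thread (refuse to start instead of silently wiping a non-empty data directory, fall back to a backup checkpoint, and require an explicit flag to force a clean start) can be sketched as below. This is a minimal, language-agnostic illustration written in Python, not AsterixDB's actual recovery code: the JSON checkpoint format, the backup file, the `force_clean` flag, and the names `decide_startup`, `read_checkpoint`, `has_data_files`, and `StartupRefused` are all hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


class StartupRefused(Exception):
    """Raised when the node should stop and let the operator decide."""


def read_checkpoint(path: Path) -> Optional[dict]:
    """Return the parsed checkpoint, or None if it is missing or corrupted.

    Hypothetical: assumes a JSON checkpoint file purely for illustration.
    """
    try:
        with path.open("r", encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, ValueError):
        # OSError: file missing or unreadable.
        # ValueError: covers json.JSONDecodeError and UnicodeDecodeError.
        return None
    return data if isinstance(data, dict) else None


def has_data_files(data_dir: Path, ignore: set) -> bool:
    """True if data_dir contains anything besides the checkpoint files."""
    if not data_dir.is_dir():
        return False
    return any(entry.name not in ignore for entry in data_dir.iterdir())


def decide_startup(data_dir: Path, checkpoint: Path, backup: Path,
                   force_clean: bool = False) -> str:
    """Pick a startup action instead of silently wiping existing data."""
    if read_checkpoint(checkpoint) is not None:
        return "recover-from-checkpoint"
    if read_checkpoint(backup) is not None:
        # Primary checkpoint is unusable; fall back to the backup copy.
        return "recover-from-backup"
    if not has_data_files(data_dir, {checkpoint.name, backup.name}):
        # Nothing to lose: this is a genuinely fresh instance.
        return "fresh-start"
    if force_clean:
        # The operator explicitly asked to discard the existing data.
        return "clean-and-start"
    raise StartupRefused(
        f"checkpoint unreadable and {data_dir} is not empty; "
        "restore a checkpoint or restart with force_clean=True"
    )
```

The function only decides what to do; actually deleting files on `clean-and-start` is left to the caller, so a wipe can happen only through an explicit operator choice rather than as a side effect of a failed checkpoint read.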

