kudu-user mailing list archives

From Nick Wolf <nickwo...@gmail.com>
Subject Re: [KUDU Tablet]unrecoverable crash
Date Sat, 20 Feb 2016 01:05:04 GMT
I've identified the tablet ID, deleted its files, and tried to start the
server again, but it seems to be a chain reaction. The same crash keeps
coming back, one tablet at a time, each with a different tablet ID.
rm tablet-meta/1c2475126c7c4cc2b82f95bd6af5bdb4
rm wals/1c2475126c7c4cc2b82f95bd6af5bdb4
rm consensus-meta/1c2475126c7c4cc2b82f95bd6af5bdb4

A notable point here is that none of the tablet IDs it shows as
bootstrapping appear in the web interface (http://host:8051/tables).
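
Since the crash moves to a new tablet on every restart, the log search suggested in the quoted reply (grep for the thread ID, then look for the tablet being bootstrapped) has to be repeated each time. A minimal sketch of that search follows. It assumes the log uses the glog line prefix seen in the fatal error (severity and date, time, thread ID, file:line) and that tablet IDs are 32 lowercase hex characters like the one deleted above. The function names are made up for illustration and are not part of Kudu.

```python
import re

# Assumed glog prefix: severity letter + MMDD, HH:MM:SS.uuuuuu, thread ID.
GLOG_PREFIX = re.compile(r"^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d{6}\s+(\d+) ")

# Assumption: tablet IDs are 32 lowercase hex characters.
TABLET_ID = re.compile(r"\b[0-9a-f]{32}\b")


def lines_for_thread(log_lines, thread_id):
    """Return the log lines whose glog prefix carries the given thread ID."""
    wanted = str(thread_id)
    matches = []
    for line in log_lines:
        m = GLOG_PREFIX.match(line)
        if m and m.group(1) == wanted:
            matches.append(line.rstrip("\n"))
    return matches


def tablet_ids_for_thread(log_lines, thread_id):
    """Return the distinct tablet-ID-shaped tokens logged by that thread."""
    seen = []
    for line in lines_for_thread(log_lines, thread_id):
        for tablet_id in TABLET_ID.findall(line):
            if tablet_id not in seen:
                seen.append(tablet_id)
    return seen
```

For example, `tablet_ids_for_thread(open("kudu-tserver.INFO"), 6285)` would list the candidate tablet IDs that thread 6285 logged before the fatal check, in order of first appearance.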

On Fri, Feb 19, 2016 at 12:49 PM, Todd Lipcon <todd@cloudera.com> wrote:

> Hi Nick,
>
> Are you able to determine the tablet ID that is failing to restart?
> The log line indicates that it's thread ID 6285. If you look farther
> up the log with 'grep " 6285 " kudu-tserver.INFO', you should see a
> log message indicating that that thread is starting to bootstrap a
> particular tablet.
>
> Is this a replicated table, or num_replicas=1? If it's replicated, we
> can probably recover by removing the corrupt replica and letting it
> grab a new copy from one of the other replicas. Otherwise, we'll have
> to do some more serious "surgery" which we can assist you with.
>
> Either way, see if you can figure out the bad tablet ID. Then, if it's
> possible to send a copy of the WAL directory for this tablet to me off
> list, I can try to do some post-mortem analysis to see what went
> wrong.
>
> Thanks
> -Todd
>
> On Fri, Feb 19, 2016 at 12:37 PM, Nick Wolf <nickwolf7@gmail.com> wrote:
> > The Kudu tablet crashed with the following fatal error.
> >
> > F0219 12:15:11.389806  6285 mvcc.cc:542] Check failed: _s.ok() Bad status:
> > Illegal state: Timestamp: 5963266013874102274 is already committed. Current
> > Snapshot: MvccSnapshot[committed={T|T < 5963266013874118554 or (T in
> > {5963266013874118554})}]
> >
> > It throws the same fatal error and crashes immediately no matter how many
> > times I try to restart the service.
> >
> > Any ideas to get out of this situation? I don't want to lose the data.
> >
> >
> > --Nick
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
