kudu-user mailing list archives

From Andrew Wong <aw...@apache.org>
Subject Re: Unable to load consensus metadata (the metadata file is missed and the part of tablets is unavailable)
Date Tue, 05 Jun 2018 21:45:23 GMT
The root cause is a bit nuanced, but it boils down to the fact that the
consensus metadata doesn't always get fsynced, so a hard shutdown can leave
the file empty and lead to the behavior you posted. This comment
<https://issues.apache.org/jira/browse/KUDU-2195?focusedCommentId=16328129&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16328129>
has all the details on why this happens. A band-aid solution is to use the
`kudu remote_replica copy` tool to copy the remaining healthy tablet replica
(on node105) to the currently failed ones (on node104 and node106).
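
As a rough sketch of that copy step, the script below prints one
`kudu remote_replica copy` invocation per failed replica, with arguments in
the order tablet ID, source tserver, destination tserver. The tablet ID and
host:port values are taken from the ksck output in the quoted message; the
variable names and the `copy_cmd` helper are made up for illustration. It
only prints the commands, so they can be reviewed before being run by hand.

```shell
#!/bin/sh
# Values copied from the ksck output quoted below (adjust for your cluster).
TABLET_ID=147962d1afa0419bbda19e849ee210ee
SRC=node105:7050                           # healthy replica (RUNNING, LEADER)
FAILED_NODES="node104:7050 node106:7050"   # replicas in FAILED state

# Hypothetical helper: print the copy command for one destination tserver.
# Nothing is executed against the cluster here.
copy_cmd() {
  echo "kudu remote_replica copy $TABLET_ID $SRC $1"
}

for dst in $FAILED_NODES; do
  copy_cmd "$dst"
done
```

Depending on the Kudu version, the tool may refuse to copy onto a server that
still holds the failed replica; `kudu remote_replica copy --help` lists the
options available in your release. Re-running ksck afterwards should show
whether the tablet has recovered.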

Todd posted a patch for this, further down in the link; unfortunately it
hasn't landed on the master branch and AFAIK there isn't another fix at the
moment.

On Mon, Jun 4, 2018 at 6:59 PM, chengjunhao@tuandai.com <
chengjunhao@tuandai.com> wrote:

>
> Hello! I am from China, and I have a problem using Cloudera Kudu.
>
> I have hit this issue many times and I don't know the exact cause. Every
> time, it happens in the same situation: some master and tserver services
> fail on their first start because NTP is not synchronized, I restart them
> once NTP is in sync, and then I hit this issue. The following is the
> output of the command "kudu cluster ksck
> cluster1:7051,cluster2:7051,cluster3:7051":
>
> Tablet 147962d1afa0419bbda19e849ee210ee of table 'my_first_table' is unavailable: 2 replica(s) not RUNNING
>   052adf65aa5e465c86318732b3a9fcc2 (node104:7050): bad state
>     State:       FAILED
>     Data state:  TABLET_DATA_READY
>     Last status: Incomplete: Unable to load consensus metadata for tablet 147962d1afa0419bbda19e849ee210ee: Could not read header for proto container file /var/lib/kudu/tserver/consensus-meta/147962d1afa0419bbda19e849ee210ee: File size not large enough to be valid: Proto container file /var/lib/kudu/tserver/consensus-meta/147962d1afa0419bbda19e849ee210ee: Tried to read 16 bytes at offset 0 but file size is only 0 bytes
>   2123398e90bc4373a7429b4caa014dc7 (node106:7050): bad state
>     State:       FAILED
>     Data state:  TABLET_DATA_READY
>     Last status: Incomplete: Unable to load consensus metadata for tablet 147962d1afa0419bbda19e849ee210ee: Could not read header for proto container file /var/lib/kudu/tserver/consensus-meta/147962d1afa0419bbda19e849ee210ee: File size not large enough to be valid: Proto container file /var/lib/kudu/tserver/consensus-meta/147962d1afa0419bbda19e849ee210ee: Tried to read 16 bytes at offset 0 but file size is only 0 bytes
>   d444b36807624acd96264eac11dd99fc (node105:7050): RUNNING [LEADER]
> Table my_first_table has 1 unavailable tablet(s)
> Table Summary
>       Name      |   Status    | Total Tablets | Healthy | Under-replicated | Unavailable
> ----------------+-------------+---------------+---------+------------------+-------------
>  my_first_table | UNAVAILABLE | 16            | 15      | 0                | 1
> ==================
> Errors:
> ==================
> table consistency check error: Corruption: 1 out of 1 table(s) are bad
>
>
> The versions I am using are "kudu-master-1.5.0+cdh5.13.0+0-1.cdh5.13.0.p0.34.el7.x86_64"
> and "kudu-tserver-1.5.0+cdh5.13.0+0-1.cdh5.13.0.p0.34.el7.x86_64".
>
> Can you tell me the cause of this issue and what I can do when it happens
> again?
> ------------------------------
> chengjunhao@tuandai.com
>
