kudu-user mailing list archives

From Andrew Wong <aw...@cloudera.com>
Subject Re: Re: kudu tsserver question!
Date Fri, 07 Sep 2018 01:37:55 GMT
>
> 1. hadoop4.uce.cn is alive.
> I used the tool to check that everything is healthy, but the Kudu web UI still shows the following:


Right, you'll notice that the "dead" tserver has a UUID that is different
from the one that is currently at hadoop4.uce.cn. This likely means that, a
bit over a month ago, the tablet server died, and an administrator wiped all
the data from it and started a new one at the same location (which is not
uncommon in practice). This isn't a sign that anything is currently unhealthy,
as evidenced by `ksck` reporting a healthy cluster. The master just keeps a
record of the previously known tablet servers. If you restart the master, the
stale entry should disappear.
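To make the distinction concrete, here is a minimal Python sketch of the idea: a "dead" entry is only a stale registration if the same host also has a live entry under a different UUID. The record layout, the function name `stale_registrations`, the placeholder UUIDs, and the second hostname are all hypothetical; this is not how the master stores or exposes its tserver list.

```python
from collections import defaultdict


def stale_registrations(tservers):
    """Return dead entries whose host also has a live entry with another UUID.

    `tservers` is a list of dicts with hypothetical keys:
    "host", "uuid", and "alive".
    """
    by_host = defaultdict(list)
    for ts in tservers:
        by_host[ts["host"]].append(ts)

    stale = []
    for entries in by_host.values():
        live_uuids = {e["uuid"] for e in entries if e["alive"]}
        for e in entries:
            # A dead entry is stale only if a different, live UUID
            # has since registered from the same host.
            if not e["alive"] and live_uuids and e["uuid"] not in live_uuids:
                stale.append(e)
    return stale


# Hypothetical data: the UUIDs are placeholders, not values from this thread.
tservers = [
    {"host": "hadoop4.uce.cn", "uuid": "old-uuid-placeholder", "alive": False},
    {"host": "hadoop4.uce.cn", "uuid": "new-uuid-placeholder", "alive": True},
    {"host": "other-host.example", "uuid": "other-uuid-placeholder", "alive": True},
]

for entry in stale_registrations(tservers):
    print(entry["host"], entry["uuid"])
```

A dead entry with no live replacement on the same host would not be reported here, since that case may be a real outage rather than a leftover record.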


> We use Spark to consume Kafka data and write it to Kudu. When the Kudu cluster
> check reports a bad tablet, writes are delayed or interrupted. Once Kudu is
> healthy again, we find that data is missing.
>
> After resetting the Kafka offset for the lost data and consuming it again, the
> lost data could be written to Kudu.
>

If the tablets were "bad" during a time of writing, it isn't necessarily
surprising that the writes didn't succeed. That said, do you recall what
"bad" state the tablets were in? You can probably find more info in the
error logs of one of the tablet servers that hosts the tablet.
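As a starting point for digging through those logs, here is a rough Python sketch that groups `consensus_queue.cc` errors by tablet, destination peer, and status. The regular expression is inferred only from the log lines posted in this thread, so it is an assumption about the format rather than a documented one; the names `LINE_RE`, `summarize`, and `SAMPLE` are illustrative.

```python
import re
from collections import Counter

# Pattern inferred from the log lines in this thread (an assumption, not a
# documented format): tablet ID after "T", local peer after "P", then the
# destination peer and its status.
LINE_RE = re.compile(
    r"consensus_queue\.cc:\d+\] T (?P<tablet>[0-9a-f]{32}) "
    r"P (?P<peer>[0-9a-f]{32}) \[(?P<role>\w+)\]: .*?"
    r"Destination peer: Peer: (?P<dest>[0-9a-f]{32}), Status: (?P<status>\w+)"
)


def summarize(log_text):
    """Count errors per (tablet, destination peer, status)."""
    # Collapse all whitespace so entries wrapped across lines still match.
    text = " ".join(log_text.split())
    counts = Counter()
    for m in LINE_RE.finditer(text):
        counts[(m["tablet"], m["dest"], m["status"])] += 1
    return counts


# Two entries copied from the original message, wrapped as in the archive.
SAMPLE = """
E0904 01:18:35.213666 80406 consensus_queue.cc:618] T 08499489f43b46f1816115faafaff092
P 739b33f810844ebcaef489e0b83c3eba [LEADER]: Error trying to read ahead of the log while preparing
peer request: Incomplete: Op with index 3231 is ahead of the local log (next sequential op:
3231). Destination peer: Peer: 37cc8fb75e984d3c90564f0659a8170f, Status: INVALID_TERM, Last
received: 394.3231, Next index: 3232, Last known committed idx: 3231, Time since last communication:
0.000s
E0904 01:18:35.221812 80421 consensus_queue.cc:618] T 22d1ee4c979047e894d96f3ebfb016ba
P 739b33f810844ebcaef489e0b83c3eba [LEADER]: Error trying to read ahead of the log while preparing
peer request: Incomplete: Op with index 762 is ahead of the local log (next sequential op:
762). Destination peer: Peer: 8907a006b28a4d52afcc66ff48e11faf, Status: INVALID_TERM, Last
received: 305.761, Next index: 763, Last known committed idx: 762, Time since last communication:
0.008s
"""

for (tablet, dest, status), n in summarize(SAMPLE).items():
    print(tablet, dest, status, n)
```

Grouping by tablet ID makes it easier to match the errors against the tablets that `ksck` flagged as unhealthy at the time of the failed writes.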

On Thu, Sep 6, 2018 at 6:12 PM fengbaoli@uce.cn <fengbaoli@uce.cn> wrote:

>
> Hi:
>
> 1. hadoop4.uce.cn is alive.
> I used the tool to check that everything is healthy, but the Kudu web UI still shows the following:
>
> check result:
> According to the tserver web UI, the table is still on the "hadoop4.uce.cn" machine.
>
> ------------------------------
>
> 2. "missing data"
>
> We use Spark to consume Kafka data and write it to Kudu. When the Kudu cluster
> check reports a bad tablet, writes are delayed or interrupted. Once Kudu is
> healthy again, we find that data is missing.
>
> After resetting the Kafka offset for the lost data and consuming it again, the
> lost data could be written to Kudu.
>
>
>
>
>
> 优速物流有限公司
> Big Data Center, Feng Baoli
> *Mobile*:15050552430
> *Email*:fengbaoli@uce.cn
>
>
> *From:* Andrew Wong <awong@cloudera.com>
> *Sent:* 2018-09-07 01:32
> *To:* user <user@kudu.apache.org>
> *Subject:* Re: kudu tsserver question!
> Hi,
>
> To clarify, do you mean that hadoop4.uce.cn is actually still alive and
> the posted logs are from that machine? Also, do you have other indicators
> of "missing data"? What does the `kudu ksck` tool
> <https://kudu.apache.org/docs/command_line_tools_reference.html#cluster-ksck>
>  report?
>
> Based on that screenshot, it seems like hadoop4.uce.cn has gone down.
> After 5 minutes, Kudu *should* automatically replicate any data that was
> on that server. If this is the case, ksck should confirm this by showing a
> healthy cluster.
>
> Andrew
>
> On Wed, Sep 5, 2018 at 10:53 PM fengbaoli@uce.cn <fengbaoli@uce.cn> wrote:
>
>> Hi,
>>    We are using Kudu (version 1.6.0) and have found some missing-data problems.
>>
>> The Kudu tserver log is:
>>
>> E0904 01:18:35.213666 80406 consensus_queue.cc:618] T 08499489f43b46f1816115faafaff092 P 739b33f810844ebcaef489e0b83c3eba [LEADER]: Error trying to read ahead of the log while preparing peer request: Incomplete: Op with index 3231 is ahead of the local log (next sequential op: 3231). Destination peer: Peer: 37cc8fb75e984d3c90564f0659a8170f, Status: INVALID_TERM, Last received: 394.3231, Next index: 3232, Last known committed idx: 3231, Time since last communication: 0.000s
>>
>> E0904 01:18:35.221812 80421 consensus_queue.cc:618] T 22d1ee4c979047e894d96f3ebfb016ba P 739b33f810844ebcaef489e0b83c3eba [LEADER]: Error trying to read ahead of the log while preparing peer request: Incomplete: Op with index 762 is ahead of the local log (next sequential op: 762). Destination peer: Peer: 8907a006b28a4d52afcc66ff48e11faf, Status: INVALID_TERM, Last received: 305.761, Next index: 763, Last known committed idx: 762, Time since last communication: 0.008s
>>
>> E0904 01:18:35.265733 80506 consensus_queue.cc:618] T 58514deea163421c897bfd0984d7e1fe P 739b33f810844ebcaef489e0b83c3eba [LEADER]: Error trying to read ahead of the log while preparing peer request: Incomplete: Op with index 1491 is ahead of the local log (next sequential op: 1491). Destination peer: Peer: 8907a006b28a4d52afcc66ff48e11faf, Status: INVALID_TERM, Last received: 1586.1491, Next index: 1492, Last known committed idx: 1491, Time since last communication: 0.010s
>>
>> E0904 01:18:35.336474 80630 consensus_queue.cc:618] T 5de63166a80f4274b038d61741046946 P 739b33f810844ebcaef489e0b83c3eba [LEADER]: Error trying to read ahead of the log while preparing peer request: Incomplete: Op with index 54 is ahead of the local log (next sequential op: 54). Destination peer: Peer: 92bb2234165540c0bdf473efd5620b12, Status: INVALID_TERM, Last received: 49.54, Next index: 55, Last known committed idx: 54, Time since last communication: 0.007s
>>
>> E0904 01:18:35.337077 80639 consensus_queue.cc:618] T 7ed43fb6e9d049ada81863c8307af2ee P 739b33f810844ebcaef489e0b83c3eba [NON_LEADER]: Error trying to read ahead of the log while preparing peer request: Incomplete: Op with index 35957 is ahead of the local log (next sequential op: 35957). Destination peer: Peer: 162f275784fa4fbfa49ad8a2639f87c4, Status: INVALID_TERM, Last received: 403.35956, Next index: 35958, Last known committed idx: 35957, Time since last communication: 0.008s
>>
>> E0904 01:18:35.531167 80957 consensus_queue.cc:618] T 02a0a1f4e5784f8593e1a27df95a76ee P 739b33f810844ebcaef489e0b83c3eba [LEADER]: Error trying to read ahead of the log while preparing peer request: Incomplete: Op with index 8635 is ahead of the local log (next sequential op: 8635). Destination peer: Peer: 162f275784fa4fbfa49ad8a2639f87c4, Status: INVALID_TERM, Last received: 340.8635, Next index: 8636, Last known committed idx: 8635, Time since last communication: 0.008s
>>
>> E0904 02:41:05.573040 170286 consensus_queue.cc:618] T 458fcae73801431fb1ff0a25756f02c8 P 739b33f810844ebcaef489e0b83c3eba [LEADER]: Error trying to read ahead of the log while preparing peer request: Incomplete: Op with index 235820 is ahead of the local log (next sequential op: 235820). Destination peer: Peer: 8907a006b28a4d52afcc66ff48e11faf, Status: INVALID_TERM, Last received: 407.235820, Next index: 235821, Last known committed idx: 235819, Time since last communication: 0.009s
>>
>> E0905 05:22:49.712536  1863 consensus_queue.cc:618] T a8041956da6c4ac7a0ab56bac24054d5 P 739b33f810844ebcaef489e0b83c3eba [LEADER]: Error trying to read ahead of the log while preparing peer request: Incomplete: Op with index 85 is ahead of the local log (next sequential op: 85). Destination peer: Peer: 37cc8fb75e984d3c90564f0659a8170f, Status: INVALID_TERM, Last received: 72.85, Next index: 86, Last known committed idx: 85, Time since last communication: 0.068s
>>
>> E0905 08:28:23.729621 79559 consensus_queue.cc:618] T c2723c673a1543b9bc1f5f36156b7506 P 739b33f810844ebcaef489e0b83c3eba [LEADER]: Error trying to read ahead of the log while preparing peer request: Incomplete: Op with index 55 is ahead of the local log (next sequential op: 55). Destination peer: Peer: 28d889cd9742406c9d2c7ff550114080, Status: INVALID_TERM, Last received: 48.55, Next index: 56, Last known committed idx: 55, Time since last communication: 0.117s
>>
>> E0905 08:28:23.913836 76929 consensus_queue.cc:618] T 0172f42500b74ff68d3f8bdb629efc2c P 739b33f810844ebcaef489e0b83c3eba [LEADER]: Error trying to read ahead of the log while preparing peer request: Incomplete: Op with index 51 is ahead of the local log (next sequential op: 51). Destination peer: Peer: 28d889cd9742406c9d2c7ff550114080, Status: INVALID_TERM, Last received: 48.51, Next index: 52, Last known committed idx: 51, Time since last communication: 0.393s
>>
>>
>> We also found that data inserted into Kudu during this period is missing.
>>
>> Please help, thanks!
>>
>> ------------------------------
>> 优速物流有限公司
>> Big Data Center, Feng Baoli
>> *Mobile*:15050552430
>> *Email*:fengbaoli@uce.cn
>>
>
>
> --
> Andrew Wong
>
>

-- 
Andrew Wong
