kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: where is kudu's dump core located?
Date Wed, 06 Apr 2016 06:05:17 GMT
Hi Darren,

Thanks again for the core. I got a chance to look at it, and it looks to me
like you have a value that is 58KB in size, which is causing the issue here.
In particular, there seems to be an UPDATE delta of 58KB, and we have a bug
in our handling of index blocks when a single record is larger than 32KB.
The bug causes an infinite recursion, which overflows the stack and produces
the crash you saw (if you print the backtrace all the way down to stack
frame #81872, you can see the original call to AppendDelta that starts the
recursion).

Amusingly, there is this debug-level assertion in the code:

  size_t est_size = idx_block->EstimateEncodedSize();
  if (est_size > options_->index_block_size) {
    DCHECK(idx_block->Count() > 1)
      << "Index block full with only one entry - this would create "
      << "an infinite loop";
    // This index block is full, flush it.
    BlockPointer index_block_ptr;
    RETURN_NOT_OK(FinishBlockAndPropagate(level));
  }

which I wrote way back in October 2012, about 3 weeks into Kudu's initial
development. Unfortunately, it looks like we never went back to actually
address the problem. In debug builds it triggers an assertion failure; in
release builds, where DCHECKs are compiled out, the recursion runs until it
overflows the stack and crashes.

Given this information, I believe we can easily reproduce and fix the issue.
Unfortunately, it's probably too late for the 0.8.0 release, which is
already being voted on. Do you think you would be able to build from
source? If not, we can probably provide you with a patched binary built from
trunk at some point, if you want to help us verify the fix rather than wait
a couple of months for the next release.

-Todd



On Tue, Apr 5, 2016 at 6:33 PM, Todd Lipcon <todd@cloudera.com> wrote:

> On Tue, Apr 5, 2016 at 6:27 PM, Darren Hoo <darren.hoo@gmail.com> wrote:
>
>> Thanks Todd,
>>
>> let me try giving a little more details here.
>>
>> When I first created the table and loaded about 100k records, the Kudu
>> tablet server started to crash, and very often.
>>
>> I suspected that the data files might be corrupted, so I dumped the table
>> as a Parquet file, dropped the table, recreated it, and imported the
>> Parquet file again.
>>
>> But after I did that, the tablet server still crashed often until I
>> increased the memory limit to 16GB. After that, the tablet server crashed
>> less often, about once every several days.
>>
>> There's one big STRING column in my table, but its values should not be
>> bigger than 4KB, as the Kudu documentation recommends.
>>
>
> OK, that's definitely an interesting part of the story. Although we think
> that 4k strings should be OK, we have not tested this kind of workload as
> extensively.
>
> If you are able to share the Parquet file and "create table" command for
> the dataset off-list, that would be great. I'll keep it only within our
> datacenter and delete it when done debugging.
>
>
>>
>> I will try to create a minimal dataset to reproduce the issue, but I am
>> not sure I can create one.
>>
>
> Thanks, that would be great if the larger dataset can't be shared.
>
>
>>
>> Here's the compressed core dump:
>>
>> http://188.166.175.200/core.90197.bz2
>>
>> The exact Kudu version is 0.7.1-1.kudu0.7.1.p0.36 (installed from
>> parcel).
>>
>>
> OK, thank you. I'm downloading it now and will take a look tonight or
> tomorrow.
>
> -Todd
>
>
>> On Wed, Apr 6, 2016 at 8:59 AM, Todd Lipcon <todd@cloudera.com> wrote:
>>
>>> Hi Darren,
>>>
>>> This is interesting. I haven't seen a crash that looks like this, and I'm
>>> not sure why it would cause data to disappear either.
>>>
>>> By any chance, do you have a workload that can reproduce the issue?
>>> e.g. a particular dataset you are loading that seems to be causing
>>> problems?
>>>
>>> Maybe you can gzip the core file and send it to me off-list if it isn't
>>> too large?
>>>
>>>
>>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Todd Lipcon
Software Engineer, Cloudera
