kudu-user mailing list archives

From Darren Hoo <darren....@gmail.com>
Subject Re: where is kudu's dump core located?
Date Wed, 06 Apr 2016 07:53:32 GMT
Todd,

Thanks a lot for such a quick response and fix!

I'm having some trouble setting up the build environment right now and don't
have time to look through the documentation.

Could you send me the binary, or tell me where I can download it? I'd very
much appreciate that.


On Wed, Apr 6, 2016 at 3:18 PM, Todd Lipcon <todd@cloudera.com> wrote:

> I also put up a patch which should fix the issue here:
> http://gerrit.cloudera.org:8080/#/c/2725/
> If you're able to rebuild from source, give it a try. It should apply
> cleanly on top of 0.7.1.
>
> If not, let me know and I can send you a binary to test out.
>
> -Todd
>
> On Tue, Apr 5, 2016 at 11:21 PM, Todd Lipcon <todd@cloudera.com> wrote:
>
>> BTW, I filed https://issues.apache.org/jira/browse/KUDU-1396 for this
>> bug. Thanks for helping us track it down!
>>
>> On Tue, Apr 5, 2016 at 11:05 PM, Todd Lipcon <todd@cloudera.com> wrote:
>>
>>> Hi Darren,
>>>
>>> Thanks again for the core. I got a chance to look at it, and it looks to
>>> me like you have a value that is 58KB in size, which is causing the issue
>>> here. In particular, what seems to have happened is that there is an UPDATE
>>> delta which is 58KB, and we have a bug in our handling of index blocks when
>>> a single record is larger than 32KB. The bug causes an infinite recursion
>>> which blows out the stack and crashes with the scenario you saw (if you
>>> print out the backtrace all the way to stack frame #81872 you can see the
>>> original call to AppendDelta which starts the recursion).
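>>>
>>> If you want to poke at the core yourself, a typical gdb session looks
>>> something like the following (the binary and core paths here are just
>>> placeholders for wherever yours live):
>>>
>>>   gdb /path/to/kudu-tserver /path/to/core.90197
>>>   (gdb) bt -10        # outermost 10 frames, where the recursion started
>>>   (gdb) frame 81872   # jump to the original AppendDelta call
>>>   (gdb) info locals   # inspect the oversized entry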
>>>
>>> Amusingly, there is this debug-level assertion in the code:
>>>
>>>   size_t est_size = idx_block->EstimateEncodedSize();
>>>   if (est_size > options_->index_block_size) {
>>>     DCHECK(idx_block->Count() > 1)
>>>       << "Index block full with only one entry - this would create "
>>>       << "an infinite loop";
>>>     // This index block is full, flush it.
>>>     BlockPointer index_block_ptr;
>>>     RETURN_NOT_OK(FinishBlockAndPropagate(level));
>>>   }
>>>
>>> which I wrote way back in October 2012, about 3 weeks into Kudu's initial
>>> development. Unfortunately, it looks like we never went back to actually
>>> address the problem, so in release builds it causes a crash (rather than
>>> the assertion failure you would get in a debug build).
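>>>
>>> To make the failure mode concrete, here is a minimal standalone sketch of
>>> how a flush-and-retry append can recurse forever once a single entry
>>> exceeds the block size limit (a hypothetical illustration, not Kudu's
>>> actual code):
>>>
>>>   #include <cstddef>
>>>   #include <string>
>>>   #include <vector>
>>>
>>>   // 32KB limit, matching the index block size described above.
>>>   static const size_t kIndexBlockSize = 32 * 1024;
>>>
>>>   struct IndexBlock {
>>>     std::vector<std::string> entries;
>>>     size_t EstimatedSize() const {
>>>       size_t total = 0;
>>>       for (size_t i = 0; i < entries.size(); i++) total += entries[i].size();
>>>       return total;
>>>     }
>>>   };
>>>
>>>   void AppendEntry(IndexBlock* block, const std::string& entry) {
>>>     if (block->EstimatedSize() + entry.size() > kIndexBlockSize) {
>>>       block->entries.clear();    // "flush" the full block
>>>       AppendEntry(block, entry); // retry: if entry.size() alone exceeds the
>>>       return;                    // limit, this never terminates and the
>>>     }                            // stack eventually blows out
>>>     block->entries.push_back(entry);
>>>   }
>>>
>>> Appending a 58KB entry here recurses forever, because flushing never makes
>>> the block able to hold it.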
>>>
>>> Given this information, I believe we can easily reproduce and fix the
>>> issue. Unfortunately it's probably too late for the 0.8.0 release, which is
>>> already being voted upon. Do you think you would be able to build from
>>> source? If not, we can probably provide you with a patched binary off of
>>> trunk at some point, if you want to help us verify the fix rather than wait
>>> a couple of months until the next release.
>>>
>>> -Todd
>>>
>>>
>>>
>>> On Tue, Apr 5, 2016 at 6:33 PM, Todd Lipcon <todd@cloudera.com> wrote:
>>>
>>>> On Tue, Apr 5, 2016 at 6:27 PM, Darren Hoo <darren.hoo@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks Todd,
>>>>>
>>>>> Let me give a few more details here.
>>>>>
>>>>> When I first created the table and loaded about 100k records, the kudu
>>>>> tablet server started to crash very often.
>>>>>
>>>>> I suspected that maybe the data file was corrupted, so I dumped the
>>>>> table as a Parquet file, dropped the table, recreated it, and imported
>>>>> the Parquet file again.
>>>>>
>>>>> But after I did that, the tablet server still crashed often, until I
>>>>> increased the memory limit to 16GB; after that it crashed less often,
>>>>> about once every several days.
>>>>>
>>>>> There's one big STRING column in my table, but the column should be no
>>>>> bigger than 4KB in size, as the Kudu documentation recommends.
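>>>>>
>>>>> (In case it helps others: the limit I raised is the tablet server's
>>>>> memory gflag, which I believe is --memory_limit_hard_bytes, e.g.
>>>>> --memory_limit_hard_bytes=17179869184 for a 16GB hard limit.)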
>>>>>
>>>>
>>>> OK, that's definitely an interesting part of the story. Although we
>>>> think that 4k strings should be OK, our testing of this kind of workload
>>>> has not been as extensive.
>>>>
>>>> If you are able to share the Parquet file and "create table" command
>>>> for the dataset off-list, that would be great. I'll keep it only within our
>>>> datacenter and delete it when done debugging.
>>>>
>>>>
>>>>>
>>>>> I will try to create a minimal dataset to reproduce the issue, but I am
>>>>> not sure I can create one.
>>>>>
>>>>
>>>> Thanks, that would be great if the larger dataset can't be shared.
>>>>
>>>>
>>>>>
>>>>> here's the core dump compressed,
>>>>>
>>>>> http://188.166.175.200/core.90197.bz2
>>>>>
>>>>> the exact kudu version is: 0.7.1-1.kudu0.7.1.p0.36 (installed from
>>>>> parcel)
>>>>>
>>>>>
>>>> OK, thank you. I'm downloading it now and will take a look tonight or
>>>> tomorrow.
>>>>
>>>> -Todd
>>>>
>>>>
>>>>> On Wed, Apr 6, 2016 at 8:59 AM, Todd Lipcon <todd@cloudera.com> wrote:
>>>>>
>>>>>> Hi Darren,
>>>>>>
>>>>>> This is interesting. I haven't seen a crash that looks like this, and
>>>>>> I'm not sure why it would cause data to disappear either.
>>>>>>
>>>>>> By any chance do you have some workload that can reproduce the issue?
>>>>>> e.g. a particular data set that you are loading that seems to be causing
>>>>>> problems?
>>>>>>
>>>>>> Maybe you can gzip the core file and send it to me off-list if it
>>>>>> isn't too large?
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>>>
>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
