kudu-user mailing list archives

From: Mike Percy <mpe...@cloudera.com>
Subject: Re: Tablet instability Issues
Date: Wed, 06 Jan 2016 04:35:05 GMT
- kudu-user@googlegroups.com (moved to BCC)
+ user@kudu.incubator.apache.org

Hi Nick,
Thanks for your patience. I was able to reproduce your issue and I have a
fix. I've posted a patch on http://gerrit.cloudera.org:8080/#/c/1715/ so
feel free to try it out if you're able to build from master. If the patch
doesn't apply cleanly for you, either apply the patches in CR 1713 and 1714
first or just ignore the unit test.

I also filed https://issues.cloudera.org/browse/KUDU-1288 to track the bug.

Mike



--
Mike Percy
Software Engineer, Cloudera

On Tue, Jan 5, 2016 at 12:26 PM, Nick Wolf <nickwolf7@gmail.com> wrote:

> Thanks Mike. I'll wait to hear from you.
>
> On Monday, 4 January 2016 12:15:53 UTC-8, mpercy wrote:
>>
>> Nick / Todd,
>> Sorry for taking so long to respond on this thread. I've been pretty busy
>> and away due to the holidays. I'll investigate this and get back to you in
>> the next day or two with what I find (and probably a fix, if I can reproduce it).
>>
>> Mike
>>
>>
>> --
>> Mike Percy
>> Software Engineer, Cloudera
>>
>> On Mon, Jan 4, 2016 at 12:06 PM, Todd Lipcon <to...@cloudera.com> wrote:
>>
>>> Hey Nick,
>>>
>>> Sorry for the slowness here -- lots of folks have been on vacation for
>>> the holidays. Mike -- I think you know this code best. Any idea as to where
>>> these extra open WAL segments are coming from on deleted tablets?
>>>
>>> -Todd
>>>
>>> On Mon, Dec 28, 2015 at 11:24 AM, Nick Wolf <nick...@gmail.com> wrote:
>>>
>>>> Hi Todd,
>>>>
>>>> Any findings on why it could happen?
>>>>
>>>> On Monday, 21 December 2015 12:57:57 UTC-8, Todd Lipcon wrote:
>>>>>
>>>>> On Mon, Dec 21, 2015 at 12:45 PM, Nick Wolf <nick...@gmail.com> wrote:
>>>>>
>>>>>> Hi Todd,
>>>>>>
>>>>>> /tablets shows 5 tablets on web interface. Are you referring to
>>>>>> /tables by any chance?
>>>>>>
>>>>>
>>>>> Interesting. I wonder if we have a bug where deleted tablets are
>>>>> ending up holding on to file descriptors. Mike Percy's the expert on this
>>>>> area. Any idea, Mike, if something like this might happen? I'm wondering if
>>>>> we somehow open the Log for tablet peers that are actually tombstoned or
>>>>> deleted.
>>>>>
>>>>> -Todd
>>>>>
>>>>>
>>>>>>
>>>>>> Regarding the bootstrap error, I got it resolved by restarting the NTP
>>>>>> services on all nodes of the cluster. I believe I was getting this error
>>>>>> because I had rebooted some of my cluster nodes, which restarted the NTP
>>>>>> service on these nodes.
>>>>>>
>>>>>> On Thursday, 17 December 2015 18:06:33 UTC-8, Todd Lipcon wrote:
>>>>>>>
>>>>>>> Thanks for the logs. Looks like there are 318 tablets on your
>>>>>>> server, which seems like more than you expected. The /tablets page on the
>>>>>>> web interface might be interesting. Do you see 318 tablets there, too?
>>>>>>> Increasing ulimit does seem like a reasonable solution for now.
>>>>>>>
>>>>>>> Regarding the bootstrap error -- it's very strange. It indicates
>>>>>>> that there was some operation applied to the tablet with a timestamp in the
>>>>>>> future. You're getting this every time you restart the tablet server? We
>>>>>>> may have to dig a bit to understand why this one's happening.
>>>>>>>
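Todd's reading of the bootstrap error (an operation stamped with a time in the future) can be illustrated with a toy model. The sketch below is not Kudu's clock code: the class name `BoundedClock`, the `max_error_us` parameter, and the microsecond units are assumptions made purely for illustration. It only shows the general idea of refusing a timestamp that lies further ahead than the local clock plus its error bound.

```python
import time


class BoundedClock:
    """Toy clock that refuses updates too far ahead of local time.

    Hypothetical model for illustration; not Kudu's implementation.
    """

    def __init__(self, max_error_us, now_us=None):
        # max_error_us: assumed bound on how far the local clock may be off.
        self.max_error_us = max_error_us
        # now_us: injectable time source so the behaviour can be tested.
        self._now_us = now_us or (lambda: int(time.time() * 1_000_000))
        self.last_seen_us = 0

    def update(self, timestamp_us):
        """Advance to timestamp_us, or raise if it is beyond now + max error."""
        limit = self._now_us() + self.max_error_us
        if timestamp_us > limit:
            raise ValueError(
                "Tried to update clock beyond the max. error: "
                f"{timestamp_us} > {limit}"
            )
        self.last_seen_us = max(self.last_seen_us, timestamp_us)
        return self.last_seen_us
```

Under this model, a node whose clock runs behind would reject timestamps written by a correctly synchronized peer. That is at least consistent with the error going away once NTP was restarted on every node.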
>>>>>>> On Thu, Dec 17, 2015 at 4:14 PM, Nick Wolf <nick...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Todd,
>>>>>>>>
>>>>>>>> Another issue popped up after increasing the ulimit.
>>>>>>>>
>>>>>>>> tablet_bootstrap.cc:771] Check failed: _s.ok() Bad status: Invalid argument: Tried to update clock beyond the max. error.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thursday, 17 December 2015 10:06:14 UTC-8, Todd Lipcon wrote:
>>>>>>>>>
>>>>>>>>> Sure, that would be great.
>>>>>>>>>
>>>>>>>>> -Todd
>>>>>>>>>
>>>>>>>>> On Thu, Dec 17, 2015 at 9:56 AM, Nick Wolf <nick...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Todd,
>>>>>>>>>>
>>>>>>>>>> I have one running tablet server out of 6 nodes. I did lsof on
>>>>>>>>>> this and it shows 1189 files opened at the moment. Please let me know if
>>>>>>>>>> you need it as an attachment.
>>>>>>>>>>
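On Linux, a process's open descriptors can also be listed straight from procfs, which makes it easy to see which directories they point into (for example, WAL segments of tablets that should be gone). This is a minimal sketch that assumes the standard `/proc/<pid>/fd` layout; the function names and the `proc_root` parameter are made up for illustration.

```python
import os
from collections import Counter


def open_fd_targets(pid, proc_root="/proc"):
    """Return the target path of every open descriptor of a process.

    Reads the symlinks under <proc_root>/<pid>/fd (Linux procfs layout).
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


def count_by_directory(targets):
    """Group descriptor targets by parent directory, most common first."""
    return Counter(os.path.dirname(t) for t in targets).most_common()
```

Calling `count_by_directory(open_fd_targets(pid))` for the tablet server's pid gives a quick per-directory tally. The total may be lower than an lsof line count, since lsof also lists entries such as the working directory and memory-mapped files.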
>>>>>>>>>> On Thursday, 17 December 2015 09:28:15 UTC-8, Todd Lipcon wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Nick,
>>>>>>>>>>>
>>>>>>>>>>> Sorry, how many _tablets_ per tablet server? E.g., when you create
>>>>>>>>>>> tables, how many buckets (for hash-partitioned) or splits (for
>>>>>>>>>>> range-partitioned) tables did you create?
>>>>>>>>>>>
>>>>>>>>>>> Getting the 'lsof' output of a running TS before it crashed
>>>>>>>>>>> would also be useful.
>>>>>>>>>>>
>>>>>>>>>>> -Todd
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Dec 17, 2015 at 9:26 AM, Nick Wolf <nick...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I am using block_manager=log and I have one tablet server per
>>>>>>>>>>>> host. I am running a 6 node cluster.
>>>>>>>>>>>> Looks like the kudu user is adopting the root account settings in
>>>>>>>>>>>> the case of ulimits. I just did "cat /proc/17913/limits", where 17913 is the
>>>>>>>>>>>> kudu process id.
>>>>>>>>>>>> Limit                     Soft Limit    Hard Limit    Units
>>>>>>>>>>>> Max cpu time              unlimited     unlimited     seconds
>>>>>>>>>>>> Max file size             unlimited     unlimited     bytes
>>>>>>>>>>>> Max data size             unlimited     unlimited     bytes
>>>>>>>>>>>> Max stack size            8388608       unlimited     bytes
>>>>>>>>>>>> Max core file size        0             unlimited     bytes
>>>>>>>>>>>> Max resident set          unlimited     unlimited     bytes
>>>>>>>>>>>> Max processes             1547355       1547355       processes
>>>>>>>>>>>> Max open files            1024          4096          files
>>>>>>>>>>>> Max locked memory         65536         65536         bytes
>>>>>>>>>>>> Max address space         unlimited     unlimited     bytes
>>>>>>>>>>>> Max file locks            unlimited     unlimited     locks
>>>>>>>>>>>> Max pending signals       1547355       1547355       signals
>>>>>>>>>>>> Max msgqueue size         819200        819200        bytes
>>>>>>>>>>>> Max nice priority         0             0
>>>>>>>>>>>> Max realtime priority     0             0
>>>>>>>>>>>> Max realtime timeout      unlimited     unlimited     us
>>>>>>>>>>>>
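The limits listing above is easy to check programmatically. The sketch below assumes the usual `/proc/<pid>/limits` text layout (limit name, soft value, hard value, optional units); the function name `parse_proc_limits` and the set of unit words are illustrative choices, not part of any Kudu tooling.

```python
def parse_proc_limits(text):
    """Parse /proc/<pid>/limits text into {name: (soft, hard, units)}.

    Values stay as strings because a limit may be the word "unlimited".
    """
    known_units = {"seconds", "bytes", "processes", "files", "locks", "signals", "us"}
    limits = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "Max":
            continue  # skip blank lines and any header row
        # The units column is optional (e.g. "Max nice priority" has none).
        units = fields.pop() if fields[-1] in known_units else ""
        soft, hard = fields[-2], fields[-1]
        name = " ".join(fields[:-2])
        limits[name] = (soft, hard, units)
    return limits
```

For the values posted in this message, `parse_proc_limits(...)["Max open files"]` would yield a soft limit of 1024 and a hard limit of 4096, which is the pair to compare against the number of files the tablet server actually holds open.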
>>>>>>>>>>>> Is there a reason why kudu does not maintain its own limits
>>>>>>>>>>>> config file? Impala and hdfs maintain them in
>>>>>>>>>>>> /etc/security/limits.d/impala.conf
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thursday, 17 December 2015 09:06:46 UTC-8, Todd Lipcon wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hey Nick,
>>>>>>>>>>>>>
>>>>>>>>>>>>> How about JD's question? Are you using the file block manager
>>>>>>>>>>>>> workaround? How many tablets per host?
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Todd
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Dec 17, 2015 at 8:38 AM, Nick Wolf <nick...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> These are my settings for root account and there is nothing
>>>>>>>>>>>>>> set for user "KUDU".
>>>>>>>>>>>>>> core file size          (blocks, -c) 0
>>>>>>>>>>>>>> data seg size           (kbytes, -d) unlimited
>>>>>>>>>>>>>> scheduling priority             (-e) 0
>>>>>>>>>>>>>> file size               (blocks, -f) unlimited
>>>>>>>>>>>>>> pending signals                 (-i) 1547361
>>>>>>>>>>>>>> max locked memory       (kbytes, -l) 64
>>>>>>>>>>>>>> max memory size         (kbytes, -m) unlimited
>>>>>>>>>>>>>> open files                      (-n) 1024
>>>>>>>>>>>>>> pipe size            (512 bytes, -p) 8
>>>>>>>>>>>>>> POSIX message queues     (bytes, -q) 819200
>>>>>>>>>>>>>> real-time priority              (-r) 0
>>>>>>>>>>>>>> stack size              (kbytes, -s) 8192
>>>>>>>>>>>>>> cpu time               (seconds, -t) unlimited
>>>>>>>>>>>>>> max user processes              (-u) 1547361
>>>>>>>>>>>>>> virtual memory          (kbytes, -v) unlimited
>>>>>>>>>>>>>> file locks                      (-x) unlimited
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does this information help?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thursday, 17 December 2015 08:20:49 UTC-8, Roland Teague wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Nick, sounds like your max open files has been exceeded at
>>>>>>>>>>>>>>> the OS level. What does "ulimit -a" return for open files?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -roland
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *Roland Teague*
>>>>>>>>>>>>>>> Customer Operations Engineer - Backline
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Dec 17, 2015 at 11:08 AM, Nick Wolf <nick...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Kudu tablet server keeps crashing after running for a while
>>>>>>>>>>>>>>>> with the following error.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> F1217 07:08:28.565932 15732 raft_consensus_state.cc:258]
>>>>>>>>>>>>>>>> Check failed: _s.ok() Bad status: IO error: Unable to write consensus meta
>>>>>>>>>>>>>>>> file for tablet fd20f978f7c644d98115a48fa2b4528f to path
>>>>>>>>>>>>>>>> /media/drive/kudu/tserver/consensus-meta/fd20f978f7c644d98115a48fa2b4528f:
>>>>>>>>>>>>>>>> Call to mkstemp() failed on name template
>>>>>>>>>>>>>>>> /media/drive/kudu/tserver/consensus-meta/fd20f978f7c644d98115a48fa2b4528f.tmp.XXXXXX:
>>>>>>>>>>>>>>>> Too many open files (error 24)
>>>>>>>>>>>>>>>>
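The "(error 24)" in the log line above is the errno value, which on Linux is EMFILE: the process has hit its own open-files limit. A minimal sketch of how a monitoring script might classify such failures follows; the helper names `is_fd_exhaustion` and `describe` are hypothetical and not part of Kudu.

```python
import errno
import os


def is_fd_exhaustion(exc):
    """Return True if an OSError reports a descriptor-limit failure.

    EMFILE is the per-process limit; ENFILE is the system-wide table.
    """
    return isinstance(exc, OSError) and exc.errno in (errno.EMFILE, errno.ENFILE)


def describe(exc):
    """Format an OSError as '<message> (error <errno>)'."""
    return f"{os.strerror(exc.errno)} (error {exc.errno})"
```

Distinguishing EMFILE from ENFILE matters here because only the former is addressed by raising the per-process ulimit.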
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> You received this message
because you are subscribed to the
>>>>>>>>>>>>>>>> Google Groups "kudu-user"
group.
>>>>>>>>>>>>>>>> To unsubscribe from this
group and stop receiving emails
>>>>>>>>>>>>>>>> from it, send an email to
kudu-user+...@googlegroups.com.
>>>>>>>>>>>>>>>> To post to this group, send
email to
>>>>>>>>>>>>>>>> kudu...@googlegroups.com.
>>>>>>>>>>>>>>>> To view this discussion on
the web visit
>>>>>>>>>>>>>>>> https://groups.google.com/d/msgid/kudu-user/b11f4053-5598-4d21-a027-2e1165bcd385%40googlegroups.com
>>>>>>>>>>>>>>>> <https://groups.google.com/d/msgid/kudu-user/b11f4053-5598-4d21-a027-2e1165bcd385%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>>>>>>>>>>> .
>>>>>>>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Todd Lipcon
>>>>>>> Software Engineer, Cloudera
>>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Todd Lipcon
>>>>> Software Engineer, Cloudera
>>>>>
>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>>
