kudu-user mailing list archives

From Faraz Mateen <fmat...@an10.io>
Subject Re: "Too many open files" error
Date Mon, 07 Oct 2019 09:31:00 GMT

Thank you for the response. Too many partitions is exactly the problem. When I
restart the tserver, it tries to open files for every tablet and eventually
crashes.
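The tserver opens files for every tablet replica it hosts, so descriptor demand grows roughly linearly with the replica count. The sketch below is only a back-of-the-envelope estimate. The number of tables and `fds_per_replica` are hypothetical inputs, not values measured from this cluster. `fds_per_replica` stands for the descriptors one replica may keep open for data blocks, WAL segments and metadata.

```python
def estimate_tserver_fds(num_tables, partitions_per_table, fds_per_replica,
                         replication_factor=1, num_tservers=1, baseline_fds=0):
    """Rough, hypothetical descriptor estimate for one tablet server.

    fds_per_replica is an assumed input; the real figure varies with the
    number of rowsets, WAL segments, and the block manager in use.
    """
    # Total replicas in the cluster, spread evenly across tablet servers.
    replicas = num_tables * partitions_per_table * replication_factor
    replicas_per_tserver = -(-replicas // num_tservers)  # ceiling division
    return replicas_per_tserver * fds_per_replica + baseline_fds


# Hypothetical example: 20 tables x 39 partitions on a single tserver,
# assuming 100 descriptors per replica.
needed = estimate_tserver_fds(num_tables=20, partitions_per_table=39,
                              fds_per_replica=100)
print(needed, "descriptors needed vs. a limit of 65535")
```

With these made-up inputs the estimate is 78000, above a 65535 limit. The point is the scaling: cutting partitions per table, or spreading replicas over more tablet servers, reduces the per-server figure proportionally.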

Is there a way to get around this and recover my data? Is there any
configuration I can change to get the tserver running? Or can I add a new
tablet server and migrate the existing tablets to it?
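Before changing other configuration, check which limit the tserver process actually has. `ulimit -n` in an interactive shell only affects processes started from that shell. A tserver launched by an init script or systemd gets whatever limit the service manager sets.

The sketch below reads the effective "Max open files" limit and the current descriptor count of a running process from Linux's /proc. It assumes Linux, and that you pass the tserver's PID, for example from `pidof kudu-tserver` (assuming that is the process name). Counting another user's descriptors requires running as root or as the same user.

```python
import os


def open_files_limit(pid="self"):
    """Return the (soft, hard) "Max open files" limits of a process as strings.

    Reads /proc/<pid>/limits (Linux only). Either value may be "unlimited".
    """
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Line format: "Max open files   <soft>   <hard>   files"
                fields = line.split()
                return fields[3], fields[4]
    raise RuntimeError(f"no 'Max open files' entry for pid {pid}")


def open_fd_count(pid="self"):
    """Number of descriptors the process has open right now."""
    return len(os.listdir(f"/proc/{pid}/fd"))


# Usage: pass the tserver's PID as a string, e.g. open_files_limit("12345").
# Here we inspect the current Python process instead.
soft, hard = open_files_limit()
print(f"{open_fd_count()} open fds, soft limit {soft}, hard limit {hard}")
```

If the soft limit reported for the tserver's PID is lower than what `ulimit -a` shows in your shell, the raised limit never reached the daemon.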

On Sat, Oct 5, 2019 at 10:05 PM Alexey Serbin <aserbin@cloudera.com> wrote:

> Hi,
> Most likely the issue happened because of the high number of tablet replicas
> on the tablet server.  During a spike in the input data rate, higher
> compaction activity might require more file descriptors than usual, since
> more files are open at once.
> How many tablet replicas does that tablet server have?  It's not
> recommended to have too many:
> https://kudu.apache.org/docs/known_issues.html#_scale
> To understand what happened, you need to look at the logs of the tablet
> server.  This might be useful:
> https://kudu.apache.org/docs/troubleshooting.html
> Overall, if there is only one (?) tablet server in the whole Kudu cluster,
> why have 39 partitions per table?  I guess it's some sort of
> proof-of-concept/toy setup, but anyway.  Since all the tablet replicas end
> up on the same single tablet server, I don't see any benefit from
> partitioning in that setup.  For the tablet server, it simply means an
> x-times larger number of open file descriptors and higher memory usage.
> Kind regards,
> Alexey
> On Fri, Oct 4, 2019 at 4:21 AM Faraz Mateen <fmateen@an10.io> wrote:
>> Hi all,
>> I am facing a problem with my Kudu setup where the tablet server crashes
>> with a "too many open files" error.
>> The setup consists of a single master and a single tablet server. Tables
>> are created with 39 partitions each. However, not all partitions have
>> data that corresponds to them.
>> Yesterday my tserver crashed, and when I try to restart it, it fails with
>> this error:
>> I1004 03:50:39.896301  5669 ts_tablet_manager.cc:1173] T cab85f15f06748d0b59161d9f3da55f7 P ee14d248ac994d0eb60dbb0db4ab3b09: Registered tablet (data state: TABLET_DATA_READY)
>> W1004 03:50:39.923184  5687 os-util.cc:165] could not read /proc/self/status: IO error: /proc/self/status: Too many open files (error 24)
>> I1004 03:50:39.939460  5669 ts_tablet_manager.cc:1173] T d8d68ce6f6ea49479c00d29709869f1f P ee14d248ac994d0eb60dbb0db4ab3b09: Registered tablet (data state: TABLET_DATA_READY)
>> I have already raised the ulimit on the machine:
>> root@vm-3:~# ulimit -a
>> core file size          (blocks, -c) 0
>> data seg size           (kbytes, -d) unlimited
>> scheduling priority             (-e) 0
>> file size               (blocks, -f) unlimited
>> pending signals                 (-i) 63923
>> max locked memory       (kbytes, -l) 16384
>> max memory size         (kbytes, -m) unlimited
>> open files                      (-n) 65535
>> pipe size            (512 bytes, -p) 8
>> POSIX message queues     (bytes, -q) 819200
>> real-time priority              (-r) 0
>> stack size              (kbytes, -s) 8192
>> cpu time               (seconds, -t) unlimited
>> max user processes              (-u) 65535
>> virtual memory          (kbytes, -v) unlimited
>> file locks                      (-x) unlimited
>> *Setup details:*
>> Single master and tserver set up on a single VM.
>> 4 cores, 550GB hard disk, 16GB RAM
>> Kudu version 1.8 on Ubuntu, installed through Debian packages.
>> Before the crash, data was being inserted into Kudu at a very high rate.
>> RAM usage was around 87% and disk usage was around 84%.
>> Here is what I have tried so far:
>> 1- Set ulimit -n to 65535.
>> 2- Rebooted the VM to get rid of stale processes.
>> 3- Set block_manager_max_open_files to 32000 in the tserver flag file.
>> What I want to know now is:
>> 1- Why am I hitting this problem? Is it due to low resources on the VM or
>> the high number of tablets on a single tserver?
>> 2- How can I get around this problem and recover my data and Kudu services?
>> I would really appreciate some help with this.
>> --
>> Faraz Mateen

Faraz Mateen
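On the question of why the limit is being hit, it can help to see what the tserver's descriptors actually point at. The sketch below groups a process's open descriptors by the leading components of their target path, or by kind for sockets and pipes, again via Linux's /proc. It is a hypothetical diagnostic helper, not a Kudu tool. The `depth` parameter is an assumption you would tune so that your Kudu data and WAL directories show up as separate buckets. Inspecting another process requires root or the same user.

```python
import os
from collections import Counter


def classify_open_fds(pid="self", depth=3):
    """Count a process's open descriptors by target (Linux /proc only).

    Regular files are grouped by their first `depth` path components,
    e.g. "/var/lib/kudu" for depth=3; sockets, pipes and other anonymous
    descriptors are grouped by kind ("socket", "pipe", "anon_inode", ...).
    """
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
        if target.startswith("/"):
            parts = target.split("/")[1:depth + 1]
            counts["/" + "/".join(parts)] += 1
        else:
            # e.g. "socket:[12345]", "pipe:[678]", "anon_inode:[eventpoll]"
            counts[target.split(":", 1)[0]] += 1
    return counts


# Usage: classify_open_fds("12345") for a tserver with PID 12345.
# Here we inspect the current Python process instead.
for prefix, n in classify_open_fds().most_common(10):
    print(n, prefix)
```

If one data or WAL directory accounts for most of the descriptors, that points at per-tablet files rather than at network connections.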
