kudu-user mailing list archives

From Adar Dembo <a...@cloudera.com>
Subject Re: File descriptor limit for WAL
Date Thu, 16 Feb 2017 20:33:20 GMT
Hi Paul,

As you discovered, Kudu holds WAL segments open until the tablets they
belong to are deleted. The block_manager_max_open_files flag won't help
here; it only caps files opened for accessing data blocks, not WAL
segments.
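To check which of a tserver's descriptors are WAL files and which are something else, you can tally its open files by directory. Here is a minimal sketch. It assumes Linux, because it reads /proc/<pid>/fd. It also assumes the WAL root is /var/lib/kudu/tserver/wals/, as in the lsof output quoted below; your --fs_wal_dir may differ. The function names are hypothetical helpers, not part of Kudu.

```python
import os
from collections import Counter

# Assumption: WAL root taken from the lsof output in this thread;
# adjust to match the tserver's configured WAL directory.
WAL_ROOT = "/var/lib/kudu/tserver/wals/"


def classify_fd_targets(targets, wal_root=WAL_ROOT):
    """Bucket open-file paths into 'wal' (under wal_root) and 'other'."""
    counts = Counter()
    for path in targets:
        counts["wal" if path.startswith(wal_root) else "other"] += 1
    return counts


def open_fd_targets(pid):
    """Return the target of every open descriptor of `pid` (Linux /proc only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets
```

For example, classify_fd_targets(open_fd_targets(49648)) would tally the descriptors of the tserver process shown in the lsof output below. You need to run it as the kudu user or as root to read another process's /proc entries.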

As far as WAL segments are concerned, we've previously discussed
"quiescing" tablets that haven't been used in some time, which would
involve halting their Raft consensus state machine and perhaps closing
their WAL segments. I can't find a JIRA for this feature, but I'm also
not aware of anyone working on it. If you're interested in
contributing to Kudu, this could be a worthwhile avenue for you to
explore further.

I'm a little fuzzy on the details, but I believe that by default a
tablet will retain anywhere from 2 to 10 WAL segments, all of them
open. The exact number depends on how "caught up" the replication
group is; if one peer is behind, more segments may be retained in
order to help that peer catch up in the future. The settings that
control these numbers are log_min_segments_to_retain and
log_max_segments_to_retain.
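For a rough sense of scale, here is a back-of-envelope estimate of WAL descriptors per tserver. It brackets the count between 2 and 10 retained segments per replica. The per-segment and per-replica costs are assumptions read off the lsof excerpt quoted below: two descriptors per open segment (one read-write, one read-only) plus one log index file per replica. They are not documented Kudu behaviour, and the function names are hypothetical.

```python
def estimate_wal_fds(replicas, segments_retained,
                     fds_per_segment=2, index_fds_per_replica=1):
    """Rough count of WAL-related descriptors held by one tserver.

    Assumption: each retained segment costs `fds_per_segment` descriptors
    and each replica also holds `index_fds_per_replica` index files.
    """
    return replicas * (segments_retained * fds_per_segment + index_fds_per_replica)


def max_replicas_for_limit(fd_limit, segments_retained,
                           fds_per_segment=2, index_fds_per_replica=1):
    """Largest replica count whose estimated WAL descriptors fit under fd_limit."""
    per_replica = segments_retained * fds_per_segment + index_fds_per_replica
    return fd_limit // per_replica


# Bracket the estimate between the low and high retention counts (2 and 10).
for segments in (2, 10):
    print(segments, estimate_wal_fds(100, segments),
          max_replicas_for_limit(1024, segments))
```

Under these assumptions, 100 replicas need roughly 500 to 2100 WAL descriptors, and a 1024-descriptor limit fits only about 48 to 204 replicas before anything else is opened. The descriptor numbers in the lsof excerpt top out at 1023, which suggests a 1024 limit, but that is an inference.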

Out of curiosity, how many tablet replicas did your 334 tables
generate in total? You can compute that by multiplying each table's
number of partitions by its replication factor and summing the results
across all tables. And across how many tservers were they distributed?
By design, tservers can handle many tablets, but as usual, the
implementation lags the design, and at the moment we're recommending
no more than 100 tablets per tserver
(http://kudu.apache.org/docs/known_issues.html#_other_known_issues).
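The replica arithmetic above can be sketched as follows. The TableLayout type and the example numbers (3 partitions, replication factor 3, 3 tservers) are hypothetical, chosen only for illustration. The 100-tablet threshold is the recommendation from the known-issues page cited above, as of this thread.

```python
from dataclasses import dataclass

# Recommendation from the known-issues page linked above (as of this thread).
RECOMMENDED_MAX_TABLETS_PER_TSERVER = 100


@dataclass
class TableLayout:
    """Hypothetical description of one table's partitioning."""
    name: str
    partitions: int
    replication_factor: int


def total_replicas(tables):
    """Sum of partitions x replication factor over all tables."""
    return sum(t.partitions * t.replication_factor for t in tables)


def replicas_per_tserver(tables, num_tservers):
    """Average replicas per tserver, assuming an even spread."""
    return total_replicas(tables) / num_tservers


# Hypothetical: 334 tables, each with 3 partitions and replication factor 3,
# spread over 3 tservers.
tables = [TableLayout(f"t{i}", 3, 3) for i in range(334)]
per_ts = replicas_per_tserver(tables, 3)
print(total_replicas(tables), per_ts, per_ts > RECOMMENDED_MAX_TABLETS_PER_TSERVER)
```

With those made-up numbers you would get 3006 replicas, about 1002 per tserver, which is roughly ten times the recommended ceiling.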


On Thu, Feb 16, 2017 at 8:42 AM, Paul Brannan
<paul.brannan@thesystech.com> wrote:
> I wrote a quick script today to see how kudu behaves if I create many
> tables.  After creating 334 tables, I started getting timeouts.  I see this
> in the master log file:
>
> W0216 11:37:48.961221 49810 catalog_manager.cc:2490] CreateTablet RPC for
> tablet 9b259d5c5ff74f04820240f2159bc1a0 on TS
> faaf4e9b6e5945d7a14953c4cc34f164 (telx-sb-dev2:7050) failed: IO error:
> Couldn't create tablet metadata: Failed to write tablet metadata
> 9b259d5c5ff74f04820240f2159bc1a0: Call to mkstemp() failed on name template
> /var/lib/kudu/tserver/tablet-meta/9b259d5c5ff74f04820240f2159bc1a0.tmp.XXXXXX:
> Too many open files (error 24)
>
> I decreased block_manager_max_open_files, but still got the same result.
> Lsof shows that the open files are for the WAL:
>
> kudu-tser 49648 kudu 1021u   REG        8,5 67108864   16385457
> /var/lib/kudu/tserver/wals/62b73d1b7f7a4e61a0a30a551e66230b/wal-000000001
> kudu-tser 49648 kudu 1022r   REG        8,5 67108864   16385457
> /var/lib/kudu/tserver/wals/62b73d1b7f7a4e61a0a30a551e66230b/wal-000000001
> kudu-tser 49648 kudu 1023u   REG        8,5 24000000   16385458
> /var/lib/kudu/tserver/wals/62b73d1b7f7a4e61a0a30a551e66230b/index.000000000
>
> The files do not get closed until the tables are deleted, even though no
> running process has any of those tables open.
>
> Is there a setting that will reduce the number of WAL files that get created
> or held open at any given point in time?
