kudu-user mailing list archives

From Adar Dembo <a...@cloudera.com>
Subject Re: File descriptor limit for WAL
Date Fri, 24 Feb 2017 20:39:03 GMT
I think range partitioning is a fine solution for your use case,
though you should know that we're not recommending more than 4 TB of
total data (post-encoding/compression) per tserver at the moment. I
don't expect anything to break outright if you exceed that, but
startup will get slower and slower, as will operations that rewrite
tablet superblocks (such as flushes and compactions).

It's definitely safe to increase the ulimit for open files; we
typically test with higher values (like 32K or 64K). We don't use
select(2) directly; any fd polling in Kudu is done via libev which I
believe uses epoll(2) under the hood. There's one other place where we
use ppoll() (in RPC negotiation), but no select().
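A minimal sketch of checking and raising the open-file limit, assuming
Python on Linux. The helper name raise_nofile_soft_limit is hypothetical,
and the change applies only to the calling process and its children. For
a tablet server running as a service, the limit has to be raised in
whatever environment launches it, which this sketch does not cover.

```python
import resource


def raise_nofile_soft_limit(target):
    """Raise this process's soft RLIMIT_NOFILE toward `target`.

    The request is capped at the hard limit. Returns the (soft, hard)
    pair in effect afterwards. Hypothetical helper, not part of Kudu.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Nothing to do if the soft limit is already unlimited or high enough.
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft, hard
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        # Some platforms reject the request; keep the current limits.
        pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    # 32768 matches the "32K" value mentioned above.
    print(raise_nofile_soft_limit(32768))
```

Only the soft limit is touched here; raising the hard limit normally
requires elevated privileges.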

As the number of tablets increases, startup will become slower, and the
number of threads in the process will grow too (we start a certain
number of threads per tablet). Keep an eye out for that.
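The descriptor arithmetic in this thread can be sketched as follows.
The figure of 3 descriptors per tablet replica is an assumption read off
the lsof excerpt quoted below (two descriptors on one WAL segment plus
one on its index file) for idle tablets; since a tablet may retain
several WAL segments, real usage can be higher. The function names and
the other_fds parameter are hypothetical.

```python
def tablet_replicas(tables):
    """Total replicas for an iterable of (partitions, replication_factor)."""
    return sum(partitions * rf for partitions, rf in tables)


def wal_fds(replicas, fds_per_replica=3):
    """Descriptors held for WALs, assuming a fixed count per replica."""
    return replicas * fds_per_replica


def max_replicas(fd_limit, fds_per_replica=3, other_fds=0):
    """Replicas that fit under fd_limit after reserving other_fds."""
    return max(0, (fd_limit - other_fds) // fds_per_replica)


if __name__ == "__main__":
    # 334 tables, one range partition each, num_replicas(1).
    replicas = tablet_replicas([(1, 1)] * 334)
    print(replicas, wal_fds(replicas))   # 334 1002
    print(max_replicas(1024))            # 341
    print(max_replicas(32768))           # 10922
```

Under that assumption, 334 replicas account for 1002 descriptors, which
is consistent with hitting the default limit of 1024 once the process's
other open files are counted.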

On Fri, Feb 24, 2017 at 10:04 AM, Paul Brannan
<paul.brannan@thesystech.com> wrote:
> I'm using the debs from the cloudera-kudu ppa with little change to the
> default configuration, so one master and one tablet server.  I set
> num_replicas(1) when creating each table.  I used range partitioning with
> (if I understand correctly) one large open-ended range.  So that should have
> 334 tablet replicas.
> I added two more tablet servers and was able to create 1002 tables
> (exactly 3*334) before running out of file descriptors.  Using hash
> partitioning instead of range partitioning, I was able to create 500 tables
> (roughly half).  This is using 2 hash buckets, so it's what I expect.  I run
> into the same limit when I have a single table with many range partitions
> (997 partitions on a single partition column).
> My goal here is to be able to keep N months of data (on the order of 100s
> of GB per day) and to be able to drop a single date from the beginning of
> the range.  Rows are only inserted for the current date, and rows for
> previous dates are not modified.  Partitioning seems ideal for this case
> (it's mentioned as a use case in the non-covering range partitions
> document).  Is there a better solution?
> Does the 100-tablet limit only affect startup time?  In other words, if
> multiple-minute startup time is acceptable, then is there any other reason
> to limit each tablet server to 100 tablets?  Is it safe to increase ulimit
> for open files past 1024 (i.e. does the tablet server ever call select(2))?
> On Thu, Feb 16, 2017 at 3:33 PM, Adar Dembo <adar@cloudera.com> wrote:
>> Hi Paul,
>> As you discovered, Kudu holds WAL segments open until the tablets they
>> belong to are deleted. block_manager_max_open_files won't help here;
>> that just applies to files opened for accessing data blocks, not WAL
>> segments.
>> As far as WAL segments are concerned, we've previously discussed
>> "quiescing" tablets that haven't been used in some time, which would
>> involve halting their Raft consensus state machine and perhaps closing
>> their WAL segments. I can't find a JIRA for this feature, but I'm also
>> not aware of anyone working on it. If you're interested in
>> contributing to Kudu, this could be a worthwhile avenue for you to
>> explore further.
>> I'm a little fuzzy on the details, but I believe that by default a
>> tablet will retain anywhere from 2 to 10 WAL segments, all of them
>> open. The exact number depends on how "caught up" the replication
>> group is; if one peer is behind, more segments may be retained in
>> order to help that peer catch up in the future. The settings that
>> control these numbers are log_min_segments_to_retain and
>> log_max_segments_to_retain.
>> Out of curiosity, how many tablet replicas did your 334 tables
>> generate in total? You can deduce that by calculating, for each table,
>> the total number of partitions multiplied by the table's replication
>> factor. And across how many tservers were they all distributed? By
>> design, tservers can handle many tablets, but as usual, the
>> implementation lags the design, and at the moment we're recommending
>> no more than 100 tablets per tserver
>> (http://kudu.apache.org/docs/known_issues.html#_other_known_issues).
>> On Thu, Feb 16, 2017 at 8:42 AM, Paul Brannan
>> <paul.brannan@thesystech.com> wrote:
>> > I wrote a quick script today to see how kudu behaves if I create many
>> > tables.  After creating 334 tables, I started getting timeouts.  I see
>> > this in the master log file:
>> >
>> > W0216 11:37:48.961221 49810 catalog_manager.cc:2490] CreateTablet RPC for tablet 9b259d5c5ff74f04820240f2159bc1a0 on TS faaf4e9b6e5945d7a14953c4cc34f164 (telx-sb-dev2:7050) failed: IO error: Couldn't create tablet metadata: Failed to write tablet metadata 9b259d5c5ff74f04820240f2159bc1a0: Call to mkstemp() failed on name template /var/lib/kudu/tserver/tablet-meta/9b259d5c5ff74f04820240f2159bc1a0.tmp.XXXXXX: Too many open files (error 24)
>> >
>> > I decreased block_manager_max_open_files, but still got the same result.
>> > Lsof shows that the open files are for the WAL:
>> >
>> > kudu-tser 49648 kudu 1021u   REG        8,5 67108864   16385457 /var/lib/kudu/tserver/wals/62b73d1b7f7a4e61a0a30a551e66230b/wal-000000001
>> > kudu-tser 49648 kudu 1022r   REG        8,5 67108864   16385457 /var/lib/kudu/tserver/wals/62b73d1b7f7a4e61a0a30a551e66230b/wal-000000001
>> > kudu-tser 49648 kudu 1023u   REG        8,5 24000000   16385458 /var/lib/kudu/tserver/wals/62b73d1b7f7a4e61a0a30a551e66230b/index.000000000
>> >
>> > The files do not get closed until the tables are deleted, even though no
>> > running process has any of those tables open.
>> >
>> > Is there a setting that will reduce the number of WAL files that get
>> > created or held open at any given point in time?
