kudu-user mailing list archives

From Gary Gao <garygaow...@gmail.com>
Subject Re: How to decrease kudu server restart time
Date Mon, 13 Aug 2018 11:16:00 GMT
I'm using Kudu 1.6.0. Does this version have the feature you mentioned?

> The recent versions are using 3-4-3 replica
> replacement, meaning the tablet copy should be automatically canceled
> when the third replica comes online and the copy hasn't finished yet.

On Mon, Aug 13, 2018 at 5:16 PM Attila Bukor <abukor@apache.org> wrote:

> Hi Gary,
>
> Please find my answers inline.
>
> On Sun, Aug 12, 2018 at 01:17:03PM +0800, Gary Gao wrote:
> > I have a Kudu cluster of 40 nodes. When I realized that
> > maintenance_manager_num_threads=1 is too small, I updated the config file
> > and restarted a Kudu tablet server, but it took too long to start, longer
> > than --follower_unavailable_considered_failed_sec=600, causing tablet
> > redistribution.
> > Even after the Kudu server started, it spent too much time copying
> > tablets, as the following tablet copy log shows:
> >
> >
> > Tablet 1ecbe230e14a4d9f9125dbc49c32860e of table 'impala::venus.ods_xk_pay_fee_order' is under-replicated: 1 replica(s) not RUNNING
> >   41e4489d38924c85a4810bd33ef60d80 (bj-yz-hadoop01-1-12:7050): bad state
> >     State:       INITIALIZED
> >     Data state:  TABLET_DATA_COPYING
> >     Last status: Tablet Copy: Downloading block 0000000084111077 (299837/1177225)
> >   52a9ede038a04566860ecd2e54388738 (bj-yz-hadoop01-1-51:7050): RUNNING
> >   b133f6fd0c274b93b21ffcbdcbbde830 (bj-yz-hadoop01-1-14:7050): RUNNING [LEADER]
> >
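For anyone tracking a copy like the one quoted above, here is a minimal Python sketch that pulls the two counters out of the "Last status" line and reports how far along the copy is. It assumes the pair in parentheses means blocks downloaded out of blocks to download, which is how the log reads but is not confirmed in this thread. copy_progress is a hypothetical helper, not part of Kudu.

```python
import re

# Matches the "(done/total)" counters at the end of a Tablet Copy status line.
_PROGRESS = re.compile(r"\((\d+)/(\d+)\)")


def copy_progress(status):
    """Return (done, total, fraction) parsed from a status line, or None."""
    match = _PROGRESS.search(status)
    if match is None:
        return None
    done, total = int(match.group(1)), int(match.group(2))
    if total == 0:
        # Avoid dividing by zero on a malformed or empty counter.
        return None
    return done, total, done / total


if __name__ == "__main__":
    line = ("Last status: Tablet Copy: Downloading block "
            "0000000084111077 (299837/1177225)")
    done, total, fraction = copy_progress(line)
    print(f"{done} of {total} blocks copied ({fraction:.1%})")
```

On the quoted status line this reports roughly a quarter of the blocks copied.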
>
> Which version are you using? The recent versions are using 3-4-3 replica
> replacement, meaning the tablet copy should be automatically canceled
> when the third replica comes online and the copy hasn't finished yet.
>
> >
> > My questions are:
> >
> > 1. It seems the Kudu server spent a long time opening log block
> > containers. How can I speed up restarting the Kudu server?
>
> The startup time of the tablet servers mostly depends on the number of
> tablets hosted on the server. I'm not sure if there's any way to tune
> it, aside from reducing the number of tablets. How many tablets do you
> have per tablet server?
>
> >
> > 2. I think the number of blocks influences both the Kudu server restart
> > time and the query time on a specific tablet: the more blocks, the longer
> > the restart and query times. Is this right?
>
> I'm not sure how much the number of blocks influences the restart time,
> maybe someone else can shed some light on this one. I'd focus on the
> number of tablets though.
>
> The query latencies depend on how many blocks the server needs to read
> from, but that is a matter of how well the data is compacted (either
> because the writes were sequential rather than random, or because the
> maintenance manager has compacted them), rather than the total number
> of blocks.
>
> >
> > 3. Why are there more than 1 million blocks in a tablet, as shown in the
> > Tablet Copy log above, when there are fewer than 500 thousand records in
> > the tablet?
> >
>
> Each rowset will have multiple blocks (one per column, UNDO and
> REDO deltas, and bloom filters). The number of rowsets depends on the
> number of rows.
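
A rough back-of-the-envelope sketch of that explanation in Python. Every number below is made up for illustration (the thread gives neither the column count nor the rowset count), and the per-rowset formula is a simplification of the description above, not Kudu's actual accounting. estimate_blocks is a hypothetical helper.

```python
def estimate_blocks(rowsets, columns, bloom_blocks=1, undo_blocks=1, redo_blocks=0):
    """Estimate the blocks in a tablet as rowsets * blocks-per-rowset.

    Simplified model: one block per column, plus bloom filter and
    UNDO/REDO delta blocks. All defaults are illustrative assumptions.
    """
    per_rowset = columns + bloom_blocks + undo_blocks + redo_blocks
    return rowsets * per_rowset


if __name__ == "__main__":
    # Hypothetical tablet: 20,000 small rowsets over a 50-column table.
    print(estimate_blocks(rowsets=20_000, columns=50))
```

With these invented inputs the estimate comes to 1,040,000 blocks, which shows how a wide table split into many small rowsets could reach a block count above its row count.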
>
> > 4. How can I reduce the number of blocks in a tablet?
>
> The maintenance managers perform compactions that reduce the number of
> blocks per tablet. Other than this, fewer columns or fewer rows also
> result in fewer blocks, of course.
>
> - Attila
>
