kudu-user mailing list archives

From Attila Bukor <abu...@apache.org>
Subject Re: How to decrease kudu server restart time
Date Mon, 13 Aug 2018 12:30:30 GMT
No, 3-4-3 replica replacement is available only in Kudu 1.7.0 and above.
On Mon, Aug 13, 2018 at 07:16:00PM +0800, Gary Gao wrote:
> I'm using Kudu 1.6.0. Does this version have the feature you mentioned?
> 
> The recent versions are using 3-4-3 replica
> replacement, meaning the tablet copy should be automatically canceled
> when the third replica comes online and the copy hasn't finished yet.
> 
> On Mon, Aug 13, 2018 at 5:16 PM Attila Bukor <abukor@apache.org> wrote:
> 
> > Hi Gary,
> >
> > Please find my answers inline.
> >
> > On Sun, Aug 12, 2018 at 01:17:03PM +0800, Gary Gao wrote:
> > > I have a Kudu cluster of 40 nodes. When I realized that
> > > maintenance_manager_num_threads=1 was too small, I updated the config file
> > > and restarted a Kudu tablet server, but it took too long to start, longer
> > > than --follower_unavailable_considered_failed_sec=600, causing tablet
> > > redistribution.
> > > Even after the Kudu server had started, it spent too much time copying
> > > tablets, as the following tablet copy status shows:
> > >
> > >
> > > Tablet 1ecbe230e14a4d9f9125dbc49c32860e of table
> > > 'impala::venus.ods_xk_pay_fee_order' is under-replicated: 1 replica(s) not RUNNING
> > >   41e4489d38924c85a4810bd33ef60d80 (bj-yz-hadoop01-1-12:7050): bad state
> > >     State:       INITIALIZED
> > >     Data state:  TABLET_DATA_COPYING
> > >     Last status: Tablet Copy: Downloading block 0000000084111077 (299837/1177225)
> > >   52a9ede038a04566860ecd2e54388738 (bj-yz-hadoop01-1-51:7050): RUNNING
> > >   b133f6fd0c274b93b21ffcbdcbbde830 (bj-yz-hadoop01-1-14:7050): RUNNING [LEADER]
> > >
> >
> > Which version are you using? The recent versions are using 3-4-3 replica
> > replacement, meaning the tablet copy should be automatically canceled
> > when the third replica comes online and the copy hasn't finished yet.
> >
> > >
> > > My questions are:
> > >
> > > 1. It seems the Kudu server spent a long time opening log block containers.
> > > How can I speed up restarting a Kudu server?
> >
> > The startup time of the tablet servers mostly depends on the number of
> > tablets hosted on the server. I'm not sure if there's any way to tune
> > it, aside from reducing the number of tablets. How many tablets do you
> > have per tablet server?
> >
> > >
> > > 2. I think the number of blocks influences both the Kudu server restart
> > > time and the query time on a specific tablet: the more blocks, the longer
> > > the restart and query times. Is this right?
> >
> > I'm not sure how much the number of blocks influences the restart time,
> > maybe someone else can shed some light on this one. I'd focus on the
> > number of tablets though.
> >
> > Query latency depends on how many blocks the server needs to read, but
> > that is a matter of how well the data is compacted (either because it was
> > written sequentially rather than randomly, or because the maintenance
> > manager has compacted it), rather than of the total number of blocks.
> >
> > >
> > > 3. Why are there more than 1 million blocks in a tablet, as shown in the
> > > Tablet Copy log above, when there are fewer than 500 thousand records in
> > > the tablet?
> > >
> >
> > Each rowset will have multiple blocks (one per column, UNDO and
> > REDO deltas, and bloom filters). The number of rowsets depends on the
> > number of rows.
> >
> > > 4. How can I reduce the number of blocks in a tablet?
> >
> > The maintenance managers perform compactions that reduce the number of
> > blocks per tablet. Other than this, fewer columns or fewer rows also
> > result in fewer blocks, of course.
> >
> > - Attila
> >
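
As a rough illustration of the numbers in the thread, the "Last status" line from the ksck output quoted above can be read as a (blocks downloaded / total blocks) counter. The sketch below is only that, a sketch: `copy_progress` is a hypothetical helper, not part of Kudu, and the 500,000-row figure is the upper bound Gary gives in question 3, so the blocks-per-row value is a lower bound rather than a measurement.

```python
import re

# Status text copied from the ksck output quoted in this thread.
STATUS = "Tablet Copy: Downloading block 0000000084111077 (299837/1177225)"


def copy_progress(status):
    """Hypothetical helper (not a Kudu API): extract (done, total) block counts."""
    match = re.search(r"\((\d+)/(\d+)\)", status)
    if match is None:
        raise ValueError("no (done/total) counter found in status text")
    return int(match.group(1)), int(match.group(2))


done, total = copy_progress(STATUS)
percent_done = 100.0 * done / total

# Assumption taken from question 3: the tablet holds fewer than 500,000 rows,
# so dividing by this bound gives a minimum number of blocks per row.
ROWS_UPPER_BOUND = 500_000
blocks_per_row = total / ROWS_UPPER_BOUND

print(f"copied {done} of {total} blocks ({percent_done:.1f}%)")
print(f"at least {blocks_per_row:.2f} blocks per row")
```

A ratio above two blocks per row fits the explanation given for question 3: the block count follows the number of rowsets, columns, delta files, and bloom filters rather than the row count alone, which suggests this tablet holds many small rowsets that have not yet been compacted.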
