kudu-user mailing list archives

From "一米阳光" <710339...@qq.com>
Subject scan performance super bad
Date Sun, 13 May 2018 07:56:04 GMT
Hi, I have run into a difficult problem while using Kudu 1.6.

My Kudu table schema is roughly as follows:
column name: key, type: string, prefix encoding, lz4 compression, primary key
column name: value, type: string, lz4 compression


The primary key is built from three parts, for example:
001320_201803220420_00000001
The first part is a unique id.
The second part is a formatted time string.
The third part is an incrementing integer. For a given unique id and a fixed time there may be multiple values, so I use this part to distinguish them.
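A minimal sketch of how such a key could be composed, assuming a 6-digit id, a 12-digit time string and an 8-digit sequence number, as in the example key. The class `CompositeKey` and its methods are hypothetical illustrations, not part of Kudu. The point it shows is that zero-padding every part to a fixed width makes plain string comparison agree with the numeric order of (id, time, sequence).

```java
import java.util.Locale;

// Hypothetical helper (not part of Kudu): builds and splits the three-part key.
// Assumes a 6-digit id, a 12-digit time string and an 8-digit sequence number.
public final class CompositeKey {
    private CompositeKey() {}

    public static String build(int id, String time, int seq) {
        if (id < 0 || id > 999_999) {
            throw new IllegalArgumentException("id must fit in 6 digits");
        }
        if (!time.matches("\\d{12}")) {
            throw new IllegalArgumentException("time must be 12 digits, e.g. 201803220420");
        }
        if (seq < 0 || seq > 99_999_999) {
            throw new IllegalArgumentException("seq must fit in 8 digits");
        }
        // Zero-padding keeps every part fixed-width, so comparing the key strings
        // gives the same order as comparing (id, time, seq) numerically.
        return String.format(Locale.ROOT, "%06d_%s_%08d", id, time, seq);
    }

    // Returns {id, time, seq} as strings.
    public static String[] split(String key) {
        String[] parts = key.split("_");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected id_time_seq: " + key);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(build(1320, "201803220420", 1)); // 001320_201803220420_00000001
    }
}
```

Without the padding, a sequence of 10 would sort before a sequence of 2 as strings, which would break range scans over the key.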


The table is range partitioned on the first part, with splits like this:
range < 005000
005000 <= range < 010000
010000 <= range < 015000
015000 <= range < 020000
.....
.....
995000 <= range
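The ranges above can be reproduced with a short sketch that generates the split points (005000, 010000, ..., 995000) and finds the range a given key falls into. It assumes, as the listing suggests, that the split values are compared against the whole string key, which works because the id prefix is fixed-width. `RangePartitions` is a hypothetical name, not Kudu code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch (not Kudu code): reproduces the split points listed above
// and finds which range a key string falls in, using plain string comparison.
public final class RangePartitions {
    private RangePartitions() {}

    public static List<String> splitPoints() {
        List<String> splits = new ArrayList<>();
        for (int v = 5_000; v <= 995_000; v += 5_000) {
            splits.add(String.format(Locale.ROOT, "%06d", v));
        }
        return splits; // 199 split points, so 200 ranges
    }

    // Index 0 is "range < 005000"; the last index (199) is "995000 <= range".
    public static int partitionIndex(String key, List<String> splits) {
        int index = 0;
        for (String split : splits) {
            if (key.compareTo(split) < 0) {
                break; // splits are sorted, so no later split can be <= key
            }
            index++;
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> splits = splitPoints();
        System.out.println(partitionIndex("001320_201803220420_00000001", splits)); // 0
    }
}
```

Under this assumption, every key for id 001320 lands in the first range, so a scan restricted to one id only needs to read a single range.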


When I scan the data for one unique id over a time range, with a lower bound like 001320_201803220420_00000001
and an upper bound like 001320_201803230420_99999999, each call to kuduScanner.nextRows() takes about 500 ms
and returns only 20 to 50 rows. There are about 8000 rows in total between the bounds, so I have to call
nextRows() hundreds of times to fetch all of the data, and the whole scan ends up taking several minutes.
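The numbers above can be checked with a small sketch: it builds key strings bounding one id and a time window, and estimates the number of nextRows() calls and the total time. At 500 ms per call, 8000 rows at 50 rows per call is 160 calls (80 s), and at 20 rows per call it is 400 calls (200 s), which matches "several minutes". `ScanBounds` and its methods are hypothetical helpers, not the Kudu API. Using sequence 00000000 for the lower bound (rather than 00000001) is an assumption so that no sequence value is skipped.

```java
import java.util.Locale;

// Hypothetical sketch (not Kudu API): key bounds for one id and a time window,
// plus a back-of-the-envelope estimate of scan calls and total time.
public final class ScanBounds {
    private ScanBounds() {}

    // Smallest key for this id at startTime (sequence 00000000).
    public static String lower(int id, String startTime) {
        return String.format(Locale.ROOT, "%06d_%s_%08d", id, startTime, 0);
    }

    // Largest key for this id at endTime (sequence 99999999), an inclusive bound.
    public static String upper(int id, String endTime) {
        return String.format(Locale.ROOT, "%06d_%s_%08d", id, endTime, 99_999_999);
    }

    // True if key lies in [lower, upper] under plain string comparison.
    public static boolean inRange(String key, String lower, String upper) {
        return key.compareTo(lower) >= 0 && key.compareTo(upper) <= 0;
    }

    // Number of batches needed to read totalRows rows, rowsPerCall at a time (rounded up).
    public static long estimatedCalls(long totalRows, long rowsPerCall) {
        return (totalRows + rowsPerCall - 1) / rowsPerCall;
    }

    public static long estimatedMillis(long totalRows, long rowsPerCall, long millisPerCall) {
        return estimatedCalls(totalRows, rowsPerCall) * millisPerCall;
    }

    public static void main(String[] args) {
        String lo = lower(1320, "201803220420");
        String hi = upper(1320, "201803230420");
        System.out.println(lo + " .. " + hi);
        // 8000 rows at 50 and at 20 rows per call, 500 ms per call:
        System.out.println(estimatedMillis(8000, 50, 500)); // 80000
        System.out.println(estimatedMillis(8000, 20, 500)); // 200000
    }
}
```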


I don't know why this happens or how to resolve it... Maybe in the end I should give up on Kudu
and use HBase instead...