kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: is the number of select count(1) from kudu_table exact?
Date Wed, 18 Jan 2017 07:03:58 GMT
Hi Darren,

It is expected that the result is exact.

Earlier versions of the Impala integration did have some bugs that could
cause incorrect results (e.g., missing rows). Also, if you are performing
the backup just after completing an insert, it's worth noting that the
Impala integration doesn't currently guarantee "read-your-writes"
consistency. That is to say, there may be a short window during which a
scan does not see all the rows you just inserted.
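
If that window is the cause, one rough way to check is to re-run the count
until two consecutive reads agree before comparing it with the backup. The
sketch below is only a heuristic under that assumption, not a consistency
guarantee. The names stable_count and run_count are hypothetical; run_count
stands for whatever callable in your environment executes
"select count(1) from kudu_table" and returns the number.

```python
import time
from typing import Callable


def stable_count(run_count: Callable[[], int],
                 attempts: int = 5,
                 delay_seconds: float = 2.0) -> int:
    """Re-run a count until two consecutive reads agree, then return it.

    run_count is a hypothetical callable that executes the count query
    and returns an integer; it is supplied by the caller.
    """
    previous = run_count()
    for _ in range(attempts):
        # Give recently inserted rows time to become visible to scans.
        time.sleep(delay_seconds)
        current = run_count()
        if current == previous:
            return current
        previous = current
    raise RuntimeError("count did not stabilize within the allowed attempts")
```

If the stabilized count still differs from the count of parquet_backup, the
read-your-writes window is probably not the explanation, and the version in
use becomes the more likely suspect.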

What version of the IMPALA_KUDU parcel are you using in this deployment?

-Todd

On Tue, Jan 17, 2017 at 6:24 PM, Darren Hoo <darren.hoo@gmail.com> wrote:

> We have a kudu table with size about 120GB, when we try to backup the kudu
> to impala and stored as parquet on hdfs
>
> create table parquet_backup stored as parquet as select * from kudu_table
>
> but the two numbers we get by running
>
>    select count(1) from kudu_table
>    select count(1) from parquet_backup
>
> is Not equal.
>
> So my question is whether the result of  count(1) is an estimated number
> or something is wrong when we try to backup the kudu table?
>



-- 
Todd Lipcon
Software Engineer, Cloudera
