Hi Geetika. While I don't know anything about TPC-H data, when people load data and see fewer rows it's usually because of duplicated primary keys. Kudu, unlike Parquet, enforces a unique primary key, so rows that repeat an existing key are not added as new rows. What's the schema for the Kudu table?

Also, it would be useful to know which Kudu and Impala versions you are using.
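
If it helps, here is a rough sketch of how to gather that information and test the duplicate-key theory from Impala. The key columns (l_orderkey, l_linenumber) are only an assumption for illustration; substitute whatever SHOW CREATE TABLE reports as the primary key of the Kudu table.

```sql
-- Print the Kudu table definition, including its PRIMARY KEY clause.
SHOW CREATE TABLE lineitem;

-- Report the Impala version (the Kudu version has to be checked separately).
SELECT version();

-- Count distinct key combinations in the Parquet source.
-- ASSUMPTION: (l_orderkey, l_linenumber) is the Kudu primary key;
-- replace these with the actual key columns.
SELECT count(*) AS distinct_keys
FROM (
  SELECT DISTINCT l_orderkey, l_linenumber
  FROM parquetimpala500.lineitem
) k;
```

If distinct_keys comes out lower than the source row count, the missing rows shared a key with a row that was already inserted. If it matches the source row count, duplicate keys are not the explanation.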

-Will

On Wed, May 9, 2018 at 10:03 PM, Geetika Gupta <geetika.gupta@knoldus.in> wrote:
Hi community,

We ran the command below to load data into Kudu, but the target table ended up with fewer rows than the source:

insert into LINEITEM select * from PARQUETIMPALA500.LINEITEM

The query succeeded, but when we ran count(*) on both tables, the row counts were different:

0: jdbc:hive2://slave2:21050/default> select count(*) from lineitem
. . . . . . . . . . . . . . . . . . > ;
536870912

0: jdbc:hive2://slave2:21050/default> select count(*) from parquetimpala500.lineitem;
3000028242

We are loading 500 GB of TPC-H data into Kudu from a Parquet table.

--
Regards,
Geetika Gupta