hive-user mailing list archives

From Eugene Koifman <ekoif...@hortonworks.com>
Subject Re: Optimize Hive Query
Date Mon, 27 Jun 2016 22:21:39 GMT
If you have many ACID tables, you almost certainly want more than 2 workers.  With 2
workers (and a single metastore instance) you can run at most 2 compaction jobs at a time.
Unless the tables are very small, compaction may fall behind if it's configured to run too
serially.
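
As a rough illustration of why two workers can fall behind, here is a back-of-envelope sketch in Python. It assumes each worker runs one compaction at a time (as described above) and, purely for illustration, that every job takes the same time. The function name and the numbers are hypothetical, not measurements from this cluster.

```python
import math


def drain_time_minutes(pending_jobs: int, workers: int, minutes_per_job: float) -> float:
    """Estimate how long a queue of compaction jobs takes to drain.

    Assumes one job per worker at a time and equal job durations
    (an illustrative simplification, not how real compactions behave).
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    # Jobs run in "waves" of at most `workers` jobs each.
    waves = math.ceil(pending_jobs / workers)
    return waves * minutes_per_job


# Hypothetical numbers: 40 partitions waiting, 15 minutes per compaction.
print(drain_time_minutes(40, 2, 15))  # with 2 workers
print(drain_time_minutes(40, 8, 15))  # with 8 workers
```

Under these assumptions the queue drains four times faster with 8 workers than with 2, which is the sense in which a small worker count makes compaction "too serial".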

For compactions to run automatically, at a minimum you have to set hive.compactor.initiator.on=true
on one standalone metastore instance.
hive.compactor.delta.num.threshold determines when compaction is triggered for a given table/partition.
There are more details at https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration

Look for log messages from the Initiator/Cleaner classes in metastore.log.  If you don't see any,
the compactor must be disabled.
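
The log check above could be scripted along these lines. This is a minimal sketch: it assumes the class names Initiator and Cleaner appear as plain substrings in each relevant line of metastore.log, which depends on the logging configuration, and the helper names are hypothetical.

```python
from typing import Iterable, List

# Class names mentioned in this thread; substring matching is an assumption
# about how they show up in metastore.log.
COMPACTOR_CLASSES = ("Initiator", "Cleaner")


def compactor_lines(lines: Iterable[str]) -> List[str]:
    """Return only the lines that mention one of the compactor classes."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(name in line for name in COMPACTOR_CLASSES)
    ]


def scan_log(path: str) -> List[str]:
    """Read a log file and keep the compactor-related lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return compactor_lines(handle)
```

Calling scan_log with the location of your metastore.log and getting an empty list back corresponds to the "you don't see any" case described above.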

SHOW COMPACTIONS is a command you can run from the CLI to see whether any compactions are currently running.

You can also use ALTER TABLE ... COMPACT (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionCompact)
to launch compaction on demand.
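
For reference, the on-demand statement has the shape ALTER TABLE name [PARTITION (...)] COMPACT 'major' or 'minor'. The small Python helper below only builds that statement text so the shape is easy to see. The helper itself is hypothetical, it does no quoting or escaping of names and values, and it does not submit anything to Hive.

```python
from typing import Mapping, Optional


def compact_statement(table: str, kind: str = "major",
                      partition: Optional[Mapping[str, str]] = None) -> str:
    """Build an ALTER TABLE ... COMPACT statement as a string (no escaping)."""
    if kind not in ("major", "minor"):
        raise ValueError("compaction type must be 'major' or 'minor'")
    clause = ""
    if partition:
        # Partition values are rendered as quoted string literals.
        pairs = ", ".join(f"{key} = '{value}'" for key, value in partition.items())
        clause = f" PARTITION ({pairs})"
    return f"ALTER TABLE {table}{clause} COMPACT '{kind}'"


# Table name taken from this thread.
print(compact_statement("tuning_dd_key"))
```

After submitting the generated statement through your usual Hive client, SHOW COMPACTIONS should list the request.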

Could you send the results of:
dfs -ls /apps/hive/warehouse/PRDDB.db/tuning_dd_key

thanks,
Eugene


From: "@Sanjiv Singh" <sanjiv.is.on@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>, "sanjiv.is.on@gmail.com" <sanjiv.is.on@gmail.com>
Date: Sunday, June 26, 2016 at 1:11 PM
To: Gopal Vijayaraghavan <gopalv@apache.org>
Cc: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: Optimize Hive Query

Thanks, Gopal, for your input. For now I have created a non-ACID table and loaded the data. The
log line below shows that splits are now being grouped properly.

2016-06-25 12:52:00,160 [INFO] [InputInitializer {Map 1} #0] |tez.HiveSplitGenerator|: Number of grouped splits: 512


On the compaction issue: compaction is enabled with two workers. Why has compaction not happened? I will
check the metastore logs.

I have many ACID tables in Hive. How many workers should be configured? Currently it
is 2.

Thanks a lot once again.


Regards
Sanjiv Singh
Mob :  +091 9990-447-339

On Fri, Jun 24, 2016 at 9:14 PM, @Sanjiv Singh <sanjiv.is.on@gmail.com> wrote:
Thanks, Gopal, for your input. Let me run compaction explicitly on the table and then see how the query
performs.




Regards
Sanjiv Singh
Mob :  +091 9990-447-339

On Fri, Jun 24, 2016 at 7:53 PM, Gopal Vijayaraghavan <gopalv@apache.org> wrote:

> Yes, ACID is enabled for this table. It has only 256 files for each
> bucket. These were created only when data was initially loaded into this table.

Yes, the initial load goes in as an insert DELTA too - that requires
another compaction to move into base files.

The fact that they haven't been automatically compacted yet suggests that
the compactor isn't working for some reason (check hive metastore logs).

> One thing that I am not able to understand is that it is running with 1
> MAPPER.

The size of deltas shows up as 0, till the compaction goes through - in
Hive2, it will be -1 which will be correctly interpreted as "unknown size".


> | -rw-r--r--   3 H56473 hdfs  215973009 2016-06-23 17:38
> /apps/hive/warehouse/PRDDB.db/tuning_dd_key/delta_0001570_0001570/bucket_00000  |

Clearly an issue due to the lack of compaction - I see a single delta with
255 buckets and no base_* files at all.
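
The same check (how many delta directories, and whether any base_* directory exists) can be sketched in Python over a list of file paths from a directory listing. This is only an illustration: it assumes directory and file names of the form seen in this thread (base_N, delta_N_N, bucket_N), expects each entry to end with the path, and the function name is hypothetical.

```python
import re
from typing import Dict, Iterable

# Matches .../base_N/bucket_N or .../delta_N_N/bucket_N at the end of an entry.
# The optional third number in the delta name is an allowance for other naming
# variants and is an assumption, not something shown in this thread.
ACID_PATH = re.compile(r"/(base_\d+|delta_\d+_\d+(?:_\d+)?)/(bucket_\d+)$")


def summarize_acid_dirs(paths: Iterable[str]) -> Dict[str, int]:
    """Count distinct base and delta directories and the bucket files in them."""
    bases, deltas, buckets = set(), set(), 0
    for path in paths:
        match = ACID_PATH.search(path.strip())
        if not match:
            continue  # not a bucket file inside a base/delta directory
        directory = match.group(1)
        (bases if directory.startswith("base_") else deltas).add(directory)
        buckets += 1
    return {"base_dirs": len(bases), "delta_dirs": len(deltas), "bucket_files": buckets}


# The one path quoted in this thread.
listing = ["/apps/hive/warehouse/PRDDB.db/tuning_dd_key/delta_0001570_0001570/bucket_00000"]
print(summarize_acid_dirs(listing))
```

A result with delta_dirs above zero and base_dirs equal to zero matches the situation described above: data sitting only in deltas, with no compaction into base files yet.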

Cheers,
Gopal










