hive-user mailing list archives

From "Moore, Douglas" <Douglas.Mo...@thinkbiganalytics.com>
Subject Re: Over-logging by ORC packages
Date Mon, 06 Apr 2015 18:19:31 GMT
Owen, we're seeing millions of those log entries.

There are three messages, one for each package listed below in the revised hive-log4j.settings; one full example is provided below.
They seem to repeat less often than once per row (that would be billions). Perhaps they repeat for each and
every partition in a table (thousands to tens of thousands).

To reproduce, create a y/m/d partitioned table and run: select * from <table> where year=2015
and month=4 and day=6 limit 1;
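A minimal sketch of that repro (table and column names here are illustrative, not from our schema):

```sql
-- Hypothetical y/m/d partitioned ORC table for reproducing the logging volume.
CREATE TABLE events (id BIGINT, payload STRING)
PARTITIONED BY (year INT, month INT, day INT)
STORED AS ORC;

-- Even a single-row query emits the "ORC pushdown predicate" INFO message
-- repeatedly (per split/partition read, as far as we can tell):
SELECT * FROM events
WHERE year = 2015 AND month = 4 AND day = 6
LIMIT 1;
```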
- Douglas

From: Owen O'Malley <omalley@apache.org<mailto:omalley@apache.org>>
Reply-To: <user@hive.apache.org<mailto:user@hive.apache.org>>
Date: Mon, 6 Apr 2015 11:13:28 -0700
To: "user@hive.apache.org<mailto:user@hive.apache.org>" <user@hive.apache.org<mailto:user@hive.apache.org>>
Subject: Re: Over-logging by ORC packages

Sorry for the excessive logging. The pushdown logging should only happen at the start; is there
a particular message that was being repeated per row?

Thanks,
   Owen

On Mon, Apr 6, 2015 at 9:15 AM, Moore, Douglas <Douglas.Moore@thinkbiganalytics.com<mailto:Douglas.Moore@thinkbiganalytics.com>>
wrote:
On a cluster recently upgraded to Hive 0.14 (HDP 2.2), we found gigabytes more hive.log output, millions
of additional INFO-level entries, being logged by the ORC packages.
I feel these log entries should be at the DEBUG level.
Is there an existing bug in Hive or ORC?

Here is one example:
2015-04-06 15:12:43,212 INFO  orc.OrcInputFormat (OrcInputFormat.java:setSearchArgument(298))
- ORC pushdown predicate: leaf-0 = (EQUALS company XYZ)
leaf-1 = (EQUALS site DEF)
leaf-2 = (EQUALS table ABC)
expr = (and leaf-0 leaf-1 leaf-2)

To get an acceptable amount of logging that did not fill /tmp, we had to add these entries
to /etc/hive/conf/hive-log4j.settings:
log4j.logger.org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger=WARN,DRFA
log4j.logger.org.apache.hadoop.hive.ql.io.orc.ReaderImpl=WARN,DRFA
log4j.logger.org.apache.hadoop.hive.ql.io.orc.OrcInputFormat=WARN,DRFA


While I'm on the subject: to operationally harden Hive, I think Hive should use a more aggressive
rolling file appender by default, one that can roll hourly or at a max size and compress the rolled
logs…
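Something along these lines, sketched for hive-log4j.settings; this assumes the Apache log4j "extras" companion jar is on the Hive classpath, since the stock log4j 1.2 DailyRollingFileAppender cannot compress:

```properties
# Sketch only, not a tested Hive default. Requires the log4j-extras jar.
# Rolls hive.log hourly; the .gz suffix in FileNamePattern triggers compression.
log4j.appender.DRFA=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.DRFA.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.DRFA.rollingPolicy.FileNamePattern=${hive.log.dir}/hive.log.%d{yyyy-MM-dd-HH}.gz
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
```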

- Douglas

