hive-user mailing list archives

From Ryan Harris <Ryan.Har...@zionsbancorp.com>
Subject RE: CombineHiveInputFormat not working
Date Wed, 30 Sep 2015 22:14:19 GMT
I would suggest trying:
set hive.hadoop.supports.splittable.combineinputformat = true;

you might also need to increase mapreduce.input.fileinputformat.split.minsize to something
larger, like 32MB
set mapreduce.input.fileinputformat.split.minsize = 33554432;

Depending on your Hadoop distro and version, be aware of
https://issues.apache.org/jira/browse/MAPREDUCE-1597
and
https://issues.apache.org/jira/browse/MAPREDUCE-5537

test it and see...
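
One way to run that test in a single Hive session is sketched below. The two `set` lines are the ones suggested above; the query is only a placeholder, and the `dt` value is a made-up example, since the thread never shows what the partition values look like. Comparing the mapper count reported for the job before and after the change should show whether splits are being combined.

```sql
-- Settings suggested above; they apply to the current session only.
set hive.hadoop.supports.splittable.combineinputformat = true;
-- 32 MB = 32 * 1024 * 1024 = 33554432 bytes
set mapreduce.input.fileinputformat.split.minsize = 33554432;

-- Placeholder query against the table from the original post.
-- The dt literal is hypothetical; substitute a real partition value.
SELECT COUNT(*) FROM raw_events WHERE dt = '2015-09-30';
```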

From: Pradeep Gollakota [mailto:pradeepg26@gmail.com]
Sent: Wednesday, September 30, 2015 3:33 PM
To: user@hive.apache.org
Subject: Re: CombineHiveInputFormat not working

mapred.min.split.size = mapreduce.input.fileinputformat.split.minsize = 1
mapred.max.split.size = mapreduce.input.fileinputformat.split.maxsize = 134217728
hive.hadoop.supports.splittable.combineinputformat = false

My average file size is pretty small... it's usually between 500K and 20MB.

So it looks like the splittable support is turned off? I've been seeing some posts on the
mailing list saying there are correctness problems when using this with LZO.

Is this still the case? Can I turn this on with LZ4?

Thanks!

On Wed, Sep 30, 2015 at 1:38 PM, Ryan Harris <Ryan.Harris@zionsbancorp.com> wrote:
Also...
mapreduce.input.fileinputformat.split.maxsize

and, what is the size of your input files?

From: Ryan Harris
Sent: Wednesday, September 30, 2015 2:37 PM
To: 'user@hive.apache.org'
Subject: RE: CombineHiveInputFormat not working

what are your values for:
mapred.min.split.size
mapred.max.split.size
hive.hadoop.supports.splittable.combineinputformat
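
In the Hive CLI, `set` followed by a property name and no value prints the current setting, so the three values can be read back roughly like this (a small sketch; the exact output format may vary by version):

```sql
-- Each statement prints the current value of the named property
-- for this session, or reports it as undefined if it was never set.
set mapred.min.split.size;
set mapred.max.split.size;
set hive.hadoop.supports.splittable.combineinputformat;
```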


From: Pradeep Gollakota [mailto:pradeepg26@gmail.com]
Sent: Wednesday, September 30, 2015 2:20 PM
To: user@hive.apache.org
Subject: CombineHiveInputFormat not working

Hi all,

I have an external table with the following DDL.

```
DROP TABLE IF EXISTS raw_events;
CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
    raw_event_string string)
PARTITIONED BY (dc string, community string, dt string)
STORED AS TEXTFILE
LOCATION '/lithium/events/{dc}/{community}/events/{year}/{month}/{day}';
```

The files are loaded externally and are LZ4 compressed. When I run a query on this table for
a single day, I'm getting 1 mapper per file even though the input format is set to CombineHiveInputFormat.

Does anyone know if CombineHiveInputFormat does not work with LZ4 compressed files or have
any idea why split combination is not working?
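
For reference, the input format in effect can be confirmed from within the session. A minimal sketch, assuming the standard class name `org.apache.hadoop.hive.ql.io.CombineHiveInputFormat`:

```sql
-- Print the input format currently in effect for this session.
set hive.input.format;

-- Set it explicitly if the value printed above is something else.
set hive.input.format = org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
```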

Thanks!
Pradeep
________________________________
THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS CONFIDENTIAL and may contain
information that is privileged and exempt from disclosure under applicable law. If you are
neither the intended recipient nor responsible for delivering the message to the intended
recipient, please note that any dissemination, distribution, copying or the taking of any
action in reliance upon the message is strictly prohibited. If you have received this communication
in error, please notify the sender immediately. Thank you.

