hive-user mailing list archives

From Sanjay Subramanian <Sanjay.Subraman...@wizecommerce.com>
Subject Re: only one mapper
Date Wed, 21 Aug 2013 19:13:53 GMT
Hi

Try this setting in your hive query

SET mapreduce.input.fileinputformat.split.maxsize=<some bytes>;

If you set this value low, the MR job will use this size to split the input LZO files
and you will get multiple mappers (and make sure the input LZO files are indexed, i.e. .lzo.index
files are created).
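
For example, a minimal sketch (the 128 MB value and the table name are illustrative placeholders,
not tuned recommendations) that should give roughly one mapper per 128 MB of input:

    SET mapreduce.input.fileinputformat.split.maxsize=134217728;  -- 128 MB per split
    SELECT COUNT(*) FROM my_lzo_table;                            -- hypothetical table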

sanjay


From: Edward Capriolo <edlinuxguru@gmail.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Wednesday, August 21, 2013 10:43 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: only one mapper

LZO files are only splittable if you index them. Sequence files compressed with LZO are splittable
without being indexed.
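
If you do go the indexing route with the hadoop-lzo library, the index is typically built with its
indexer; a rough sketch, where the jar path and HDFS path are placeholders:

    hadoop jar /path/to/hadoop-lzo.jar \
        com.hadoop.compression.lzo.DistributedLzoIndexer \
        /user/hive/warehouse/my_table/file.lzo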

Snappy + SequenceFile is a better option than LZO.
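
One way to get there in Hive, sketched with the older mapred.* property names common on
Hive 0.10-era clusters (the table names are placeholders):

    SET hive.exec.compress.output=true;
    SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
    SET mapred.output.compression.type=BLOCK;
    CREATE TABLE my_snappy_seq
      STORED AS SEQUENCEFILE
      AS SELECT * FROM my_source_table;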


On Wed, Aug 21, 2013 at 1:39 PM, Igor Tatarinov <igor@decide.com> wrote:
LZO files are combinable so check your max split setting.
http://mail-archives.apache.org/mod_mbox/hive-user/201107.mbox/%3C4E328964.7000202@gmail.com%3E

igor
decide.com



On Wed, Aug 21, 2013 at 2:17 AM, 闫昆 <yankunhadoop@gmail.com> wrote:
Hi all, when I use Hive, the job makes only one mapper. My file actually splits into 18 blocks:
my block size is 128 MB and the data size is 2 GB.
I use LZO compression, create file.lzo, and make the index file.lzo.index.
I use Hive 0.10.0.

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Cannot run job locally: Input Size (= 2304560827) is larger than hive.exec.mode.local.auto.inputbytes.max
(= 134217728)
Starting Job = job_1377071515613_0003, Tracking URL = http://hydra0001:8088/proxy/application_1377071515613_0003/
Kill Command = /opt/module/hadoop-2.0.0-cdh4.3.0/bin/hadoop job  -kill job_1377071515613_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-08-21 16:44:30,237 Stage-1 map = 0%,  reduce = 0%
2013-08-21 16:44:40,495 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 6.81 sec
2013-08-21 16:44:41,710 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 6.81 sec
2013-08-21 16:44:42,919 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 6.81 sec
2013-08-21 16:44:44,117 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 9.95 sec
2013-08-21 16:44:45,333 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 9.95 sec
2013-08-21 16:44:46,530 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 13.0 sec

--

In the Hadoop world, I am just a novice exploring the entire Hadoop ecosystem; I hope one day
I can contribute my own code.

YanBit
yankunhadoop@gmail.com



