hive-user mailing list archives

From "Touretsky, Gregory" <gregory.touret...@intel.com>
Subject RE: Hive and MapReduce
Date Tue, 13 Oct 2009 12:01:39 GMT
This is the only mapper in the system, and it's the only job running.
According to the jobtracker page, Map Task Capacity is 64.
Just one mapper is submitted, and nothing else is running on this cluster.
Note that if I submit the MR task directly (without Hive), multiple mappers run in parallel.

From the job status page (while the job is running):
Hadoop job_200910121100_0005 on itstl0016
User: gtouret
Job Name: select start.* from start whe...'2009-11-01'(1/1)
Job File: hdfs://itstl0016.iil.intel.com:9000/tmp/hadoop-gtouret/mapred/system/job_200910121100_0005/job.xml
Job Setup: Successful
Status: Running
Started at: Tue Oct 13 13:55:09 IST 2009
Running for: 2mins, 18sec
Job Cleanup: Pending
Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
map     16.28%      1          0        1        0         0       0 / 0
reduce  0.00%       0          0        0        0         0       0 / 0



                      Counter               Map            Reduce  Total
Job Counters          Launched map tasks    0              0       1
                      Data-local map tasks  0              0       1
FileSystemCounters    HDFS_BYTES_READ       4,158,378,045  0       4,158,378,045
Map-Reduce Framework  Map input records     11,766,636     0       11,766,636
                      Spilled Records       0              0       0
                      Map input bytes       4,093,640,704  0       4,093,640,704
                      Map output records    0              0       0

-----Original Message-----
From: Ashish Thusoo [mailto:athusoo@facebook.com] 
Sent: Monday, October 12, 2009 11:19 PM
To: Touretsky, Gregory
Cc: hive-user@hadoop.apache.org
Subject: RE: Hive and MapReduce

Do all the other mappers show up as pending? Is this the only job running? How many map slots
does the jobtracker page show?
This looks like either a configuration problem in your Hadoop deployment or other jobs
running on the cluster and taking up all the other map slots.

Ashish 

-----Original Message-----
From: Touretsky, Gregory [mailto:gregory.touretsky@intel.com] 
Sent: Monday, October 12, 2009 11:57 AM
To: Ashish Thusoo
Cc: hive-user@hadoop.apache.org
Subject: RE: Hive and MapReduce

Right, the files are simple text files.
And yes, when I run a similar query through the MR interface I get 352 mappers.
But, as shown below, Hive started only one mapper:
itstl0016> $HADOOP_HOME/bin/hadoop job -status job_200910121100_0002

Job: job_200910121100_0002
file: hdfs://itstl0016.iil.intel.com:9000/tmp/hadoop-gtouret/mapred/system/job_200910121100_0002/job.xml
tracking URL: http://itstl0016.iil.intel.com:50030/jobdetails.jsp?jobid=job_200910121100_0002
map() completion: 1.0
reduce() completion: 1.0
Counters: 11
        Job Counters 
                Launched map tasks=1
                Data-local map tasks=1
        org.apache.hadoop.hive.ql.exec.FilterOperator$Counter
                PASSED=0
                FILTERED=63916590
        org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum
                TABLE_ID_1_ROWCOUNT=0
        FileSystemCounters
                HDFS_BYTES_READ=22663352877
        org.apache.hadoop.hive.ql.exec.MapOperator$Counter
                DESERIALIZE_ERRORS=0
        Map-Reduce Framework
                Map input records=63916590
                Spilled Records=0
                Map input bytes=22615687168
                Map output records=0

-----Original Message-----
From: Ashish Thusoo [mailto:athusoo@facebook.com]
Sent: Monday, October 12, 2009 8:53 PM
To: Touretsky, Gregory
Cc: hive-user@hadoop.apache.org
Subject: RE: Hive and MapReduce

Since the default split size is 64 MB, Hadoop should run this on 352 mappers. I presume that
the files are simple text files...

Ashish 
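
The 352 figure can be checked with a little arithmetic. The Python sketch below assumes the split rule of the old-API `org.apache.hadoop.mapred.FileInputFormat` (split size = max(min size, min(goal size, block size)), with a 10% slop on the last split), a 64 MB block size, and a map-task hint of 1. The function names and defaults are illustrative and are not taken from this thread.

```python
SPLIT_SLOP = 1.1              # assumed tolerance before a trailing split is cut
MIB = 1024 * 1024
GIB = 1024 * MIB


def compute_split_size(goal_size, min_size, block_size):
    """Assumed rule: max(minSize, min(goalSize, blockSize))."""
    return max(min_size, min(goal_size, block_size))


def count_splits(file_size, block_size=64 * MIB, min_size=1, num_splits_hint=1):
    """Count the splits a single splittable file would produce under the rule above."""
    goal_size = file_size // max(num_splits_hint, 1)
    split_size = compute_split_size(goal_size, min_size, block_size)
    remaining = file_size
    splits = 0
    # Cut full-size splits while more than SPLIT_SLOP split sizes remain.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


nominal = count_splits(22 * GIB)                    # "22 GB" taken at face value
exact = count_splits(22661980380)                   # byte count from `hadoop dfs -ls`
huge_min = count_splits(22661980380, min_size=32 * GIB)  # hypothetical setting

print(nominal, exact, huge_min)
```

With the nominal 22 GB this gives 352 splits, and with the exact byte count reported by `hadoop dfs -ls` in this thread it gives 338. The last call shows that, under this rule, a minimum split size larger than the file would yield a single split. Whether that is what happens in the Hive job here is not established by the thread.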

-----Original Message-----
From: Touretsky, Gregory [mailto:gregory.touretsky@intel.com]
Sent: Monday, October 12, 2009 11:23 AM
To: Ashish Thusoo
Cc: hive-user@hadoop.apache.org; hive-dev@hadoop.apache.org
Subject: RE: Hive and MapReduce

It's a 22 GB file with ~60M+ records:
itstl0016> $HADOOP_HOME/bin/hadoop dfs -ls /user/hive/warehouse/start/stodhuge.out
Found 1 items
-rw-r--r--   3 gtouret supergroup 22661980380 2009-10-08 19:48 /user/hive/warehouse/start/stodhuge.out

And the cluster consists of 16 dual-core nodes.

hive> INSERT OVERWRITE TABLE start_oct30
    > select start.* from start
    > where start.SampleTime >= '2009-10-29' AND start.SampleTime <= '2009-11-01';
Total MapReduce jobs = 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_200910121100_0002, Tracking URL = http://itstl0016.iil.intel.com:50030/jobdetails.jsp?jobid=job_200910121100_0002
Kill Command = /nfs/iil/disks/rep_tests_gtouret01/hadoop/bin/hadoop job  -Dmapred.job.tracker=itstl0016.iil.intel.com:9001 -kill job_200910121100_0002
2009-10-12 11:49:40,865 map = 0%,  reduce = 0%
2009-10-12 11:50:12,065 map = 1%,  reduce = 0%
    [ SKIPPED MANY LINES ]
2009-10-12 12:08:37,648 map = 99%,  reduce = 0%
2009-10-12 12:08:52,782 map = 100%,  reduce = 0%
2009-10-12 12:08:58,811 map = 100%,  reduce = 100%
Ended Job = job_200910121100_0002
Moving data to: hdfs://itstl0016.iil.intel.com:9000/tmp/hive-gtouret/329345984/10000
Loading data to table start_oct30
0 Rows loaded to start_oct30
OK
Time taken: 1160.775 seconds
hive>

itstl0016> $HADOOP_HOME/bin/hadoop job -status job_200910121100_0002

Job: job_200910121100_0002
file: hdfs://itstl0016.iil.intel.com:9000/tmp/hadoop-gtouret/mapred/system/job_200910121100_0002/job.xml
tracking URL: http://itstl0016.iil.intel.com:50030/jobdetails.jsp?jobid=job_200910121100_0002
map() completion: 1.0
reduce() completion: 1.0
Counters: 11
        Job Counters 
                Launched map tasks=1
                Data-local map tasks=1
        org.apache.hadoop.hive.ql.exec.FilterOperator$Counter
                PASSED=0
                FILTERED=63916590
        org.apache.hadoop.hive.ql.exec.FileSinkOperator$TableIdEnum
                TABLE_ID_1_ROWCOUNT=0
        FileSystemCounters
                HDFS_BYTES_READ=22663352877
        org.apache.hadoop.hive.ql.exec.MapOperator$Counter
                DESERIALIZE_ERRORS=0
        Map-Reduce Framework
                Map input records=63916590
                Spilled Records=0
                Map input bytes=22615687168
                Map output records=0

-- Gregory
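
For a sense of scale, the counters and the reported wall-clock time can be turned into rough single-mapper rates. This Python sketch uses only numbers quoted in this message (`HDFS_BYTES_READ`, `Map input records`, and `Time taken`); the helper name `summarize` is hypothetical, and since the wall-clock time also covers job setup and the final data move, the rates are approximate lower bounds.

```python
MIB = 1024 * 1024


def summarize(bytes_read, records, seconds):
    """Derive rough throughput figures from job counters and elapsed time."""
    return {
        "mib_per_sec": bytes_read / MIB / seconds,
        "records_per_sec": records / seconds,
        "avg_record_bytes": bytes_read / records,
    }


# Values copied from the job status output and the Hive CLI output in this message.
stats = summarize(bytes_read=22663352877, records=63916590, seconds=1160.775)

for key, value in stats.items():
    print(f"{key}: {value:.1f}")
```

This works out to between 18 and 19 MiB/s through the single map task, at roughly 355 bytes per record.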

-----Original Message-----
From: Ashish Thusoo [mailto:athusoo@facebook.com]
Sent: Monday, October 12, 2009 8:04 PM
To: Touretsky, Gregory
Cc: hive-user@hadoop.apache.org; hive-dev@hadoop.apache.org
Subject: RE: Hive and MapReduce

Adding the hive-user and hive-dev lists, and removing the common mailing list.

Can you elaborate a bit on the data size? By default, Hive just relies on Hadoop to determine
the number of mappers from the number of splits in your data.

Ashish

-----Original Message-----
From: Touretsky, Gregory [mailto:gregory.touretsky@intel.com]
Sent: Monday, October 12, 2009 3:02 AM
To: Touretsky, Gregory; common-user@hadoop.apache.org
Subject: RE: Hive and MapReduce

OK, the patch below actually works. I rebuilt the Hadoop cluster and everything works now.
Now I have to understand how to force Hive to run more than one mapper for a complicated
query on a large table...

From: Touretsky, Gregory
Sent: Sunday, October 11, 2009 4:39 PM
To: common-user@hadoop.apache.org
Cc: Touretsky, Gregory
Subject: Hive and MapReduce

Hi,

   I'm running Hadoop 0.20.1 and Hive (checked out revision 824063).
A direct MapReduce task succeeds, but the map task created by Hive fails:

hive> select * from pokes where foo>100;
Total MapReduce jobs = 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_200910111626_0001, Tracking URL = http://itstl0016.iil.intel.com:50030/jobdetails.jsp?jobid=job_200910111626_0001
Kill Command = /nfs/iil/disks/rep_tests_gtouret01/hadoop/bin/hadoop job  -Dmapred.job.tracker=itstl0016.iil.intel.com:9001 -kill job_200910111626_0001
2009-10-11 04:26:57,844 map = 100%,  reduce = 100%
Ended Job = job_200910111626_0001 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver

From the logs/hadoop-UUUU-jobtracker-XXXX.iil.intel.com.log:
2009-10-11 16:26:56,829 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_200910111626_0001
2009-10-11 16:26:57,091 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_200910111626_0001 = 13. Number of splits = 1
2009-10-11 16:26:57,225 ERROR org.apache.hadoop.mapred.JobTracker: Job initialization failed:
java.lang.IllegalArgumentException: Network location name contains /: /IDC1-DC201/WE/34  
 (I've had the same issue with the /default_rack)
        at org.apache.hadoop.net.NodeBase.set(NodeBase.java:75)
        at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57)
        at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2390)
        at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2384)
        at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:349)
        at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:450)
        at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3147)
        at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

2009-10-11 16:26:57,225 INFO org.apache.hadoop.mapred.JobTracker: Failing job job_200910111626_0001
2009-10-11 16:26:57,866 INFO org.apache.hadoop.mapred.JobTracker: Killing job job_200910111626_0001
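
The exception text suggests that a node name is rejected when it contains the path separator, and that here a rack path (`/IDC1-DC201/WE/34`) ended up where a plain host name was expected. The Python sketch below only mirrors that implied check. The `Node` class and `try_make_node` helper are hypothetical illustrations, not Hadoop's source.

```python
PATH_SEPARATOR = "/"


class Node:
    """Toy stand-in for a topology node: a simple name plus a rack location."""

    def __init__(self, name, location=""):
        # Assumed check, modelled on the error message: names may not contain "/".
        if name is not None and PATH_SEPARATOR in name:
            raise ValueError("Network location name contains /: " + name)
        self.name = name or ""
        self.location = location


def try_make_node(name, location=""):
    """Return the Node, or the error text if the name is rejected."""
    try:
        return Node(name, location)
    except ValueError as err:
        return str(err)


# A rack path passed as the *name* is rejected...
print(try_make_node("/IDC1-DC201/WE/34"))
# ...while a host name with the rack path as its *location* is accepted.
ok = try_make_node("itstl0016", "/IDC1-DC201/WE/34")
print(ok.name, ok.location)
```

Under this reading, the same failure would occur for `/default_rack`, since it also starts with the separator.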

Any suggestions?
I saw patches in https://issues.apache.org/jira/browse/HADOOP-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712524#action_12712524,
but I can't apply all of them cleanly to my Hadoop sources...

Thanks,
   Gregory
---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for the sole use of the
intended recipient(s). Any review or distribution by others is strictly prohibited. If you
are not the intended recipient, please contact the sender and delete all copies.