hive-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Experience of Hive local mode execution style
Date Thu, 04 Jul 2013 15:53:16 GMT
Since you are launching locally, you have to account for the following:
1) If multiple jobs are running, they all compete for the local memory of
the system.
2) Your local parameters, such as the Java heap size (-Xmx) or
mapred.child.java.opts, may be getting applied locally. If you are running
DISTINCT queries, they may use a lot of memory or spill to disk quite often.
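
As a rough sketch of how to keep heavier queries out of local mode (this is
not from the thread itself): Hive's configuration documentation lists
thresholds that control when hive.exec.mode.local.auto actually chooses local
execution. The values below are illustrative assumptions, not recommendations,
and the property names should be checked against the Hive version in use.

```sql
-- Let Hive pick local mode automatically for small inputs.
SET hive.exec.mode.local.auto=true;
-- Upper bound on total input size in bytes for local mode (illustrative: 128 MB).
SET hive.exec.mode.local.auto.inputbytes.max=134217728;
-- Upper bound on the number of input files for local mode (illustrative value).
-- Older releases used hive.exec.mode.local.auto.tasks.max instead.
SET hive.exec.mode.local.auto.input.files.max=4;
```

Lowering these thresholds should send larger or more fragmented inputs to the
cluster, so that fewer memory-hungry jobs land on the local machine.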

However, what you are reporting does not look like a memory error, although
DISTINCT queries can become fairly intense. If you can reproduce the problem
with empty tables, it is likely a bug; if you can't, it just means that the
query takes too much memory for local mode.
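
One way to set up that empty-table check, as a minimal sketch: the table name
dim_cohorts is taken from the log quoted below, while dim_cohorts_empty is a
hypothetical name chosen here for illustration. The same step would be
repeated for every table the failing query reads.

```sql
-- Copy only the schema of the table; the new table contains no rows.
CREATE TABLE dim_cohorts_empty LIKE dim_cohorts;
-- Keep automatic local mode on, as in the original report.
SET hive.exec.mode.local.auto=true;
-- Then re-run the failing query with the empty copies substituted
-- for the original tables.
```

If the FileNotFoundException still appears with no data at all, memory can be
ruled out as the cause.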


On Thu, Jul 4, 2013 at 6:21 AM, Guillaume Allain <GuillaumeA@blinkbox.com> wrote:

>  > Local mode really helps with those little delays.
>
> It definitely helps for small data sets. But my concerns are about
> consistency of results with distributed mode and some queries that fail
> only when local mode is triggered (see my description below).
>
>
>  ------------------------------
> *From:* Edward Capriolo
> *Sent:* 03 July 2013 00:07
> *To:* user@hive.apache.org
> *Subject:* Re: Experience of Hive local mode execution style
>
>  Local mode is fast. In particular, older versions of Hadoop take a lot of
> time scheduling tasks and add a delay between the map and reduce phases.
>
> Local mode really helps with those little delays.
>
> On Monday, July 1, 2013, Guillaume Allain <GuillaumeA@blinkbox.com> wrote:
> > Hi all,
> >
> > Would anybody have any comments or feedback about the Hive local mode
> execution? It is advertised as providing a boost to performance for small
> data sets. It seems to fit nicely when running unit/integration tests on a
> single node or virtual machine.
> >
> > My exact questions are the following :
> >
> > - How significantly does local mode execution of queries diverge from
> distributed mode? Can the results differ in some way?
> >
> > - I have encountered errors when running complex queries (with
> several joins/distinct/groupbys) that seem to relate to configuration (see
> below). I got no definite answers from the ML and I am more or less ready
> to dive into the source code.
> >
> > Any idea where I should aim in order to solve that particular problem?
> >
> > Thanks in advance,
> >
> > Guillaume
> >
> > ________________________________
> > From: Guillaume Allain
> > Sent: 18 June 2013 12:14
> > To: user@hive.apache.org
> > Subject: FileNotFoundException when using hive local mode execution style
> >
> > Hi all,
> >
> > I plan to use Hive local mode in order to speed up unit testing on (very)
> small data sets. (Data is still on HDFS.) I switch to local mode by
> setting the following variables:
> >
> > SET hive.exec.mode.local.auto=true;
> > SET mapred.local.dir=/user;
> > SET mapred.tmp.dir=file:///tmp;
> > (plus creating needed directories and permissions)
> >
> > Simple GROUP BY, INNER and OUTER JOIN queries work just fine (with up to
> 3 jobs) with nice performance improvements.
> >
> > Unfortunately I ran into a FileNotFoundException
> (/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile)
> on a more complex query (4 jobs, DISTINCT on top of several joins; see
> the logs below if needed).
> >
> > Any idea about that error? What other option am I missing to have a
> fully functional local mode?
> >
> > Thanks in advance, Guillaume
> >
> > $ tail -50
> /tmp/vagrant/vagrant_20130617171313_82baad8b-1961-4055-a52e-d8865b2cd4f8.lo
> >
> > 2013-06-17 16:10:05,669 INFO  exec.ExecDriver
> (ExecDriver.java:execute(320)) - Using
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
> > 2013-06-17 16:10:05,688 INFO  exec.ExecDriver
> (ExecDriver.java:execute(342)) - adding libjars:
> file:///opt/events-warehouse/build/jars/joda-time.jar,file:///opt/events-warehouse/build/jars/we7-hive-udfs.jar,file:///usr/lib/hive/lib/hive-json-serde-0.2.jar,file:///usr/lib/hive/lib/hive-builtins-0.9.0-cdh4.1.2.jar,file:///opt/events-warehouse/build/jars/guava.jar
> > 2013-06-17 16:10:05,688 INFO  exec.ExecDriver
> (ExecDriver.java:addInputPaths(840)) - Processing alias dc
> > 2013-06-17 16:10:05,688 INFO  exec.ExecDriver
> (ExecDriver.java:addInputPaths(858)) - Adding input file
> hdfs://localhost/user/hive/warehouse/events_super_mart_test.db/dim_cohorts
> > 2013-06-17 16:10:05,689 INFO  exec.Utilities
> (Utilities.java:isEmptyPath(1807)) - Content Summary not cached for
> hdfs://localhost/user/hive/warehouse/events_super_mart_test.db/dim_cohorts
> > 2013-06-17 16:10:06,185 INFO  exec.ExecDriver
> (ExecDriver.java:addInputPath(789)) - Changed input file to
> file:/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1
> > 2013-06-17 16:10:06,226 INFO  exec.ExecDriver
> (ExecDriver.java:addInputPaths(840)) - Processing alias $INTNAME
> > 2013-06-17 16:10:06,226 INFO  exec.ExecDriver
> (ExecDriver.java:addInputPaths(858)) - Adding input file
> hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004
> > 2013-06-17 16:10:06,226 INFO  exec.Utilities
> (Utilities.java:isEmptyPath(1807)) - Content Summary not cached for
> hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004
> > 2013-06-17 16:10:06,681 WARN  conf.Configuration
> (Configuration.java:warnOnceIfDeprecated(808)) - session.id is
> deprecated. Instead, use dfs.metrics.session-id
> > 2013-06-17 16:10:06,682 INFO  jvm.JvmMetrics (JvmMetrics.java:init(76))
> - Initializing JVM Metrics with processName=JobTracker, sessionId=
> > 2013-06-17 16:10:06,688 INFO  exec.ExecDriver
> (ExecDriver.java:createTmpDirs(215)) - Making Temp Directory:
> hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10002
> > 2013-06-17 16:10:06,706 WARN  mapred.JobClient
> (JobClient.java:copyAndConfigureFiles(704)) - Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> > 2013-06-17 16:10:06,942 INFO  io.CombineHiveInputFormat
> (CombineHiveInputFormat.java:getSplits(370)) - CombineHiveInputSplit
> creating pool for
> file:/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1;
> using filter path
> file:/tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1
> > 2013-06-17 16:10:06,943 INFO  io.CombineHiveInputFormat
> (CombineHiveInputFormat.java:getSplits(370)) - CombineHiveInputSplit
> creating pool for
> hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004;
> using filter path
> hdfs://localhost/tmp/hive-vagrant/hive_2013-06-17_16-09-42_560_4077294489999242367/-mr-10004
> > 2013-06-17 16:10:06,951 INFO  mapred.FileInputFormat
> (FileInputFormat.java:listStatus(196)) - Total input paths to process : 2
> > 2013-06-17 16:10:06,953 INFO  mapred.JobClient (JobClient.java:run(982))
> - Cleaning up the staging area
> file:/user/vagrant2000733611/.staging/job_local_0001
> > 2013-06-17 16:10:06,953 ERROR security.UserGroupInformation
> (UserGroupInformation.java:doAs(1335)) - PriviledgedActionException
> as:vagrant (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not
> exist:
> /tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile
> > 2013-06-17 16:10:06,956 ERROR exec.ExecDriver
> (SessionState.java:printError(403)) - Job Submission failed with exception
> 'java.io.FileNotFoundException(File does not exist:
> /tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile)'
> > java.io.FileNotFoundException: File does not exist:
> /tmp/vagrant/hive_2013-06-17_16-10-05_614_7672774118904458113/-mr-10000/1/emptyFile
> >     at
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:787)
> >     at
> org.apache.hadoop.mapred.lib.CombineFileInputFormat$OneFileInfo.<init>(CombineFileInputFormat.java:462)
> >     at
> org.apache.hadoop.mapred.lib.CombineFileInputFormat.getMoreSplits(CombineFileInputFormat.java:256)
> >     at
> org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:212)
> >     at
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:392)
> >     at
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:358)
> >     at
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:387)
> >     at
> org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1041)
> >     at
> org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1033)
> >     at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)
> >     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:943)
> >     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:896)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:396)
> >     at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
> >     at
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:896)
> >     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:870)
> >     at
> org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:435)
> >     at
> org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:677)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> >
> > Installation detail:
> >
> > vagrant@hadoop:/opt/events-warehouse$ hadoop version
> > Hadoop 2.0.0-cdh4.1.2
> >
> > vagrant@hadoop:/opt/events-warehouse$ ls /usr/lib/hive/lib/ | grep hive
> > hive-builtins-0.9.0-cdh4.1.2.jar
> > hive-cli-0.9.0-cdh4.1.2.jar
> > hive-common-0.9.0-cdh4.1.2.jar
> > hive-contrib-0.9.0-cdh4.1.2.jar
> > hive_contrib.jar
> > hive-exec-0.9.0-cdh4.1.2.jar
> > hive-hbase-handler-0.9.0-cdh4.1.2.jar
> > hive-hwi-0.9.0-cdh4.1.2.jar
> > hive-jdbc-0.9.0-cdh4.1.2.jar
> > hive-json-serde-0.2.jar
> > hive-metastore-0.9.0-cdh4.1.2.jar
> > hive-pdk-0.9.0-cdh4.1.2.jar
> > hive-serde-0.9.0-cdh4.1.2.jar
> > hive-service-0.9.0-cdh4.1.2.jar
> > hive-shims-0.9.0-cdh4.1.2.jar
> >
> >
> >
> > Guillaume Allain
> > Senior Development Engineer
> > t: +44 20 7117 0809
> > m:
> > blinkbox music - the easiest way to listen to the music you love, for
> free
> > www.blinkboxmusic.com
> >
> >
>
>
> *Guillaume Allain*
> Senior Development Engineer
> *t:* +44 20 7117 0809
> *m:*
> *blinkbox music - the easiest way to listen to the music you love, for
> free*
> www.blinkboxmusic.com
>
>
