hadoop-hive-dev mailing list archives

From "Ning Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1408) add option to let hive automatically run in local mode based on tunable heuristics
Date Tue, 27 Jul 2010 22:57:18 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892974#action_12892974

Ning Zhang commented on HIVE-1408:

Some questions:

  1) The local file system classes in the shims use the same file name (class name) for every
Hadoop version and are compiled conditionally, depending on the Hadoop version at compile
time. This may cause problems when the same hive jar file is deployed to clusters running
different Hadoop versions. The current shims avoid this by giving each version's classes a
different name and using ShimsLoader to pick the correct class at execution time, which allows
one hive jar file to be deployed to different hadoop clusters.
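To illustrate the runtime-selection approach in 1), here is a minimal Java sketch under stated
assumptions: FileSystemShim, the two HadoopNN shim classes and SimpleShimLoader are hypothetical
names, not Hive's actual shim API, and the version string is passed in directly (a real loader
would presumably read it from the running Hadoop, e.g. org.apache.hadoop.util.VersionInfo.getVersion()).
Because each version's class has its own name, one jar can carry all of them and only the
matching one is loaded.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shim interface; Hive's real shims expose far more methods.
interface FileSystemShim {
    String describe();
}

// One implementation per Hadoop line. Distinct class names let both live in the same jar.
class Hadoop17FileSystemShim implements FileSystemShim {
    public String describe() { return "shim for Hadoop 0.17"; }
}

class Hadoop20FileSystemShim implements FileSystemShim {
    public String describe() { return "shim for Hadoop 0.20"; }
}

// Hypothetical stand-in for ShimsLoader: picks the shim class by name at run time.
final class SimpleShimLoader {
    private static final Map<String, String> SHIM_CLASSES = new HashMap<>();
    static {
        SHIM_CLASSES.put("0.17", "Hadoop17FileSystemShim");
        SHIM_CLASSES.put("0.20", "Hadoop20FileSystemShim");
    }

    private SimpleShimLoader() {}

    // "0.20.2" -> "0.20"
    static String majorMinor(String version) {
        String[] parts = version.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unrecognized Hadoop version: " + version);
        }
        return parts[0] + "." + parts[1];
    }

    static FileSystemShim load(String hadoopVersion) {
        String className = SHIM_CLASSES.get(majorMinor(hadoopVersion));
        if (className == null) {
            throw new IllegalArgumentException("No shim for Hadoop " + hadoopVersion);
        }
        try {
            // Loading by name means the shim for another Hadoop version is never loaded,
            // so its references to that version's APIs are never resolved.
            return (FileSystemShim) Class.forName(className)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load shim " + className, e);
        }
    }
}
```

For example, SimpleShimLoader.load("0.20.2") returns a Hadoop20FileSystemShim, while the same jar
on a 0.17 cluster would load Hadoop17FileSystemShim instead.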

  2) The fs.pfile.impl setting in data/conf/hive-site.xml is not needed if ShimsLoader is used as
described in 1).

  3) The hive.exec.mode.local.auto default values differ between HiveConf.java and
conf/hive-default.xml. They should be the same to avoid confusion.

  4) ctas.q.out: do you know why the GlobalTableID was changed?

  5) MapRedTask.java:149: the plan file name is no longer randomized as it was before. This may
cause problems when parallel execution mode is on and multiple MapRedTasks run at the same
time (e.g., parallel multi-table inserts).
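A minimal sketch of one way to keep plan file names unique per task, assuming a hypothetical
PlanFileNames helper (not Hive's MapRedTask code) and java.util.UUID as the source of randomness;
the "plan.<taskId>.<uuid>.xml" naming scheme is purely illustrative.

```java
import java.io.File;
import java.util.UUID;

// Hypothetical helper, not Hive's MapRedTask code.
final class PlanFileNames {
    private PlanFileNames() {}

    // Builds a plan file path under the scratch directory. The random UUID keeps
    // MapRedTasks that run concurrently (e.g. in parallel execution mode) from
    // writing to the same plan file, even if they share a task id and directory.
    static File uniquePlanFile(File scratchDir, String taskId) {
        String name = "plan." + taskId + "." + UUID.randomUUID() + ".xml";
        return new File(scratchDir, name);
    }
}
```

Two calls with the same scratch directory and task id yield different files, so parallel
multi-table inserts would not overwrite each other's plans.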

  6) If there are 2 MapRed tasks, MR2 depends on MR1, and MR1 is chosen to run locally, it seems
MR2 has to be local as well, since the intermediate files are stored in the local file system?
What about parallel execution, when MR1 and MR2 are running in parallel and only one of them
is local? It seems the information on whether a task is "local" is stored in Context (and
HiveConf), which is shared among parallel MR tasks?
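If, as suggested in 6), a task reading another task's local output must itself run locally, the
decision is easier to reason about as a per-task flag propagated along dependencies than as a
flag in a shared Context. A sketch under that assumption; MrTaskNode and LocalModePropagation are
hypothetical, simplified stand-ins rather than Hive's Task classes:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified task node; not Hive's Task class.
class MrTaskNode {
    final String id;
    final List<MrTaskNode> children = new ArrayList<>();
    // Per-task flag instead of a flag in a Context shared by all tasks.
    boolean local;

    MrTaskNode(String id) { this.id = id; }

    void addChild(MrTaskNode child) { children.add(child); }
}

final class LocalModePropagation {
    private LocalModePropagation() {}

    // Marks `start` as local and forces every task downstream of it to be local
    // too, because their inputs would sit on the local file system.
    static void markLocal(MrTaskNode start) {
        Set<MrTaskNode> seen = new HashSet<>();
        Deque<MrTaskNode> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            MrTaskNode task = work.pop();
            if (!seen.add(task)) continue; // already handled via another path
            task.local = true;
            for (MrTaskNode child : task.children) work.push(child);
        }
    }
}
```

Because the flag lives on each task, an independent task running in parallel is unaffected when
MR1 and its dependents are marked local.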

  7) ExecDriver.localizeMRTmpFileImpl changes FileSinkDesc.dirName after the MR tasks have been
generated, which breaks the dynamic partition code that runs when the FileSinkOperator is
generated. In particular, DynamicPartitionCtx also stores the dirName, so it has to be changed
in localizeMRTmpFileImpl as well.
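The point of 7) is that the directory is stored in two places, so both must be rewritten
together. A sketch with hypothetical, simplified FileSinkDescSketch and DynPartCtxSketch classes
(the field names are illustrative, not the real FileSinkDesc or DynamicPartitionCtx fields) and
an assumed prefix-based mapping from the HDFS scratch directory to the local one:

```java
// Hypothetical stand-ins for FileSinkDesc and DynamicPartitionCtx, reduced to the
// directory field that matters here. Field names are illustrative only.
class DynPartCtxSketch {
    String dirName;
    DynPartCtxSketch(String dirName) { this.dirName = dirName; }
}

class FileSinkDescSketch {
    String dirName;
    final DynPartCtxSketch dpCtx; // null when the sink has no dynamic partitions

    FileSinkDescSketch(String dirName, DynPartCtxSketch dpCtx) {
        this.dirName = dirName;
        this.dpCtx = dpCtx;
    }
}

final class LocalizeSketch {
    private LocalizeSketch() {}

    // Assumed mapping: a path under the HDFS scratch dir keeps its relative part
    // but moves under the local scratch dir; other paths are left unchanged.
    static String localizePath(String path, String hdfsScratch, String localScratch) {
        if (!path.startsWith(hdfsScratch)) return path;
        return localScratch + path.substring(hdfsScratch.length());
    }

    // Rewrites the sink directory and keeps the dynamic-partition copy in sync.
    static void localize(FileSinkDescSketch desc, String hdfsScratch, String localScratch) {
        desc.dirName = localizePath(desc.dirName, hdfsScratch, localScratch);
        if (desc.dpCtx != null) {
            desc.dpCtx.dirName = localizePath(desc.dpCtx.dirName, hdfsScratch, localScratch);
        }
    }
}
```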

  8) MoveTask previously moved the intermediate directory in HDFS to the final directory, also in
HDFS. Should the MoveTask execution be changed as well for local mode?
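One way to frame 8): once intermediate output may sit on the local file system, MoveTask can no
longer assume a same-file-system rename. The sketch below only classifies the move needed for a
source/destination pair; MoveKind and MovePlanner are hypothetical, and treating a scheme-less
path as local is an assumption. In Hadoop terms, the three cases would roughly correspond to
FileSystem.rename, FileSystem.copyFromLocalFile and FileSystem.copyToLocalFile.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical classification of the move a MoveTask-like step would need.
enum MoveKind { RENAME, COPY_FROM_LOCAL, COPY_TO_LOCAL }

final class MovePlanner {
    private MovePlanner() {}

    // Assumption: a path without a scheme is treated as local.
    private static String scheme(URI uri) {
        return uri.getScheme() == null ? "file" : uri.getScheme();
    }

    static MoveKind plan(URI src, URI dst) {
        String srcScheme = scheme(src);
        String dstScheme = scheme(dst);
        boolean sameFs = srcScheme.equals(dstScheme)
                && Objects.equals(src.getAuthority(), dst.getAuthority());
        if (sameFs) return MoveKind.RENAME;                            // same file system: rename suffices
        if (srcScheme.equals("file")) return MoveKind.COPY_FROM_LOCAL; // local result -> HDFS
        if (dstScheme.equals("file")) return MoveKind.COPY_TO_LOCAL;   // HDFS result -> local
        throw new IllegalArgumentException("Cross-cluster move not handled: " + src + " -> " + dst);
    }
}
```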

  9) Driver.java:100 the two functions are made static. Should they be moved to Utilities?

> add option to let hive automatically run in local mode based on tunable heuristics
> ----------------------------------------------------------------------------------
>                 Key: HIVE-1408
>                 URL: https://issues.apache.org/jira/browse/HIVE-1408
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>         Attachments: 1408.1.patch, 1408.2.patch, 1408.2.q.out.patch, hive-1408.6.patch
> as a followup to HIVE-543 - we should have a simple option (enabled by default) to let
> hive run in local mode if possible.
> two levels of options are desirable:
> 1. hive.exec.mode.local.auto=true/false // control whether local mode is automatically
> 2. Options to control different heuristics, some naive examples:
>      hive.exec.mode.local.auto.input.size.max=1G // don't choose local mode if data >
>      hive.exec.mode.local.auto.script.enable=true/false // choose if local mode is enabled
>        for queries with user scripts
> this can be implemented as a pre/post execution hook. It makes sense to provide this
> as a standard hook in the hive codebase since it's likely to improve response time for many
> users (especially for test queries).
> the initial proposal is to choose this at a query level and not at per hive-task (i.e.
> hadoop job) level. per job-level requires more changes to compilation (to not pre-commit to
> hdfs or local scratch directories at compile time).
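A minimal sketch of the query-level decision described above, using the option names from the
proposal. LocalModeHeuristic is hypothetical, "1G" is assumed to mean 1 GiB, and the script
option is read as "allow local mode for queries with user scripts":

```java
// Hypothetical query-level heuristic mirroring the options proposed in HIVE-1408.
final class LocalModeHeuristic {
    final boolean autoEnabled;     // hive.exec.mode.local.auto
    final long inputSizeMaxBytes;  // hive.exec.mode.local.auto.input.size.max
    final boolean scriptsAllowed;  // hive.exec.mode.local.auto.script.enable

    LocalModeHeuristic(boolean autoEnabled, long inputSizeMaxBytes, boolean scriptsAllowed) {
        this.autoEnabled = autoEnabled;
        this.inputSizeMaxBytes = inputSizeMaxBytes;
        this.scriptsAllowed = scriptsAllowed;
    }

    // Decided once per query (not per hadoop job), from the total input size of the query.
    boolean chooseLocal(long totalInputBytes, boolean queryHasUserScripts) {
        if (!autoEnabled) return false;
        if (totalInputBytes > inputSizeMaxBytes) return false;
        return !queryHasUserScripts || scriptsAllowed;
    }
}
```

Deciding once per query matches the proposal's note that a per-job decision would require not
committing to HDFS or local scratch directories at compile time.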

This message is automatically generated by JIRA.
