hive-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <>
Subject [jira] Commented: (HIVE-1408) add option to let hive automatically run in local mode based on tunable heuristics
Date Wed, 28 Jul 2010 07:20:20 GMT


Joydeep Sen Sarma commented on HIVE-1408:

#1 - we decided that I would try to take ProxyFileSystem out of the hive jars in the distribution.
Unfortunately, I am unable to do so - all the simple ways seem to break the tests. I don't
see much of a downside to the current arrangement - ProxyFileSystem is test-only code and
there's no reason why anyone should invoke it, so it shouldn't cause any problems (even though
it ships with the hive jars). The pfile:// -> ProxyFileSystem mapping exists only in test.

btw - I can't use ShimLoader, because Hadoop doesn't specify a factory class for creating
a file system object; it expects a file system class directly. That makes it impossible to write
a portable file system class using the ShimLoader paradigm. I am beginning to appreciate factory
classes more.
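To illustrate the point about factory classes, here is a minimal Java sketch. It is not Hadoop or Hive code: `SimpleFs`, `FsForV20`, `FsForV21`, `FsFactory` and both registries are hypothetical stand-ins. Hadoop resolves a URI scheme to a class through the `fs.<scheme>.impl` configuration key and instantiates that class reflectively, so the configured class itself must work with whichever Hadoop version is running. A factory lookup, by contrast, can pick a version-specific implementation at runtime, which is the kind of selection the ShimLoader approach depends on.

```java
import java.util.HashMap;
import java.util.Map;

public class FsRegistrySketch {
    // Hypothetical file system abstraction (a stand-in, not Hadoop's FileSystem).
    interface SimpleFs {
        String describe();
    }

    // Two hypothetical implementations, each written for a different (pretend) Hadoop version.
    static class FsForV20 implements SimpleFs {
        public FsForV20() {}
        public String describe() { return "fs-impl-0.20"; }
    }

    static class FsForV21 implements SimpleFs {
        public FsForV21() {}
        public String describe() { return "fs-impl-0.21"; }
    }

    // Factory style: given the running version, return a matching implementation.
    interface FsFactory {
        SimpleFs create(String runtimeVersion);
    }

    // Class style: scheme -> one concrete class, analogous to an "fs.<scheme>.impl" entry.
    static final Map<String, Class<? extends SimpleFs>> CLASS_REGISTRY = new HashMap<>();
    // Factory style: scheme -> factory that can choose an implementation per version.
    static final Map<String, FsFactory> FACTORY_REGISTRY = new HashMap<>();

    static {
        CLASS_REGISTRY.put("pfile", FsForV20.class);
        FACTORY_REGISTRY.put("pfile",
            version -> version.startsWith("0.20") ? new FsForV20() : new FsForV21());
    }

    // Reflective instantiation via a no-arg constructor: the implementation is fixed
    // by the registry entry, so nothing can adapt it to the running version.
    static SimpleFs createFromClass(String scheme) {
        Class<? extends SimpleFs> cls = CLASS_REGISTRY.get(scheme);
        if (cls == null) {
            throw new IllegalArgumentException("no class registered for scheme " + scheme);
        }
        try {
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + cls.getName(), e);
        }
    }

    // Factory lookup: the factory sees the runtime version and picks the implementation.
    static SimpleFs createFromFactory(String scheme, String runtimeVersion) {
        FsFactory factory = FACTORY_REGISTRY.get(scheme);
        if (factory == null) {
            throw new IllegalArgumentException("no factory registered for scheme " + scheme);
        }
        return factory.create(runtimeVersion);
    }

    public static void main(String[] args) {
        // The class registry always yields the same class, whatever version is running.
        System.out.println(createFromClass("pfile").describe());             // fs-impl-0.20
        // The factory adapts to the running version.
        System.out.println(createFromFactory("pfile", "0.21.0").describe()); // fs-impl-0.21
        System.out.println(createFromFactory("pfile", "0.20.2").describe()); // fs-impl-0.20
    }
}
```

The class-style registry mirrors the constraint described above: with only a class name to configure, a single class has to be portable on its own, whereas a factory could delegate to per-version implementations.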

#2 not an issue - can't use ShimLoader as per above.

#3 fixed

#4, #5, #6, #7, #8 - not an issue, as we discussed. HIVE-1484 has already been filed as follow-up
work to use a local dir for intermediate data when possible.

#9 - fixed. moved one public func to and eliminated the other.

> add option to let hive automatically run in local mode based on tunable heuristics
> ----------------------------------------------------------------------------------
>                 Key: HIVE-1408
>                 URL:
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>         Attachments: 1408.1.patch, 1408.2.patch, 1408.2.q.out.patch, 1408.7.patch, hive-1408.6.patch
> as a followup to HIVE-543 - we should have a simple option (enabled by default) to let
> hive run in local mode if possible.
> two levels of options are desirable:
> 1. // control whether local mode is automatically
> 2. Options to control different heuristics, some naive examples:
> // don't choose local mode if data >
> // choose if local mode is enabled for queries with user scripts
> this can be implemented as a pre/post execution hook. It makes sense to provide this
> as a standard hook in the hive codebase since it's likely to improve response time for many
> users (especially for test queries).
> the initial proposal is to choose this at a query level and not at a per hive-task (i.e.
> hadoop job) level. Per-job level requires more changes to compilation (to not pre-commit to
> hdfs or local scratch directories at compile time).
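A minimal Java sketch of the query-level decision proposed above, assuming hypothetical names throughout: `JobSummary`, `LocalModeOptions`, `shouldRunLocally` and the sample threshold are illustrative only, not the Hive configuration variables or classes from the attached patches. The two levels of options become a master switch plus per-heuristic tunables, and because the choice is made per query rather than per Hadoop job, the query goes local only if every job it compiles into passes the heuristics.

```java
import java.util.List;

public class LocalModeHeuristicSketch {
    // Hypothetical summary of one Hadoop job that a query compiles into.
    static final class JobSummary {
        final long inputBytes;        // estimated input size for this job
        final boolean usesUserScript; // e.g. TRANSFORM ... USING 'script'

        JobSummary(long inputBytes, boolean usesUserScript) {
            this.inputBytes = inputBytes;
            this.usesUserScript = usesUserScript;
        }
    }

    // Hypothetical tunables mirroring the two levels of options in the proposal.
    static final class LocalModeOptions {
        final boolean autoLocalEnabled;     // level 1: may local mode be chosen automatically at all?
        final long maxInputBytes;           // level 2: don't choose local mode if input exceeds this
        final boolean allowWithUserScripts; // level 2: may queries with user scripts run locally?

        LocalModeOptions(boolean autoLocalEnabled, long maxInputBytes, boolean allowWithUserScripts) {
            this.autoLocalEnabled = autoLocalEnabled;
            this.maxInputBytes = maxInputBytes;
            this.allowWithUserScripts = allowWithUserScripts;
        }
    }

    // Query-level decision: every job must qualify, otherwise the whole query runs on the cluster.
    static boolean shouldRunLocally(List<JobSummary> jobs, LocalModeOptions opts) {
        if (!opts.autoLocalEnabled || jobs.isEmpty()) {
            return false; // feature switched off, or no jobs to run
        }
        for (JobSummary job : jobs) {
            if (job.inputBytes > opts.maxInputBytes) {
                return false; // too much data for a single local process
            }
            if (job.usesUserScript && !opts.allowWithUserScripts) {
                return false; // scripts not allowed to run locally
            }
        }
        return true;
    }

    public static void main(String[] args) {
        LocalModeOptions opts = new LocalModeOptions(true, 128L * 1024 * 1024, false);
        List<JobSummary> small = List.of(new JobSummary(10_000_000L, false),
                                         new JobSummary(5_000_000L, false));
        List<JobSummary> withScript = List.of(new JobSummary(1_000L, true));
        List<JobSummary> oneBigJob = List.of(new JobSummary(10_000_000L, false),
                                             new JobSummary(500L * 1024 * 1024, false));
        System.out.println(shouldRunLocally(small, opts));      // true
        System.out.println(shouldRunLocally(withScript, opts)); // false
        System.out.println(shouldRunLocally(oneBigJob, opts));  // false
    }
}
```

Checking each job against the size limit is only one reading of "data >"; summing the input of all jobs would be an equally plausible query-level variant.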
