hadoop-hive-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-30) Hive web interface
Date Wed, 19 Nov 2008 21:40:44 GMT

    [ https://issues.apache.org/jira/browse/HIVE-30?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649171#action_12649171 ]

Joydeep Sen Sarma commented on HIVE-30:
---------------------------------------

i have a broader concern about how many servers we will end up having and what each server
represents. with the jdbc/HIVE-73 effort, it seems there is already at least one more hive server.
if the server manages state, then it doesn't make sense to have more than one. by analogy with
hadoop, there would be one server (like the namenode) that exposes a jsp interface in
addition to other interfaces like jdbc/odbc.

we should also have one server-side place to manage common abstractions like userids. for
example, we would find this patch unusable inside facebook since it does not set userids
for hive queries. this breaks the way we manage hadoop compute resources (we have fair
sharing and compute quotas set up per userid) and hive tables (all tables would be created
with the same userid).

at a very fundamental level, it's not clear to me what the 'SHOW PROCESSLIST' equivalent
even means for Hive. with the namenode, for example, we associate a set of data nodes. with the
jobtracker, we associate a set of compute resources. Hive does not control any resources as
clearly. a Hive query brings together a (Hive) metadata server, a map-reduce instance, one or more
dfs instances (tables/databases can span hdfs instances) and the client-side compute resources
required to run the query. a collection of hive queries (unlike a collection of mysql queries
to the same mysql server) may not have much in common, and hence the show processlist abstraction
is not that meaningful (at least to me).

that aside, some comments on the patch itself. i am ok with the way configuration is being
handled (looks like we are using hiveconf for the most part, just not for the hwi stuff), but:
- we seem to be initializing HiveConf for each show table/database request, but it seems that one
would need just one hiveconf per session, which can then be reused
- how are the logs going to be managed? logs for all sessions are going to the same server-side
log file. we should at least figure out a way to have the session id prepended to the log
entries (for debugging)
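
The two points above can be illustrated with a minimal sketch. This is not code from the HIVE-30 patch: the class names `HwiSessionSketch` and `Session` are hypothetical, `java.util.Properties` stands in for a HiveConf instance, and an in-memory list stands in for the shared server-side log file. It only shows the shape of the suggestion: the configuration is built once per session and reused, and every log entry carries the session id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch; none of these names come from the HIVE-30 patch or from Hive.
final class HwiSessionSketch {

    // One web-interface session. The configuration is created once by the
    // caller and reused for every request made in this session.
    static final class Session {
        private final String sessionId;
        private final Properties conf;        // assumption: stand-in for a HiveConf instance
        private final List<String> sharedLog; // assumption: stand-in for the shared log file

        Session(String sessionId, Properties conf, List<String> sharedLog) {
            this.sessionId = sessionId;
            this.conf = conf;
            this.sharedLog = sharedLog;
        }

        // Returns the same configuration object on every call.
        Properties conf() {
            return conf;
        }

        // Appends one entry to the shared log, prefixed with this session's id.
        void log(String message) {
            sharedLog.add("[session=" + sessionId + "] " + message);
        }
    }

    public static void main(String[] args) {
        List<String> sharedLog = new ArrayList<>();
        Session a = new Session("s-1", new Properties(), sharedLog);
        Session b = new Session("s-2", new Properties(), sharedLog);
        a.log("show tables");
        b.log("show databases");
        a.log("describe t");
        // Entries from both sessions interleave but stay attributable.
        for (String line : sharedLog) {
            System.out.println(line);
        }
    }
}
```

With the prefix in place, entries from interleaved sessions in one file can be separated by filtering on the session id. A real server would also need a thread-safe log sink, which this sketch leaves out.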

> Hive web interface
> ------------------
>
>                 Key: HIVE-30
>                 URL: https://issues.apache.org/jira/browse/HIVE-30
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Jeff Hammerbacher
>            Assignee: Edward Capriolo
>            Priority: Minor
>         Attachments: HIVE-30.patch
>
>
> Hive needs a web interface. The initial checkin should have:
> * simple schema browsing
> * query submission
> * query history (similar to MySQL's SHOW PROCESSLIST)
> A suggested feature: the ability to have a query notify the user when it's completed.
> Edward Capriolo has expressed some interest in driving this process.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.