hadoop-hive-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-176) structured log for obtaining query stats/info
Date Mon, 12 Jan 2009 17:57:59 GMT

    [ https://issues.apache.org/jira/browse/HIVE-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663027#action_12663027 ]

Joydeep Sen Sarma commented on HIVE-176:

- inferNumReducers(): instead of making two calls to HiveHistory, we can make just one call at
the end of the function, once numReducers has definitely been set. We could also set NUM_REDUCERS
to 0 when no reducer count is specified (more informative, imho).
- I still don't see why HAS_REDUCE_TASKS and NUM_REDUCE_TASKS are meaningful counters. What
is the use case?
- In TestHiveHistory, please use the setUp() method or a constructor to do initialization. Also,
a negative test case would be good (to check whether a negative job status is being captured for
- HiveHistoryViewer: the indentation is badly off. I think we are also following a general convention
of '} else {' (and of opening curly braces on the same line as the function/class declaration, viz.
'void init() {').
- JOB_STATUS and TASK_STATUS are both unused.
- I couldn't understand this code block in parseHiveHistory:
+       if (!line.trim().endsWith("\"")){
+         continue; 
+       }
  Can you explain?
- parseLine: I'm confused that we have a regex group for the key but don't use it. That seems
odd: if you had groups for both the key and the value, you wouldn't need to split at all.
Alternatively, you can rely on just the split.
- getHiveHistory: I don't think it's a good idea to initialize the HiveHistory object on demand:
  a) you always need it
  b) it prints output to the console (the log file location). If you want a deterministic position
for this message, we should just initialize HiveHistory at session initialization, so that the
log file location always appears at the beginning of the session (and not at some random point
when the code first requires it).

- It would be good to have an example of the Hive history file/format checked in somewhere,
with a pointer to it from the documentation (either in the README or on the wiki).
- Another easy and comprehensive test to add is in TestCliDriver. This is generated code that
fires a bunch of queries; we should easily be able to use HiveHistoryViewer to assert that the
query status is successful for all queries in the positive tests.
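As a rough illustration of the parseLine and TestCliDriver points above, here is a minimal Java sketch. It assumes a hypothetical line format (a record type followed by space-separated KEY="value" pairs) and hypothetical keys QUERY_ID and QUERY_STATUS with the value SUCCESS; the actual HiveHistory format in the patch may differ. Because both the key and the value are regex capture groups, no split is needed, and allQueriesSucceeded shows the kind of check a TestCliDriver-style test could run over a history file.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: the line format and key names below are hypothetical,
// e.g.  QueryEnd QUERY_ID="q1" QUERY_STATUS="SUCCESS"
final class HistoryLineParser {
  // Group 1 captures the key, group 2 the quoted value; values may not contain '"'.
  private static final Pattern KEY_VALUE = Pattern.compile("(\\w+)=\"([^\"]*)\"");

  // Uses both capture groups directly, so no String.split() is needed.
  static Map<String, String> parseLine(String line) {
    Map<String, String> fields = new LinkedHashMap<>();
    Matcher m = KEY_VALUE.matcher(line);
    while (m.find()) {
      fields.put(m.group(1), m.group(2));
    }
    return fields;
  }

  // True if at least one line carries the hypothetical QUERY_STATUS key
  // and every such line reports SUCCESS.
  static boolean allQueriesSucceeded(List<String> lines) {
    boolean sawStatus = false;
    for (String line : lines) {
      String status = parseLine(line).get("QUERY_STATUS");
      if (status == null) {
        continue; // record without a status, e.g. a start record
      }
      sawStatus = true;
      if (!"SUCCESS".equals(status)) {
        return false;
      }
    }
    return sawStatus;
  }
}
```

A test could read the history file's lines (for example with java.nio.file.Files.readAllLines) and fail if allQueriesSucceeded returns false.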

One thing I am concerned about overall is the use of the term 'job' for what is essentially
a Hive query. I think this creates a lot of room for confusion, since in the Hadoop ecosystem
'job' means a Hadoop job. (We have also overloaded the word 'task' in Hive, which is unfortunate,
but it is almost too late to change that now.) If possible, I would really appreciate it if we
could replace 'job' with 'query' wherever applicable (s/startJob/startQuery/, for example).

> structured log for obtaining query stats/info
> ---------------------------------------------
>                 Key: HIVE-176
>                 URL: https://issues.apache.org/jira/browse/HIVE-176
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Logging
>    Affects Versions: 0.2.0
>            Reporter: Joydeep Sen Sarma
>            Assignee: Suresh Antony
>             Fix For: 0.2.0
>         Attachments: patch_176.txt, patch_176.txt, patch_176.txt
> Josh <josh@besquared.net> wrote:
> When launching off hive queries using hive -e, is there a way to get the job id so that
> I can just queue them up and go check their statuses later? What's the general pattern for
> queueing and monitoring without using the libraries directly?
> I'm gonna throw my vote in for a structured log format. Users could tail it and use whatever
> queuing or monitoring they wish. It's also probably just a 30 minute project for someone already
> familiar with the code. I suggest ^A separated key=value pairs per log line.
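Josh's proposal of ^A (Ctrl-A, U+0001) separated key=value pairs, one record per line, can be sketched in Java as below. This is a minimal illustration of the suggested format, not Hive's actual history format, and it assumes that keys never contain '=' and that values never contain ^A or newline characters (no escaping is done). The class name CtrlALogLine is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of a ^A (U+0001) separated key=value log line.
// Assumptions: keys contain no '=', values contain no U+0001 or newline characters.
final class CtrlALogLine {
  private static final String SEP = "\u0001";

  // Joins the fields as key=value pairs separated by ^A, preserving map order.
  static String formatLine(Map<String, String> fields) {
    StringBuilder sb = new StringBuilder();
    boolean first = true;
    for (Map.Entry<String, String> e : fields.entrySet()) {
      if (!first) {
        sb.append(SEP);
      }
      first = false;
      sb.append(e.getKey()).append('=').append(e.getValue());
    }
    return sb.toString();
  }

  // Splits on ^A, then on the first '=' of each pair, so values may contain '='.
  static Map<String, String> parseLine(String line) {
    Map<String, String> fields = new LinkedHashMap<>();
    for (String pair : line.split(SEP)) {
      int eq = pair.indexOf('=');
      if (eq <= 0) {
        continue; // skip empty or malformed pairs (no key)
      }
      fields.put(pair.substring(0, eq), pair.substring(eq + 1));
    }
    return fields;
  }
}
```

Because every line is self-contained, a consumer could run tail -f on the log and hand each new line to parseLine without tracking any state across lines.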

This message is automatically generated by JIRA.
