hadoop-hive-dev mailing list archives

From "Suresh Antony (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-176) structured log for obtaining query stats/info
Date Thu, 15 Jan 2009 18:16:59 GMT

    [ https://issues.apache.org/jira/browse/HIVE-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664204#action_12664204
] 

Suresh Antony commented on HIVE-176:
------------------------------------

    *  inferNumReducers(): instead of two calls to the hivehistory - can just make one call
at the end of the function when the numReducers has been set for sure. We could also set NUM_REDUCERS
to 0 when no reducer is specified (more informative imho).
    ---- made it single call after this function call
    * I still don't see why HAS_REDUCE_TASKS and NUM_REDUCE_TASKS are meaningful counters.
what is the use case?
    --- Removed both of these variables
    * In TestHiveHistory - please use setup() method or constructor to do initialization.
also a negative test case would be good (to check if negative job status is being captured
for example).
    --- moved this code to setUp()
    * HiveHistoryViewer - indentation is badly off. I think we are following a general convention
of '} else {' as well (and curly braces on same line as function/class declaration - viz 'void
init() {').
    --- Reformatted using the Eclipse formatter
    * JOB_STATUS and TASK_STATUS are both unused.
    * i couldn't understand this code block in parseHiveHistory:
      + if (!line.trim().endsWith("\"")){ + continue; + }
      can u explain.
    --- Format is key="value"... so a line that does not end with " means the value contains a newline

    * parseLine: confused that we have a reg ex group for the key - but are not using it ..
seems weird - if u had groups for both key and value u wouldn't need to split. alternately
u can rely on just the split.
    -- cut and pasted this code from the JobHistory parser
    * getHiveHistory - i don't think it's a good idea to initialize hivehistory object on
demand:
      a) u always need it
      b) it prints stuff to the console (log file location). if u want a deterministic location
for this log - we should just initialize hivehistory at session initialization so that the
log file location always comes at the beginning of the session (and not at some random point
when the code actually requires it)

    -- moved hiveHistory initialization to the constructor of SessionState
    * it would be good to have an example of the hive history file/format checked in somewhere
with a pointer to it from the documentation (either in README or wiki).
    --- Put short summary about the HistoryLog in internal wiki.
           http://www.intern.facebook.com/intern/wiki/index.php/HiveQueryLog
    * another easy and comprehensive test to add is in TestCliDriver. This is generated code
that fires a bunch of queries - we should be easily able to use HiveHistoryViewer to assert
that query status is successful for all queries in positive tests.
    --- Added hiveHistory check to TestCliDriver. For this to work, SessionState is
constructed in the constructor of QTestUtil. Not sure whether this is the correct way or not
    -- Changed TestCliDriver.vm to check history File.
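
The parseHiveHistory and parseLine points above can be sketched roughly as follows.
This is an illustration only, not the code in the patch: the class name, method names
and the sample keys (ID, TEXT, STATUS) are hypothetical. It assumes the key="value"
format described above, treats a line that does not end with " as an unfinished
value, and uses regex groups for both key and value so that no split is needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the HiveHistory implementation.
public class HistoryLineSketch {
    // Group 1 is the key, group 2 is the quoted value.
    // [^"]* also matches newlines, so multi-line values are captured whole.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    // Joins physical lines into logical records, then parses each record.
    static List<Map<String, String>> parse(List<String> lines) {
        List<Map<String, String>> records = new ArrayList<>();
        StringBuilder buf = new StringBuilder();
        for (String line : lines) {
            if (buf.length() > 0) {
                buf.append('\n');
            }
            buf.append(line);
            // A line that does not end with a quote means the value has a
            // newline in it, so keep reading before parsing.
            // Limitation: a continuation line that itself ends with a quote
            // inside the value would end the record early.
            if (!line.trim().endsWith("\"")) {
                continue;
            }
            records.add(parseRecord(buf.toString()));
            buf.setLength(0);
        }
        return records;
    }

    // Extracts every key="value" pair from one logical record.
    static Map<String, String> parseRecord(String record) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(record);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "ID=\"q1\" TEXT=\"select a",
            "from t\"",
            "ID=\"q2\" STATUS=\"0\"");
        for (Map<String, String> record : parse(lines)) {
            System.out.println(record);
        }
    }
}
```

Values that contain an embedded " are not handled here; that would need an escaping
rule, which the format description above does not specify.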

One thing i am concerned about overall is the use of the term 'job' for what is essentially
a hive query. I think this creates a lot of room for confusion - since in the hadoop ecosystem
job means hadoop job. (we have also overloaded the word task in Hive - which is unfortunate
- but almost too late now). If possible - i would really appreciate if we could replace 'job'
with 'query' wherever applicable. (s/startJob/startQuery/ for example).
     --- Changed all Job references to Query

    -- should we create the history file always? history will be disabled by default and enabled
by setting a jobconf parameter, 'enable.job.history'




> structured log for obtaining query stats/info
> ---------------------------------------------
>
>                 Key: HIVE-176
>                 URL: https://issues.apache.org/jira/browse/HIVE-176
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Logging
>    Affects Versions: 0.2.0
>            Reporter: Joydeep Sen Sarma
>            Assignee: Suresh Antony
>             Fix For: 0.2.0
>
>         Attachments: patch_176.txt, patch_176.txt, patch_176.txt
>
>
> Josh <josh@besquared.net> wrote:
> When launching off hive queries using hive -e is there a way to get the job id so that
I can just queue them up and go check their statuses later? What's the general pattern for
queueing and monitoring without using the libraries directly?
> I'm gonna throw my vote in for a structured log format. Users could tail it and use whatever
queuing or monitoring they wish. It's also probably just a 30 minute project for someone already
familiar with the code. I suggest ^A separated key=value pairs per log line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

