hadoop-hive-dev mailing list archives

From "Adam Kramer (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-492) Allow for output inspection in realtime; perhaps in log files, but somewhere?
Date Sun, 17 May 2009 21:35:45 GMT
Allow for output inspection in realtime; perhaps in log files, but somewhere?
-----------------------------------------------------------------------------

                 Key: HIVE-492
                 URL: https://issues.apache.org/jira/browse/HIVE-492
             Project: Hadoop Hive
          Issue Type: Wish
          Components: Logging
            Reporter: Adam Kramer


Many queries take a long time to complete, and then fail (either because the job fails or
because the output data is not what was desired).

This is almost always traceable to an error in a mapper or a reducer, which we can check
in several ways, most often by running the query piece by piece and seeing where the "wrong"
output appears. This process is time-consuming and puts a considerable load on the system
(e.g., repeating big queries while trying to debug transformers/syntax). The problem is worse
when a single query uses multiple transforms and several mapreduce steps.

One way to potentially reduce the amount of overhead in debugging would be to provide actual
output in some logging mechanism. Specifically, I mean to have EVERY mapper and/or reducer
write the first five lines of output to some user-readable file. This would allow a user to
see what each part of the system is doing, and to potentially locate, in ONE failed query
statement, where the user error is.
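
As a rough illustration of the idea (not an existing Hive feature), a transform script could wrap its own output stream so that the first few lines are copied to a side file. The function name `tee_head`, the `limit` parameter, and the in-memory sink are all hypothetical; in a real task the sink would be some user-readable file.

```python
import io
from typing import Iterable, Iterator, TextIO


def tee_head(lines: Iterable[str], sink: TextIO, limit: int = 5) -> Iterator[str]:
    """Yield every line unchanged, copying only the first `limit` lines to `sink`."""
    for index, line in enumerate(lines):
        if index < limit:
            # Make sure each sampled line ends with a newline in the side file.
            sink.write(line if line.endswith("\n") else line + "\n")
        yield line


# Stand-in for one mapper's output stream (hypothetical data).
mapper_output = [f"row{i}\tvalue{i}\n" for i in range(8)]
sample = io.StringIO()  # assumption: replaced by a per-task file in practice
passed_through = list(tee_head(mapper_output, sample, limit=5))
```

The wrapper leaves the real output untouched, so the sampling would not change query results; only the side file grows, and only by `limit` lines per task.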

Of course, 5 lines from each of 20000 mappers and 300 reducers is a lot of overhead; making
this user-configurable and/or estimated beforehand (at least 5 lines from at least 5 mappers
and at least 5 reducers) would be fine, as would making these output logs auto-delete after
some timeframe (a day, perhaps).
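
To make the overhead and cleanup concrete, here is a minimal sketch under stated assumptions: the number of sampled tasks per stage is treated as a configurable limit, and sample files live in a single directory that is purged by file age. The names `sampled_line_count`, `purge_old_samples`, and `tasks_per_stage` are hypothetical and do not correspond to any Hive setting.

```python
import os
import time


def sampled_line_count(mappers: int, reducers: int,
                       lines_per_task: int = 5, tasks_per_stage: int = 5) -> int:
    """Lines logged when `tasks_per_stage` tasks per stage (or fewer, if the
    stage is smaller) each write `lines_per_task` lines."""
    logging_tasks = min(mappers, tasks_per_stage) + min(reducers, tasks_per_stage)
    return lines_per_task * logging_tasks


def purge_old_samples(directory: str, max_age_seconds: float = 86400.0) -> list:
    """Delete regular files in `directory` older than `max_age_seconds`
    (default: one day) and return the names that were removed."""
    cutoff = time.time() - max_age_seconds
    removed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed


# Sampling 5 tasks per stage versus every task, for 20000 mappers and 300 reducers.
capped = sampled_line_count(mappers=20000, reducers=300)
uncapped = sampled_line_count(20000, 300, tasks_per_stage=20000)
```

With these numbers, sampling every task would log 101500 lines per query, while sampling 5 tasks per stage logs 50, which is why a configurable limit matters.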

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

