hadoop-hive-dev mailing list archives

From "Ashish Thusoo (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-410) Heartbeating for streaming jobs should not depend on stdout
Date Wed, 20 May 2009 21:42:45 GMT

     [ https://issues.apache.org/jira/browse/HIVE-410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Ashish Thusoo updated HIVE-410:

    Attachment: patch-410-2.txt

Added code to parameterize the heartbeat interval based on the task-expiry interval in MapReduce.

I had to bump the memory for JUnit because our tests otherwise fail intermittently with an
out-of-memory exception. It looks like we are operating near the 256m limit.

> Heartbeating for streaming jobs should not depend on stdout
> -----------------------------------------------------------
>                 Key: HIVE-410
>                 URL: https://issues.apache.org/jira/browse/HIVE-410
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Venky Iyer
>            Assignee: Ashish Thusoo
>            Priority: Blocker
>         Attachments: patch-410-2.txt, patch-410.txt
> Jobs that require iterative processing may take longer than 10 minutes to produce rows.
> This shouldn't be cause to kill the job. Producing keepalive dummy rows on stdout is bad if
> the data has to go into a Hive table or other Hive steps.
> If we adopt the solution of using stderr to indicate heartbeats, can that be combined
> with streaming counters (http://hadoop.apache.org/core/docs/current/streaming.html#How+do+I+update+counters+in+streaming+applications%3F)?
> Also, will limitations on the size of stderr break this?

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
