hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-457) ScriptOperator should NOT cache all data in stderr
Date Mon, 21 Dec 2009 08:54:18 GMT

    [ https://issues.apache.org/jira/browse/HIVE-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793122#action_12793122 ]

Zheng Shao commented on HIVE-457:

I don't think we can easily trim off long records in binary streams; depending on the format, it may not be possible at all.
Text format is special because we only need to find the next newline.

Let's get this in first. We can add that to TypedBytesRecordReader if such a need comes up.
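To illustrate the point about text streams, here is a hedged sketch (not Hive's actual RecordReader code; the class name and cap parameter are hypothetical) of trimming an over-long text record: once the cap is hit, the reader keeps consuming and discarding bytes until the next newline, which lets it resynchronize on record boundaries. A binary format has no such in-band delimiter, which is why the same trick does not carry over.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: a line reader that caps each record at
// maxRecordLength bytes but still scans forward to the next '\n'
// so subsequent records stay aligned.
public class TruncatingLineReader {

    private final InputStream in;
    private final int maxRecordLength; // cap on bytes kept per record

    public TruncatingLineReader(InputStream in, int maxRecordLength) {
        this.in = in;
        this.maxRecordLength = maxRecordLength;
    }

    /** Returns the next record, truncated at maxRecordLength; null at EOF. */
    public String readRecord() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c = in.read();
        if (c == -1) {
            return null;
        }
        while (c != -1 && c != '\n') {
            if (sb.length() < maxRecordLength) {
                sb.append((char) c);
            }
            // Past the cap we still consume bytes, but drop them:
            // this is the "find the next newline" step.
            c = in.read();
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "short\naveryverylongrecord\nnext\n".getBytes();
        TruncatingLineReader r =
            new TruncatingLineReader(new ByteArrayInputStream(data), 8);
        String rec;
        while ((rec = r.readRecord()) != null) {
            System.out.println(rec);
        }
    }
}
```

With a cap of 8 bytes, the middle record comes out truncated while the records before and after it are read intact.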

> ScriptOperator should NOT cache all data in stderr
> --------------------------------------------------
>                 Key: HIVE-457
>                 URL: https://issues.apache.org/jira/browse/HIVE-457
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Zheng Shao
>            Assignee: Paul Yang
>            Priority: Blocker
>             Fix For: 0.5.0
>         Attachments: err.sh, HIVE-457.1.patch, HIVE-457.2.patch
> Sometimes user scripts output a lot of data to stderr without a newline, and this causes
> Hive to run out of memory.
> We should directly output the data from stderr without caching it.
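The idea in the issue description can be sketched as follows (a minimal illustration, not Hive's actual ScriptOperator code; the class name is hypothetical): drain the child script's stderr through a fixed-size buffer and forward the bytes immediately, so memory use stays constant no matter how much the script writes, and regardless of whether it ever emits a newline.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch: stream a child process's stderr straight
// through to a sink instead of accumulating it in memory.
public class StderrDrainer implements Runnable {

    private final InputStream src;   // e.g. process.getErrorStream()
    private final OutputStream sink; // e.g. the task's own stderr/log

    public StderrDrainer(InputStream src, OutputStream sink) {
        this.src = src;
        this.sink = sink;
    }

    @Override
    public void run() {
        byte[] buf = new byte[4096]; // only this much is ever held
        try {
            int n;
            while ((n = src.read(buf)) != -1) {
                sink.write(buf, 0, n); // pass through, never accumulate
            }
            sink.flush();
        } catch (IOException e) {
            // Child exited or the stream was closed; stop draining.
        }
    }
}
```

Typical use would be a dedicated thread per child process, e.g. `new Thread(new StderrDrainer(proc.getErrorStream(), System.err)).start();`, started before waiting for the script to finish so the stderr pipe never fills up.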

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.