hadoop-hive-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-457) ScriptOperator should NOT cache all data in stderr
Date Sat, 19 Dec 2009 01:20:18 GMT

    [ https://issues.apache.org/jira/browse/HIVE-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792744#action_12792744 ]

Zheng Shao commented on HIVE-457:
---------------------------------

This also means that we would be limiting the row length of the data.

Can we add a conf variable to TextRecordReader and set the default to maybe 10MB?
That way, nobody will notice the difference unless a row is bigger than 10MB, which is
rare, and 10MB is still well below typical memory sizes.

{code}
set hive.text.record.reader.max.length=10485760;
{code}

We can treat this as an internal variable (by not adding it to hive-default.xml), following
the Hadoop convention for internal variables.
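
For illustration only, a minimal sketch (not the actual Hive code; the class name, field names,
and conf lookup are assumptions) of how a record reader could enforce such a cap while reading a
line, failing fast instead of buffering an unbounded row:

{code}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: read one newline-terminated record, but give up once the
// configured limit (default 10MB) is exceeded instead of growing the buffer forever.
public class BoundedLineReader {
  private final InputStream in;
  private final int maxRecordLength; // e.g. the value of hive.text.record.reader.max.length

  public BoundedLineReader(InputStream in, int maxRecordLength) {
    this.in = in;
    this.maxRecordLength = maxRecordLength;
  }

  /** Returns the next record, or null at end of stream. */
  public byte[] readRecord() throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    int c;
    while ((c = in.read()) != -1 && c != '\n') {
      if (buf.size() >= maxRecordLength) {
        throw new IOException("Record exceeds " + maxRecordLength + " bytes");
      }
      buf.write(c);
    }
    return (c == -1 && buf.size() == 0) ? null : buf.toByteArray();
  }
}
{code}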

> ScriptOperator should NOT cache all data in stderr
> --------------------------------------------------
>
>                 Key: HIVE-457
>                 URL: https://issues.apache.org/jira/browse/HIVE-457
>             Project: Hadoop Hive
>          Issue Type: Bug
>            Reporter: Zheng Shao
>            Assignee: Paul Yang
>            Priority: Blocker
>             Fix For: 0.5.0
>
>         Attachments: err.sh, HIVE-457.1.patch
>
>
> Sometimes user scripts output a lot of data to stderr without a newline, and this causes Hive to run out of memory.
> We should directly output the data from stderr without caching it.
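
For illustration, a minimal sketch (not the attached HIVE-457.1.patch; the class and names are
assumptions) of forwarding the script's stderr as bytes arrive, so nothing accumulates in memory
even when the script never writes a newline:

{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch: copy the child script's stderr straight through to the
// parent's stderr in fixed-size chunks, with no per-line or whole-stream buffering.
public class StderrForwarder implements Runnable {
  private final InputStream scriptErr;
  private final OutputStream out;

  public StderrForwarder(InputStream scriptErr, OutputStream out) {
    this.scriptErr = scriptErr;
    this.out = out;
  }

  public void run() {
    byte[] chunk = new byte[4096];
    int n;
    try {
      while ((n = scriptErr.read(chunk)) != -1) {
        out.write(chunk, 0, n); // pass through immediately
      }
      out.flush();
    } catch (IOException e) {
      // the script may have exited and closed its stderr; nothing to do
    }
  }
}
{code}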

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

