hive-issues mailing list archives

From "Hive QA (JIRA)" <>
Subject [jira] [Commented] (HIVE-10036) Writing ORC format big table causes OOM - too many fixed sized stream buffers
Date Sun, 10 May 2015 23:27:00 GMT


Hive QA commented on HIVE-10036:

{color:red}Overall{color}: -1, at least one test failed

Here are the results of testing the latest attachment:

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 8921 tests executed
*Failed tests:*

Test results:
Console output:
Test logs:

Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed

This message is automatically generated.

ATTACHMENT ID: 12731823 - PreCommit-HIVE-TRUNK-Build

> Writing ORC format big table causes OOM - too many fixed sized stream buffers
> -----------------------------------------------------------------------------
>                 Key: HIVE-10036
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Selina Zhang
>            Assignee: Selina Zhang
>              Labels: orcfile
>         Attachments: HIVE-10036.1.patch, HIVE-10036.2.patch, HIVE-10036.3.patch, HIVE-10036.5.patch,
HIVE-10036.6.patch, HIVE-10036.7.patch, HIVE-10036.8.patch, HIVE-10036.9.patch
> The ORC writer keeps multiple output streams for each column. Each output stream is allocated
a fixed-size ByteBuffer (configurable, 256K by default). For a big table, the memory cost is
unbearable, especially when HCatalog dynamic partitioning is involved: several hundred files may
be open and written at the same time (FileSinkOperator has the same problem).
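To make the memory cost concrete, here is a rough worst-case estimate in Java. It is only a sketch: the class and method names are hypothetical, and the figures of 300 open files, 100 columns, and 3 streams per column are assumed for illustration. Only the 256K default buffer size comes from the description above.

```java
// Hypothetical helper, not part of Hive or ORC.
final class OrcBufferEstimate {
    /**
     * Worst-case bytes held in stream buffers if every stream of every
     * column in every open file holds one fully allocated fixed-size buffer.
     */
    static long worstCaseBytes(int openFiles, int columns,
                               int streamsPerColumn, long bufferBytes) {
        long streams = Math.multiplyExact((long) openFiles * columns,
                                          (long) streamsPerColumn);
        return Math.multiplyExact(streams, bufferBytes);
    }

    public static void main(String[] args) {
        // Assumed workload: 300 files, 100 columns, 3 streams per column.
        long bytes = worstCaseBytes(300, 100, 3, 256L * 1024);
        System.out.println(bytes / (1024 * 1024) + " MiB");
    }
}
```

Under these assumed figures the buffers alone would need roughly 22 GiB, which is why fixed allocation per stream does not scale with the number of open writers.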
> The global ORC memory manager controls the buffer size, but it only kicks in at 5000-row
intervals. An enhancement could be made there, but reducing the buffer size
leads to worse compression and more I/O on the read path. Sacrificing read performance is
never a good choice.
> I changed the fixed-size ByteBuffer to a dynamically growing buffer that is bounded above by the
existing configurable buffer size. Most of the streams do not need a large buffer, so performance
improved significantly. Compared to Facebook's hive-dwrf, I observed a 2x performance
gain with this fix.
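The idea of a growable buffer capped at the configured size can be sketched as follows. This is not the code from the attached patches: the class name, the doubling growth policy, and the method names are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration of a buffer that starts small and grows on
// demand, but never beyond a configured maximum (e.g. the 256K default).
final class BoundedGrowableBuffer {
    private final int maxCapacity;
    private ByteBuffer buf;

    BoundedGrowableBuffer(int initialCapacity, int maxCapacity) {
        if (initialCapacity <= 0 || initialCapacity > maxCapacity) {
            throw new IllegalArgumentException("need 0 < initial <= max");
        }
        this.maxCapacity = maxCapacity;
        this.buf = ByteBuffer.allocate(initialCapacity);
    }

    /** Copies as many bytes as fit under the cap and returns that count. */
    int write(byte[] src, int off, int len) {
        int n = Math.min(len, maxCapacity - buf.position());
        ensureCapacity(buf.position() + n);
        buf.put(src, off, n);
        return n;
    }

    /** Doubles the backing buffer (assumed policy) until it holds 'needed'. */
    private void ensureCapacity(int needed) {
        if (needed <= buf.capacity()) {
            return;
        }
        int newCap = buf.capacity();
        while (newCap < needed) {
            newCap = (int) Math.min((long) maxCapacity, 2L * newCap);
        }
        ByteBuffer bigger = ByteBuffer.allocate(newCap);
        buf.flip();          // switch to reading the bytes written so far
        bigger.put(buf);     // copy them into the larger buffer
        buf = bigger;
    }

    boolean isFull() { return buf.position() == maxCapacity; }
    int size()       { return buf.position(); }
    int capacity()   { return buf.capacity(); }
    void reset()     { buf.clear(); }   // keeps the grown capacity
}
```

A caller would flush (compress and spill) the contents once `isFull()` returns true, just as it would with a fixed-size buffer. Streams that never accumulate much data stay at their small initial allocation.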
> Solving OOM for ORC completely may need a lot of effort, but this is definitely
low-hanging fruit.

This message was sent by Atlassian JIRA