hive-dev mailing list archives

From "Mostafa Mokhtar (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-7741) Don't synchronize WriterImpl.addRow() when dynamic.partition is enabled
Date Fri, 15 Aug 2014 19:01:19 GMT
Mostafa Mokhtar created HIVE-7741:
-------------------------------------

             Summary: Don't synchronize WriterImpl.addRow() when dynamic.partition is enabled
                 Key: HIVE-7741
                 URL: https://issues.apache.org/jira/browse/HIVE-7741
             Project: Hive
          Issue Type: Bug
          Components: File Formats
    Affects Versions: 0.13.1
         Environment: Loading into orc
            Reporter: Mostafa Mokhtar
            Assignee: Prasanth J
             Fix For: 0.14.0


When loading into an unpartitioned ORC table, the call to WriterImpl$StructTreeWriter.write() in WriterImpl.addRow() is made inside a synchronized block.

When hive.optimize.sort.dynamic.partition is enabled, the current thread is the only writer,
so the synchronization is not needed.

Also, checking memory on every row is overkill; the check could be done every 1K rows or so.

{code}
  public void addRow(Object row) throws IOException {
    synchronized (this) {
      treeWriter.write(row);
      rowsInStripe += 1;
      if (buildIndex) {
        rowsInIndex += 1;

        if (rowsInIndex >= rowIndexStride) {
          createRowIndexEntry();
        }
      }
    }
    memoryManager.addedRow();
  }
{code}

This can improve ORC load performance by 7%.

{code}
Stack Trace	Sample Count	Percentage(%)
WriterImpl.addRow(Object)	5,852	65.782
   WriterImpl$StructTreeWriter.write(Object)	5,163	58.037
   MemoryManager.addedRow()	666	7.487
      MemoryManager.notifyWriters()	648	7.284
         WriterImpl.checkMemory(double)	645	7.25
            WriterImpl.flushStripe()	643	7.228
               WriterImpl$StructTreeWriter.writeStripe(OrcProto$StripeFooter$Builder, int)	584	6.565
{code}
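
The two changes suggested above (skip the lock when there is a single writer, and check memory every 1K rows instead of every row) could look roughly like the sketch below. It is a minimal illustration, not Hive's code: `SketchWriter`, `SketchMemoryManager`, the `singleWriter` flag and `ROWS_BETWEEN_CHECKS` are hypothetical names, and the tree writer is reduced to a row counter. How the writer would learn that it is the only writer is left open here.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the memory manager; it only counts how often it is asked to check. */
class SketchMemoryManager {
  private final AtomicLong checks = new AtomicLong();

  void checkMemory() {
    checks.incrementAndGet();
  }

  long checkCount() {
    return checks.get();
  }
}

/** Hypothetical writer showing optional locking and batched memory checks. */
class SketchWriter {
  // Assumed interval, taken from the "every 1K rows" suggestion in the report.
  static final int ROWS_BETWEEN_CHECKS = 1000;

  private final boolean singleWriter;
  private final SketchMemoryManager memoryManager;
  private long rowsInStripe = 0;
  private int rowsSinceCheck = 0;

  SketchWriter(boolean singleWriter, SketchMemoryManager memoryManager) {
    this.singleWriter = singleWriter;
    this.memoryManager = memoryManager;
  }

  void addRow(Object row) {
    boolean checkDue;
    if (singleWriter) {
      // Only one thread writes, so no lock is taken.
      checkDue = writeAndCount(row);
    } else {
      synchronized (this) {
        checkDue = writeAndCount(row);
      }
    }
    // The memory manager is consulted once per ROWS_BETWEEN_CHECKS rows.
    if (checkDue) {
      memoryManager.checkMemory();
    }
  }

  /** Counts the row (standing in for treeWriter.write(row)) and reports whether a check is due. */
  private boolean writeAndCount(Object row) {
    rowsInStripe += 1;
    rowsSinceCheck += 1;
    if (rowsSinceCheck >= ROWS_BETWEEN_CHECKS) {
      rowsSinceCheck = 0;
      return true;
    }
    return false;
  }

  long rowCount() {
    return rowsInStripe;
  }
}
```

In this sketch the per-writer counter is updated in the same guarded section as the write, so the multi-writer path stays consistent while the single-writer path pays for neither the lock nor a per-row call into the memory manager.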







--
This message was sent by Atlassian JIRA
(v6.2#6252)
