hive-issues mailing list archives

From Sergio Peña (JIRA) <j...@apache.org>
Subject [jira] [Updated] (HIVE-11131) Get row information on DataWritableWriter once for better writing performance
Date Sat, 27 Jun 2015 02:50:04 GMT

     [ https://issues.apache.org/jira/browse/HIVE-11131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergio Peña updated HIVE-11131:
-------------------------------
    Attachment:     (was: HIVE-11131.1.patch)

> Get row information on DataWritableWriter once for better writing performance
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-11131
>                 URL: https://issues.apache.org/jira/browse/HIVE-11131
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 1.2.0
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>         Attachments: HIVE-11131.2.patch
>
>
> DataWritableWriter is a class used to write Hive records to Parquet files. This class
> currently retrieves all the information needed to parse a record, such as the schema and
> the object inspectors, every time a record is written (that is, on every call to write()).
> We can make this class perform better by creating the writers for each data type once,
> at initialization, and storing the corresponding object inspector in each writer.
> The class can assume that subsequent records will have the same object inspectors
> and schema, so there is no need to check for changes on every call. When a new schema is
> written, Parquet creates a new DataWritableWriter instance.
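A minimal Java sketch of the caching idea described above, under stated assumptions: the class CachedRowWriter, the ColumnType enum, and the Sink and FieldWriter interfaces are hypothetical stand-ins invented for illustration. They are not Hive's DataWritableWriter or ObjectInspector APIs, nor Parquet's RecordConsumer. The schema is inspected once in the constructor, and write() only dispatches each value to a writer that was built in advance.

```java
import java.util.List;

// Hypothetical sketch of "build per-type writers once, reuse for every record".
// None of these types are the real Hive or Parquet classes.
public class CachedRowWriter {

    /** Hypothetical column types standing in for a Hive/Parquet schema. */
    enum ColumnType { INT, STRING, BOOLEAN }

    /** Hypothetical output sink standing in for Parquet's record consumer. */
    interface Sink {
        void startRecord();
        void addValue(int fieldIndex, Object value);
        void endRecord();
    }

    /** One writer per column; it already knows its index and type. */
    interface FieldWriter {
        void write(Sink sink, Object value);
    }

    private final FieldWriter[] writers;

    /** Inspects the schema once, up front, instead of on every write() call. */
    public CachedRowWriter(List<ColumnType> schema) {
        writers = new FieldWriter[schema.size()];
        for (int i = 0; i < schema.size(); i++) {
            writers[i] = createWriter(i, schema.get(i));
        }
    }

    /** Chooses the type-specific conversion once per column. */
    private static FieldWriter createWriter(int index, ColumnType type) {
        switch (type) {
            case INT:
                return (sink, v) -> sink.addValue(index, ((Number) v).intValue());
            case STRING:
                return (sink, v) -> sink.addValue(index, v.toString());
            case BOOLEAN:
                return (sink, v) -> sink.addValue(index, (Boolean) v);
            default:
                throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    /** Per-record path: no schema lookups, only calls to the cached writers. */
    public void write(Sink sink, Object[] row) {
        sink.startRecord();
        for (int i = 0; i < writers.length; i++) {
            // Assumption: every column is optional, so null values are not emitted.
            if (row[i] != null) {
                writers[i].write(sink, row[i]);
            }
        }
        sink.endRecord();
    }
}
```

Because Parquet creates a new writer when the schema changes, the sketch never re-checks the schema inside write(); in this model a schema change simply means constructing a new CachedRowWriter.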



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
