hive-issues mailing list archives

From "Prasanth Jayachandran (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (HIVE-21458) ACID: Optimize AcidUtils$MetaDataFile.isRawFormat
Date Sat, 16 Mar 2019 01:15:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-21458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran reassigned HIVE-21458:
--------------------------------------------

    Assignee:     (was: Prasanth Jayachandran)

> ACID: Optimize AcidUtils$MetaDataFile.isRawFormat 
> --------------------------------------------------
>
>                 Key: HIVE-21458
>                 URL: https://issues.apache.org/jira/browse/HIVE-21458
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 3.1.1
>            Reporter: Vaibhav Gumashta
>            Priority: Major
>         Attachments: async-prof-pid-1-cpu-1.svg
>
>
> In the transactional subsystems, in several places we check to see if a data file has
> ROW__ID fields or not. Every time we do that (even within the context of the same query),
> we open a Reader for that file/split. We could optimize this by caching or perhaps checking
> once, and saving our result for later. Also, perhaps we don't need to do this for every split.
> An example call stack:
> {code}
> OrcFile.createReader(Path, OrcFile$ReaderOptions) line: 105
> AcidUtils$MetaDataFile.isRawFormatFile(Path, FileSystem) line: 2026
> AcidUtils$MetaDataFile.isRawFormat(Path, FileSystem) line: 2022
> AcidUtils.parsedDelta(Path, String, FileSystem) line: 1007
> OrcRawRecordMerger$TransactionMetaData.findWriteIDForSynthetcRowIDs(Path, Path, Configuration) line: 1231
> OrcRawRecordMerger.discoverOriginalKeyBounds(Reader, int, Reader$Options, Configuration, OrcRawRecordMerger$Options) line: 722
> OrcRawRecordMerger.<init>(Configuration, boolean, Reader, boolean, int, ValidWriteIdList, Reader$Options, Path[], OrcRawRecordMerger$Options) line: 1022
> OrcInputFormat.getReader(InputSplit, Options) line: 2108
> OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter) line: 2006
> FetchOperator$FetchInputFormatSplit.getRecordReader(JobConf) line: 776
> FetchOperator.getRecordReader() line: 344
> FetchOperator.getNextRow() line: 540
> FetchOperator.pushRow() line: 509
> FetchTask.fetch(List) line: 146
> {code} 
> Here, for each split we'll make that check.
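
The "check once and save the result" idea in the description could be sketched roughly as below. This is an illustration only, not the Hive implementation or a proposed patch: the class name RawFormatCache, the String path key, and the injected Predicate are all hypothetical stand-ins for whatever would wrap the real Reader-opening check. It also assumes the answer for a given path does not change while the cache is alive.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Predicate;

/**
 * Hypothetical helper (not part of Hive): remembers the result of an
 * expensive per-file check so it runs at most once per path.
 */
final class RawFormatCache {
    // Path string -> result of the check. Assumes the result for a path
    // is stable for the lifetime of this cache.
    private final ConcurrentMap<String, Boolean> results = new ConcurrentHashMap<>();

    // Stand-in for the costly operation (in the issue: opening a Reader
    // on the file and inspecting it). Supplied by the caller.
    private final Predicate<String> expensiveCheck;

    RawFormatCache(Predicate<String> expensiveCheck) {
        this.expensiveCheck = expensiveCheck;
    }

    /** Returns the cached answer, running the expensive check only on a miss. */
    boolean isRawFormat(String path) {
        // computeIfAbsent on ConcurrentHashMap is atomic per key, so
        // concurrent callers asking about the same path share one check.
        return results.computeIfAbsent(path, expensiveCheck::test);
    }

    /** Number of distinct paths checked so far. */
    int size() {
        return results.size();
    }
}
```

With a cache like this scoped to a single query, repeated splits of the same file would trigger the expensive check once rather than once per split. How long such a cache should live, and what it should be keyed on, are open design questions this sketch does not settle.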



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
