carbondata-issues mailing list archives

From "Jihong MA (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CARBONDATA-429) Eliminate unnecessary file name check in dictionary cache
Date Fri, 16 Dec 2016 01:43:58 GMT

     [ https://issues.apache.org/jira/browse/CARBONDATA-429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jihong MA updated CARBONDATA-429:
---------------------------------
    Description: 
1. The dictionary cache currently performs many file name checks for each column, which
causes unnecessary calls to HDFS getFileStatus.
2. In checkAndLoadDictionaryData, we fetch the meta file's mtime from HDFS each time we invoke
cache.get, to check whether the local cache is still valid. The local dictionary cache may
become invalid after a parallel data load. This check in turn further increases the number of
calls to getFileStatus.

  was:
1. In dictionary cache, there are currently many unnecessary file name check for each column,
which increase the number of calling  HDFS getFileStatus.
2. And in checkAndLoadDictionaryData, we get meta file's mtime from hdfs each time we call
cache.get to check if the local is valid or not.  The local dictionary cache may be invalid
after another job finished load data.  This will still increases calling getFileStatus

        Summary: Eliminate unnecessary file name check in dictionary cache  (was: Remove unnecessary
file name check in dictionary cache)

> Eliminate unnecessary file name check in dictionary cache
> ---------------------------------------------------------
>
>                 Key: CARBONDATA-429
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-429
>             Project: CarbonData
>          Issue Type: Sub-task
>          Components: core
>    Affects Versions: 0.1.1-incubating
>            Reporter: Jacky Li
>            Assignee: Ashok Kumar
>             Fix For: 1.0.0-incubating
>
>          Time Spent: 5h
>  Remaining Estimate: 0h
>
> 1. The dictionary cache currently performs many file name checks for each column, which
> causes unnecessary calls to HDFS getFileStatus.
> 2. In checkAndLoadDictionaryData, we fetch the meta file's mtime from HDFS each time we invoke
> cache.get, to check whether the local cache is still valid. The local dictionary cache may
> become invalid after a parallel data load. This check in turn further increases the number of
> calls to getFileStatus.
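
The cost described in item 2 can be illustrated with a minimal Java sketch. This is not CarbonData's actual code, and it is not necessarily the fix adopted for this issue. The class DictionaryCacheSketch, the mtimeLookup function (a stand-in for an HDFS getFileStatus call that returns the meta file's modification time) and the revalidateIntervalMs parameter are all hypothetical names introduced for illustration. With an interval of 0 the sketch checks the mtime on every get, as the description says the cache does today. With a larger interval it reuses a recent check, which is one possible way to reduce the number of status calls.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

// Hypothetical sketch only: not CarbonData's real dictionary cache.
class DictionaryCacheSketch {
    // A cached dictionary plus the meta file mtime it was loaded at.
    private static final class Entry {
        final String dictionary;
        final long loadedMtime;
        long lastValidatedAtMs;

        Entry(String dictionary, long loadedMtime) {
            this.dictionary = dictionary;
            this.loadedMtime = loadedMtime;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    // Assumption: stands in for getFileStatus(metaFile).getModificationTime().
    private final ToLongFunction<String> mtimeLookup;
    // Assumption: how long a successful validation is trusted; 0 means never.
    private final long revalidateIntervalMs;
    int statusCalls = 0; // number of simulated getFileStatus calls
    int loads = 0;       // number of simulated dictionary loads

    DictionaryCacheSketch(ToLongFunction<String> mtimeLookup, long revalidateIntervalMs) {
        this.mtimeLookup = mtimeLookup;
        this.revalidateIntervalMs = revalidateIntervalMs;
    }

    String get(String column, long nowMs) {
        Entry entry = cache.get(column);
        // Skip the remote check if this entry was validated recently enough.
        if (entry != null && nowMs - entry.lastValidatedAtMs < revalidateIntervalMs) {
            return entry.dictionary;
        }
        statusCalls++;
        long mtime = mtimeLookup.applyAsLong(column);
        // Reload when nothing is cached or the meta file changed since loading.
        if (entry == null || entry.loadedMtime != mtime) {
            entry = new Entry("dictionary of " + column + " at mtime " + mtime, mtime);
            cache.put(column, entry);
            loads++;
        }
        entry.lastValidatedAtMs = nowMs;
        return entry.dictionary;
    }

    public static void main(String[] args) {
        DictionaryCacheSketch strict = new DictionaryCacheSketch(column -> 100L, 0L);
        DictionaryCacheSketch relaxed = new DictionaryCacheSketch(column -> 100L, 1000L);
        for (long now = 0; now < 3; now++) {
            strict.get("city", now);
            relaxed.get("city", now);
        }
        System.out.println(strict.statusCalls);  // prints 3
        System.out.println(relaxed.statusCalls); // prints 1
    }
}
```

The trade-off in this sketch is staleness: with a non-zero interval, a dictionary changed by a parallel data load can be served from the local cache until the interval expires, so whether such a window is acceptable depends on how the dictionary is used.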



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
