hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-12050) change ORC split generation to use a different model
Date Wed, 07 Oct 2015 01:04:26 GMT

     [ https://issues.apache.org/jira/browse/HIVE-12050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-12050:
------------------------------------
    Description: With directory listing, ETL vs BI decision, local cache, metastore cache,
with PPD, file footers, and combination thereof (e.g. most splits are processed via metastore
PPD but some files are not cached and we need to make ETL vs BI decision), some of which are
blocking and some not, -I want to write ORC split generation in Erlang- strategies are no
longer the best model to organize all the work. Some messaging or task queue based model might
be better where each worker that does a blocking operation (dir listing, file read, metastore
call, etc.) generates a list of things, and things are further processed by other workers
until all things are splits and there are no more other things to process.  (was: With directory
listing, ETL vs BI decision, local cache, metastore cache, with PPD, file footers, and combination
thereof (e.g. most splits are processed via metastore PPD but some files are not cached and
we need to make ETL vs BI decision), some of which are blocking and some not, -I want to write
ORC split generation in Erlang- strategies are no longer the best model to organize all the
work. Some messaging or task queue based model might be better where each work item that is
blocking (dir listing, file read, metastore call, etc.) generates list of things, and things
are further processed until all things are splits and there are no more other things to process.)

> change ORC split generation to use a different model
> ----------------------------------------------------
>
>                 Key: HIVE-12050
>                 URL: https://issues.apache.org/jira/browse/HIVE-12050
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>
> With directory listing, ETL vs BI decision, local cache, metastore cache, with PPD, file
> footers, and combination thereof (e.g. most splits are processed via metastore PPD but some
> files are not cached and we need to make ETL vs BI decision), some of which are blocking and
> some not, -I want to write ORC split generation in Erlang- strategies are no longer the best
> model to organize all the work. Some messaging or task queue based model might be better where
> each worker that does a blocking operation (dir listing, file read, metastore call, etc.)
> generates a list of things, and things are further processed by other workers until all things
> are splits and there are no more other things to process.
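
For illustration only, the task-queue model described above could be sketched roughly as follows. This is a minimal sketch, not Hive code: every name in it (`SplitQueueSketch`, `Thing`, `Work`, `Split`, `generate`, `demo`) is hypothetical, and the "directory listing" and "footer read" steps are simulated with in-memory lambdas rather than real filesystem or metastore calls. It assumes a fixed worker pool and a pending-work counter to detect when nothing but splits remains.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class SplitQueueSketch {
    /** Anything flowing through the queue: a finished split or work still to do. */
    interface Thing {}

    /** Terminal thing: hypothetical stand-in for an ORC split. */
    static final class Split implements Thing {
        final String file;
        final long offset;
        final long length;
        Split(String file, long offset, long length) {
            this.file = file; this.offset = offset; this.length = length;
        }
    }

    /** Non-terminal thing: one (possibly blocking) step that yields more things. */
    interface Work extends Thing {
        List<Thing> run() throws Exception;
    }

    /** Processes things on a worker pool until only splits remain. */
    static List<Split> generate(List<Thing> roots, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        Queue<Split> splits = new ConcurrentLinkedQueue<>();
        // Starts at 1: a guard held by the caller until all roots are dispatched.
        AtomicInteger pending = new AtomicInteger(1);
        CountDownLatch done = new CountDownLatch(1);
        try {
            for (Thing t : roots) dispatch(t, pool, splits, pending, done);
            release(pending, done); // drop the caller's guard
            done.await();           // wait until no work items are outstanding
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while generating splits", e);
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(splits);
    }

    private static void dispatch(Thing t, ExecutorService pool, Queue<Split> splits,
                                 AtomicInteger pending, CountDownLatch done) {
        if (t instanceof Split) {   // terminal: collect it, nothing more to do
            splits.add((Split) t);
            return;
        }
        Work w = (Work) t;
        pending.incrementAndGet();  // counted before submission, released when finished
        pool.execute(() -> {
            try {
                // Children are dispatched before this item is released,
                // so the counter cannot reach zero while work remains.
                for (Thing next : w.run()) dispatch(next, pool, splits, pending, done);
            } catch (Exception e) {
                // Sketch only: a failed item is dropped; real code would report the error.
            } finally {
                release(pending, done);
            }
        });
    }

    private static void release(AtomicInteger pending, CountDownLatch done) {
        if (pending.decrementAndGet() == 0) done.countDown();
    }

    /** Simulated run: one "dir listing" yields a "footer read" per file, each yielding two splits. */
    static List<Split> demo() {
        Work listDir = () -> {
            List<Thing> out = new ArrayList<>();
            for (String f : Arrays.asList("a.orc", "b.orc")) {
                out.add((Work) () -> Arrays.asList(new Split(f, 0, 64), new Split(f, 64, 64)));
            }
            return out;
        };
        return generate(Arrays.asList(listDir), 4);
    }
}
```

Calling `SplitQueueSketch.demo()` yields four splits, two per simulated file, in no guaranteed order. The point of the shape is that a step which can answer from a cache can return splits directly, while a step that cannot returns further work items, so mixed cases need no separate strategy class.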



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
