Mailing-List: contact issues-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Wed, 7 Oct 2015 01:04:26 +0000 (UTC)
From: "Sergey Shelukhin (JIRA)" <jira@apache.org>
To: issues@hive.apache.org
Message-ID: <JIRA.12902872.1444179427000.42028.1444179866522@Atlassian.JIRA>
In-Reply-To: <JIRA.12902872.1444179427000@Atlassian.JIRA>
References: <JIRA.12902872.1444179427000@Atlassian.JIRA>
 <JIRA.12902872.1444179427394@arcas>
Subject: [jira] [Updated] (HIVE-12050) change ORC split generation to use a
 different model
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable


     [ https://issues.apache.org/jira/browse/HIVE-12050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-12050:
------------------------------------
    Description: With directory listing, ETL vs BI decision, local cache,
metastore cache, with PPD, file footers, and combination thereof (e.g. most
splits are processed via metastore PPD but some files are not cached and we
need to make ETL vs BI decision), some of which are blocking and some not,
-I want to write ORC split generation in Erlang- strategies are no longer
the best model to organize all the work. Some messaging or task queue based
model might be better where each worker that does a blocking operation (dir
listing, file read, metastore call, etc.) generates a list of things, and
things are further processed by other workers until all things are splits
and there are no more other things to process.
(was: With directory listing, ETL vs BI decision, local cache, metastore
cache, with PPD, file footers, and combination thereof (e.g. most splits are
processed via metastore PPD but some files are not cached and we need to
make ETL vs BI decision), some of which are blocking and some not, -I want
to write ORC split generation in Erlang- strategies are no longer the best
model to organize all the work. Some messaging or task queue based model
might be better where each work item that is blocking (dir listing, file
read, metastore call, etc.) generates list of things, and things are further
processed until all things are splits and there are no more other things to
process.)

> change ORC split generation to use a different model
> ----------------------------------------------------
>
>                 Key: HIVE-12050
>                 URL: https://issues.apache.org/jira/browse/HIVE-12050
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>
> With directory listing, ETL vs BI decision, local cache, metastore cache,
> with PPD, file footers, and combination thereof (e.g. most splits are
> processed via metastore PPD but some files are not cached and we need to
> make ETL vs BI decision), some of which are blocking and some not, -I want
> to write ORC split generation in Erlang- strategies are no longer the best
> model to organize all the work. Some messaging or task queue based model
> might be better where each worker that does a blocking operation (dir
> listing, file read, metastore call, etc.) generates a list of things, and
> things are further processed by other workers until all things are splits
> and there are no more other things to process.
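
For illustration only, the task-queue model described above might be
sketched roughly as follows. This is not Hive code: it is a minimal Python
sketch in which every name (DirItem, FileItem, Split, FAKE_DIRS, FAKE_FILES,
generate_splits) and the fixed 100-byte split size are hypothetical, and an
in-memory dictionary stands in for the blocking directory listings and file
reads. The metastore, cache, and ETL vs BI steps are left out. Each worker
takes one thing, does its blocking step, and returns a list of new things;
processing stops when only splits remain.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from dataclasses import dataclass

# Hypothetical in-memory "file system" standing in for real blocking calls.
FAKE_DIRS = {
    "/warehouse/t": ["/warehouse/t/p=1", "/warehouse/t/p=2"],
    "/warehouse/t/p=1": ["/warehouse/t/p=1/f0.orc"],
    "/warehouse/t/p=2": ["/warehouse/t/p=2/f1.orc", "/warehouse/t/p=2/f2.orc"],
}
FAKE_FILES = {  # file path -> size in bytes
    "/warehouse/t/p=1/f0.orc": 250,
    "/warehouse/t/p=2/f1.orc": 100,
    "/warehouse/t/p=2/f2.orc": 90,
}


@dataclass(frozen=True)
class DirItem:
    path: str


@dataclass(frozen=True)
class FileItem:
    path: str


@dataclass(frozen=True)
class Split:
    path: str
    offset: int
    length: int


def list_dir(item):
    """Stand-in for a blocking directory listing: one new thing per child."""
    return [
        DirItem(child) if child in FAKE_DIRS else FileItem(child)
        for child in FAKE_DIRS[item.path]
    ]


def read_file(item, max_split=100):
    """Stand-in for a blocking file read: emit splits covering the file."""
    size = FAKE_FILES[item.path]
    return [
        Split(item.path, off, min(max_split, size - off))
        for off in range(0, size, max_split)
    ]


# Which worker function handles which kind of thing.
HANDLERS = {DirItem: list_dir, FileItem: read_file}


def generate_splits(roots, workers=4):
    """Process things until all things are splits and nothing is pending."""
    splits = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {pool.submit(HANDLERS[type(r)], r) for r in roots}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                for thing in future.result():
                    if isinstance(thing, Split):
                        splits.append(thing)  # terminal: nothing left to do
                    else:
                        pending.add(pool.submit(HANDLERS[type(thing)], thing))
    # Completion order is nondeterministic, so sort for a stable result.
    return sorted(splits, key=lambda s: (s.path, s.offset))


if __name__ == "__main__":
    for split in generate_splits([DirItem("/warehouse/t")]):
        print(split)
```

In this sketch, adding a new kind of blocking step (for example a metastore
call) would mean adding one item type and one handler, rather than a new
strategy class; termination falls out of the pending set becoming empty.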


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)