Date: Wed, 7 Oct 2015 01:04:26 +0000 (UTC)
From: "Sergey Shelukhin (JIRA)"
To: issues@hive.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-12050) change ORC split generation to use a different model

     [ https://issues.apache.org/jira/browse/HIVE-12050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-12050:
------------------------------------
    Description: 
With directory listing, ETL vs BI decision, local cache, metastore cache, with PPD, file footers, and combination thereof (e.g. most splits are processed via metastore PPD but some files are not cached and we need to make ETL vs BI decision), some of which are blocking and some not, -I want to write ORC split generation in Erlang- strategies are no longer the best model to organize all the work.
Some messaging or task queue based model might be better where each worker that does a blocking operation (dir listing, file read, metastore call, etc.) generates a list of things, and things are further processed by other workers until all things are splits and there are no more other things to process.

  (was: With directory listing, ETL vs BI decision, local cache, metastore cache, with PPD, file footers, and combination thereof (e.g. most splits are processed via metastore PPD but some files are not cached and we need to make ETL vs BI decision), some of which are blocking and some not, -I want to write ORC split generation in Erlang- strategies are no longer the best model to organize all the work.
Some messaging or task queue based model might be better where each work item that is blocking (dir listing, file read, metastore call, etc.) generates list of things, and things are further processed until all things are splits and there are no more other things to process.)

> change ORC split generation to use a different model
> ----------------------------------------------------
>
>                 Key: HIVE-12050
>                 URL: https://issues.apache.org/jira/browse/HIVE-12050
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>
> With directory listing, ETL vs BI decision, local cache, metastore cache, with PPD, file footers, and combination thereof (e.g. most splits are processed via metastore PPD but some files are not cached and we need to make ETL vs BI decision), some of which are blocking and some not, -I want to write ORC split generation in Erlang- strategies are no longer the best model to organize all the work.
> Some messaging or task queue based model might be better where each worker that does a blocking operation (dir listing, file read, metastore call, etc.) generates a list of things, and things are further processed by other workers until all things are splits and there are no more other things to process.

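
For illustration only, the task-queue model described above might be sketched roughly as follows. This is not the Hive implementation and not a proposed patch: every name here (SplitQueueSketch, Thing, DirListing, FileRead, Split, Run, generateSplits, the workers parameter) is hypothetical, and the blocking operations are replaced by in-memory stand-ins. Each worker processes one thing and produces a list of further things, until everything left is a split.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: none of these types exist in Hive.
class SplitQueueSketch {

  /** A unit of work; processing it yields further things to process. */
  interface Thing {
    List<Thing> process();
  }

  /** Terminal thing: a finished split, nothing left to process. */
  static final class Split implements Thing {
    final String file;
    Split(String file) { this.file = file; }
    @Override public List<Thing> process() { return Collections.emptyList(); }
  }

  /** Stand-in for a blocking file read: one file yields one split. */
  static final class FileRead implements Thing {
    final String file;
    FileRead(String file) { this.file = file; }
    @Override public List<Thing> process() {
      return Collections.<Thing>singletonList(new Split(file));
    }
  }

  /** Stand-in for a blocking directory listing: one directory yields file reads. */
  static final class DirListing implements Thing {
    final List<String> files;
    DirListing(List<String> files) { this.files = files; }
    @Override public List<Thing> process() {
      List<Thing> out = new ArrayList<>();
      for (String f : files) {
        out.add(new FileRead(f));
      }
      return out;
    }
  }

  /** Shared state for one split-generation run. */
  static final class Run {
    final ExecutorService pool;
    final Queue<String> splits = new ConcurrentLinkedQueue<>();
    // Starts at 1: a guard held while the initial things are being seeded.
    final AtomicInteger pending = new AtomicInteger(1);
    final CountDownLatch done = new CountDownLatch(1);

    Run(ExecutorService pool) { this.pool = pool; }

    /** Queue a thing; its outputs are collected (splits) or queued again. */
    void submit(Thing thing) {
      pending.incrementAndGet();
      pool.execute(() -> {
        try {
          for (Thing next : thing.process()) {
            if (next instanceof Split) {
              splits.add(((Split) next).file);
            } else {
              submit(next);
            }
          }
        } finally {
          release();
        }
      });
    }

    /** Signal completion once no thing is queued or running. */
    void release() {
      if (pending.decrementAndGet() == 0) {
        done.countDown();
      }
    }
  }

  /** Runs until all things are splits; returns split names in sorted order. */
  static List<String> generateSplits(Map<String, List<String>> dirs, int workers) {
    ExecutorService pool = Executors.newFixedThreadPool(workers);
    try {
      Run run = new Run(pool);
      for (List<String> files : dirs.values()) {
        run.submit(new DirListing(files));
      }
      run.release(); // drop the seeding guard
      run.done.await();
      List<String> result = new ArrayList<>(run.splits);
      Collections.sort(result);
      return result;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while generating splits", e);
    } finally {
      pool.shutdown();
    }
  }
}
```

In this sketch the pending counter is what decides that "there are no more other things to process": a thing's outputs are queued before the thing itself is released, so the count can only reach zero once every queued thing has finished. A metastore call or an ETL vs BI decision would be one more Thing implementation under the same assumptions.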
-- This message was sent by Atlassian JIRA (v6.3.4#6332)