hive-user mailing list archives

From "Mich Talebzadeh" <m...@peridale.co.uk>
Subject RE: Indexes in Hive
Date Wed, 06 Jan 2016 07:51:31 GMT
I believe so, Jörn.

I am not sure how much it would differ from ORC file storage.

Cheers,

Dr Mich Talebzadeh

LinkedIn
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Sybase ASE 15 Gold Medal Award 2008
A Winning Strategy: Running the most Critical Financial Data on ASE 15
http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7.
co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4
Publications due shortly:
Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8
Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly

http://talebzadehmich.wordpress.com

NOTE: The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Peridale Technology
Ltd, its subsidiaries or their employees, unless expressly so stated. It is
the responsibility of the recipient to ensure that this email is virus free,
therefore neither Peridale Ltd, its subsidiaries nor their employees accept
any responsibility.


-----Original Message-----
From: Jörn Franke [mailto:jornfranke@gmail.com] 
Sent: 06 January 2016 07:49
To: user@hive.apache.org
Subject: Re: Indexes in Hive

If I understand you correctly, this could be just another Hive storage
format.

> On 06 Jan 2016, at 07:24, Mich Talebzadeh <mich@peridale.co.uk> wrote:
> 
> Hi,
> 
> Thinking aloud.
> 
> Ideally we should consider a fully columnar storage offering in
> which each column of a table is stored as a compressed value (I
> disregard for now how ORC actually does this, but obviously it is not
> exactly columnar storage).
> 
> So each table can be considered a loose federation of column stores,
> and each column is effectively an index?
> 
> As columns are far narrower than tables, each index block will have
> much higher density, and operations such as aggregates can be done
> directly on the index rather than on the table.
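The layout described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how ORC or any Hive storage format works: the table, the column names and the helpers `to_columns`, `rle_encode` and `rle_sum` are all hypothetical. Each column is held as its own list and run-length encoded, and a SUM is computed directly on the compressed runs without rebuilding any rows.

```python
from itertools import groupby

# Hypothetical toy table, stored row by row: (region, product, amount).
ROWS = [
    ("EU", "bond", 100),
    ("EU", "bond", 250),
    ("EU", "fx", 75),
    ("US", "fx", 300),
    ("US", "fx", 300),
]
COLUMNS = ("region", "product", "amount")


def to_columns(rows, names):
    """Pivot row-oriented data into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def rle_encode(values):
    """Compress a column as (value, run_length) pairs (run-length encoding)."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_sum(runs):
    """SUM over a run-length encoded column, without expanding the runs."""
    return sum(value * count for value, count in runs)


columns = to_columns(ROWS, COLUMNS)
compressed = {name: rle_encode(values) for name, values in columns.items()}
print(compressed["region"])           # [('EU', 3), ('US', 2)]
print(rle_sum(compressed["amount"]))  # 1025
```

Run-length encoding is just one simple compression choice picked for the illustration; the point is that the aggregate reads one narrow, compressed column and nothing else.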
> 
> This type of table offering would be true to the nature of data
> warehouse storage. Of course, row operations (get me all rows for this
> table) will be slower, but that is the trade-off we need to consider.
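To make the trade-off concrete, here is a second hypothetical sketch (toy data and helper names invented for illustration): rebuilding a single row from a column layout needs one lookup in every column, so a "get me all rows" scan does work proportional to rows x columns, whereas an aggregate reads just one column.

```python
# Hypothetical toy table, stored row by row: (region, product, amount).
ROWS = [("EU", "bond", 100), ("EU", "fx", 75), ("US", "fx", 300)]
COLUMNS = ("region", "product", "amount")


def to_columns(rows, names):
    """Pivot row-oriented data into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def get_row(columns, names, i):
    """Reassemble row i: one lookup in every column list."""
    return tuple(columns[name][i] for name in names)


def all_rows(columns, names):
    """'Get me all rows': rows x columns lookups in total."""
    row_count = len(columns[names[0]])
    return [get_row(columns, names, i) for i in range(row_count)]


columns = to_columns(ROWS, COLUMNS)
print(all_rows(columns, COLUMNS) == ROWS)  # True
print(sum(columns["amount"]))              # 475, reads a single column
```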
> 
> Expecting users to write their own IndexHandler may be technically
> interesting but is not commercially viable, as Hive needs to be a
> product on its own merit, not a development base. Writing your own
> storage attributes etc. requires skills that will put people off
> seeing Hive as an attractive proposition (it would require
> considerable investment in skill sets to maintain Hive).
> 
> Thus my thinking is to offer true columnar storage in Hive so that it
> can be a proper data warehouse. In addition, the development tools can
> be made available for those interested in tailoring their own specific
> Hive solutions.
> 
> 
> HTH
> 
> 
> 
> Dr Mich Talebzadeh
> 
> 
> -----Original Message-----
> From: Gopal Vijayaraghavan [mailto:gopal@hortonworks.com] On Behalf Of 
> Gopal Vijayaraghavan
> Sent: 05 January 2016 23:55
> To: user@hive.apache.org
> Subject: Re: Is Hive Index officially not recommended?
> 
> 
>> So, in a nutshell, if "external" indexes in Hive are not used for
>> improving query response, what value do they add, and can we forget
>> them for now?
> 
> The built-in indexes (those that write data as smaller tables) are only
> useful in a pre-columnar world, where the indexes offer a huge
> reduction in IO.
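As a rough illustration of what such a built-in index does, the sketch below builds a small side table mapping each key to the row-store blocks that contain it, so a point lookup reads only those blocks. The block layout and the names `build_index_table` and `lookup` are hypothetical; Hive's own compact index has its own table schema, which this does not try to reproduce. A columnar format can already skip data using statistics stored alongside each column, so the saving from a separate side table shrinks, which is the point made above.

```python
from collections import defaultdict

# Hypothetical row-store blocks: block id -> list of (key, payload) rows.
BLOCKS = {
    "file0:0": [(1, "a"), (2, "b")],
    "file0:1": [(3, "c"), (1, "d")],
    "file1:0": [(4, "e"), (5, "f")],
}


def build_index_table(blocks):
    """Build a small side table: key -> sorted list of blocks holding it."""
    index = defaultdict(set)
    for block_id, rows in blocks.items():
        for key, _payload in rows:
            index[key].add(block_id)
    return {key: sorted(block_ids) for key, block_ids in index.items()}


def lookup(blocks, index, key):
    """Read only the blocks the index points at; return (rows, blocks_read)."""
    block_ids = index.get(key, [])
    rows = [row for block_id in block_ids for row in blocks[block_id] if row[0] == key]
    return rows, len(block_ids)


index = build_index_table(BLOCKS)
print(lookup(BLOCKS, index, 1))  # ([(1, 'a'), (1, 'd')], 2): 2 of 3 blocks read
print(lookup(BLOCKS, index, 9))  # ([], 0): no blocks read
```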
> 
> Part #1 of using Hive indexes effectively is to write your own
> HiveIndexHandler, with usesIndexTable=false.
> 
> And then write an IndexPredicateAnalyzer, which lets you map arbitrary
> lookups into other range conditions.
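The analyzer idea can be illustrated with a language-neutral toy in Python. It is not the Java `IndexPredicateAnalyzer` API: the predicate tuples, the `Range` type and the `analyze` function are hypothetical. Supported conditions on indexed columns are turned into ranges (including a prefix match rewritten as a half-open string range), and everything else is left as a residual filter to apply after the index lookup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Range:
    """Range on one column: low is inclusive; None means unbounded."""
    column: str
    low: Optional[str] = None
    high: Optional[str] = None
    high_inclusive: bool = False


def prefix_upper_bound(prefix):
    """Exclusive upper bound for strings starting with prefix (None = unbounded)."""
    stem = prefix.rstrip("\U0010ffff")  # the max code point cannot be incremented
    if not stem:
        return None
    return stem[:-1] + chr(ord(stem[-1]) + 1)


def analyze(predicates, indexed_columns):
    """Split ANDed (column, op, value) predicates into index ranges and a residual."""
    ranges, residual = [], []
    for column, op, value in predicates:
        if column not in indexed_columns:
            residual.append((column, op, value))
        elif op == "=":
            ranges.append(Range(column, low=value, high=value, high_inclusive=True))
        elif op == ">=":
            ranges.append(Range(column, low=value))
        elif op == "<":
            ranges.append(Range(column, high=value))
        elif op == "starts_with":
            # An arbitrary lookup (prefix match) mapped onto a half-open range.
            ranges.append(Range(column, low=value, high=prefix_upper_bound(value)))
        else:
            residual.append((column, op, value))  # cannot use the index
    return ranges, residual


ranges, residual = analyze(
    [("symbol", "starts_with", "IB"), ("trade_date", ">=", "2016-01-01"), ("price", "!=", "0")],
    indexed_columns={"symbol", "trade_date"},
)
print(ranges[0])  # Range(column='symbol', low='IB', high='IC', high_inclusive=False)
print(residual)   # [('price', '!=', '0')]
```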
> 
> Not coincidentally, we're adding an "ANALYZE TABLE ... CACHE METADATA"
> statement, which consolidates the "internal" index into an external
> store (HBase).
> 
> Some of the index data now lives in the HBase metastore, so that
> whole partitions can be included or excluded using the consolidated
> index.
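A minimal sketch of partition inclusion/exclusion from consolidated metadata, assuming the index data is a per-partition min/max for one column. That is an assumption made for illustration: HIVE-11676 keeps its metadata in the HBase metastore, not in a Python dict, and the names below are hypothetical. A partition is skipped only when its [min, max] cannot overlap the queried range.

```python
# Hypothetical consolidated metadata: per-partition (min, max) of one column.
PARTITION_STATS = {
    "dt=2016-01-01": {"amount": (10, 500)},
    "dt=2016-01-02": {"amount": (600, 900)},
    "dt=2016-01-03": {"amount": (0, 50)},
}


def prune_partitions(stats, column, low, high):
    """Keep partitions whose [min, max] for column overlaps [low, high]."""
    kept = []
    for partition in sorted(stats):
        col_min, col_max = stats[partition][column]
        if col_max >= low and col_min <= high:  # the two ranges overlap
            kept.append(partition)
    return kept


# WHERE amount BETWEEN 400 AND 700: the dt=2016-01-03 partition is never opened.
print(prune_partitions(PARTITION_STATS, "amount", 400, 700))
# ['dt=2016-01-01', 'dt=2016-01-02']
```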
> 
> https://issues.apache.org/jira/browse/HIVE-11676
> 
> 
> The experience from BI workloads run by customers is that, in general,
> looking up the right "slice" of data is more of a problem than the
> actual aggregate.
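One way to see the split between finding the slice and aggregating it: if the data is kept sorted on the lookup key, the slice boundaries can be found with two binary searches, after which the aggregate is a simple pass over the slice. The sketch below assumes sorted, in-memory arrays and hypothetical names; it only separates lookup from aggregation and says nothing about how Hive executes a query.

```python
from bisect import bisect_left, bisect_right

# Hypothetical fact data, kept sorted on the lookup key (ISO dates sort as text).
DATES = ["2016-01-01", "2016-01-01", "2016-01-02", "2016-01-03", "2016-01-03"]
AMOUNTS = [100, 250, 75, 300, 300]


def slice_bounds(sorted_keys, low, high):
    """Find the [low, high] slice with two binary searches (O(log n))."""
    return bisect_left(sorted_keys, low), bisect_right(sorted_keys, high)


def sum_slice(sorted_keys, values, low, high):
    """Look up the slice first, then aggregate only the rows inside it."""
    start, end = slice_bounds(sorted_keys, low, high)
    return sum(values[start:end])


print(slice_bounds(DATES, "2016-01-02", "2016-01-03"))        # (2, 5)
print(sum_slice(DATES, AMOUNTS, "2016-01-02", "2016-01-03"))  # 675
```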
> 
> And that, for a workhorse data warehouse, this has to hold up even
> under a non-stop stream of updates.
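A hedged sketch of why summary-style index data can coexist with a stream of updates: a min/max summary that only ever widens stays a safe (conservative) filter. An updated row reports its new value, and a deleted row reports nothing, so the bounds can become looser but never exclude a live value. `RunningMinMax` is a hypothetical helper for illustration, not a Hive class; it ignores concurrency and the cost of eventually tightening the bounds.

```python
class RunningMinMax:
    """Min/max summary of one column, widened as new or updated values arrive."""

    def __init__(self):
        self.min = None
        self.max = None

    def add(self, value):
        """Record an inserted value, or the new value of an updated row."""
        if self.min is None or value < self.min:
            self.min = value
        if self.max is None or value > self.max:
            self.max = value

    def may_contain(self, low, high):
        """False only when no recorded value can fall inside [low, high]."""
        if self.min is None:
            return False
        return self.max >= low and self.min <= high


stats = RunningMinMax()
for value in (5, 9, 3):  # a stream of inserts/updates
    stats.add(value)
print(stats.min, stats.max)       # 3 9
print(stats.may_contain(10, 20))  # False: safe to skip
print(stats.may_contain(8, 12))   # True: must be read
```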
> 
> Cheers,
> Gopal
> 

