hive-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Hive footprint
Date Wed, 20 Apr 2016 15:57:45 GMT
Hive has working indexes. However, many people overlook that a block in Hive is usually much
larger than in a relational database, and thus do not use the indexes correctly.
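
To make the block-size point concrete, here is a rough back-of-the-envelope sketch in Python (not from the original message). It assumes an index can only skip whole blocks and that matching rows are spread uniformly and independently over the table. The function name, the rows-per-block figures, and the selectivity are hypothetical values chosen for illustration.

```python
def fraction_of_blocks_touched(rows_per_block: int, selectivity: float) -> float:
    """Expected fraction of blocks holding at least one matching row.

    Assumption: each row matches the predicate independently with
    probability `selectivity` (i.e. the data is not sorted or clustered
    on the indexed column).
    """
    return 1.0 - (1.0 - selectivity) ** rows_per_block


# Hypothetical figures: a predicate matching 1 row in 10,000.
selectivity = 0.0001

# A small relational-style page holding ~100 rows.
small_page = fraction_of_blocks_touched(100, selectivity)

# A large Hadoop-style block holding ~1,000,000 rows.
large_block = fraction_of_blocks_touched(1_000_000, selectivity)

print(f"small pages touched:  {small_page:.4f}")
print(f"large blocks touched: {large_block:.4f}")
```

Under these assumptions, the index lets a scan skip about 99% of the small pages, while almost every large block still has to be read unless the data is sorted or clustered on the indexed column.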

> On 19 Apr 2016, at 09:31, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
> 
> The issue is that Hive has indexes (not an index store), but they don't work, so there we go. Maybe in later releases we can make use of these indexes for faster queries. Hive even allows bitmap indexes on a fact table, but they are never used by the CBO.
> 
> show indexes on sales;
> 
> +-------------------+----------+------------+-----------------------------------------+----------+---------+
> | idx_name          | tab_name | col_names  | idx_tab_name                            | idx_type | comment |
> +-------------------+----------+------------+-----------------------------------------+----------+---------+
> | sales_cust_bix    | sales    | cust_id    | oraclehadoop__sales_sales_cust_bix__    | bitmap   |         |
> | sales_channel_bix | sales    | channel_id | oraclehadoop__sales_sales_channel_bix__ | bitmap   |         |
> | sales_prod_bix    | sales    | prod_id    | oraclehadoop__sales_sales_prod_bix__    | bitmap   |         |
> | sales_promo_bix   | sales    | promo_id   | oraclehadoop__sales_sales_promo_bix__   | bitmap   |         |
> | sales_time_bix    | sales    | time_id    | oraclehadoop__sales_sales_time_bix__    | bitmap   |         |
> +-------------------+----------+------------+-----------------------------------------+----------+---------+
> 
> 
> 
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  
> 
>> On 18 April 2016 at 23:51, Marcin Tustin <mtustin@handybook.com> wrote:
>> We use a Hive-with-ORC setup now. Queries may take thousands of seconds with joins, and potentially tens of seconds with selects on very large tables.
>> 
>> My understanding is that the goal of HBase is to provide much lower latency for queries. Obviously, this comes at the cost of not being able to perform joins. I don't actually use HBase, so I hesitate to say more about it.
>> 
>>> On Mon, Apr 18, 2016 at 6:48 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>>> Thanks Marcin.
>>> 
>>> What is the definition of low latency here? Are you referring to the performance of SQL against HBase tables compared to Hive? As I understand it, HBase is a columnar database. Would it be possible to use Hive against ORC to achieve the same?
>>> 
>>> 
>>>> On 18 April 2016 at 23:43, Marcin Tustin <mtustin@handybook.com> wrote:
>>>> HBase has a different use case: it's for low-latency querying of big tables. If you combined it with Hive, you might have something nice for certain queries, but I wouldn't think of them as direct competitors.
>>>> 
>>>>> On Mon, Apr 18, 2016 at 6:34 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>>>>> Hi,
>>>>> 
>>>>> I notice that Impala is rarely mentioned these days. I may be missing something. However, I gather it is coming to an end now, as I don't recall many use cases for it (or customers asking for it). In contrast, Hive has held its ground with the new addition of Spark and Tez as execution engines, support for ACID and ORC, and new stuff in Hive 2. In addition, provided a good choice is made for its metastore, it scales well.
>>>>> 
>>>>> If Hive had the (organic) ability to support local variables and stored procedures, it would be a top-notch data warehouse. Given its metastore, I don't see any technical reason why it cannot support these constructs.
>>>>> 
>>>>> I was recently asked to comment on migration from commercial DWs to Big Data (primarily for TCO reasons) and really could not recall any better candidate than Hive. Is HBase a viable alternative? Obviously, whatever one decides, there is still HDFS, a good engine for Hive (sounds like many prefer Tez, although I am a Spark fan) and the ubiquitous YARN.
>>>>> 
>>>>> Let me know your thoughts.
>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
>> 
> 
