hive-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: Hive footprint
Date Thu, 21 Apr 2016 08:55:45 GMT
This simply does not work, but we need to make Hive use external indexes.
This is a must.

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 20 April 2016 at 19:37, Mich Talebzadeh <mich.talebzadeh@gmail.com>
wrote:

> Hi,
>
> If I may, I would also like to see where the Hive optimizer shows that the
> index is used, with explain ... or other means. It would be interesting.
>
> HTH
>
> Dr Mich Talebzadeh
>
> On 20 April 2016 at 19:20, Marcin Tustin <mtustin@handybook.com> wrote:
>
>> Could you expand on this? This sounds like something that would be great
>> to know, and probably fold into the wiki.
>>
>> On Wed, Apr 20, 2016 at 11:57 AM, Jörn Franke <jornfranke@gmail.com>
>> wrote:
>>
>>> Hive has working indexes. However, many people overlook that a block is
>>> usually much larger than in a relational database, and thus do not use
>>> them correctly.
>>>
>>> On 19 Apr 2016, at 09:31, Mich Talebzadeh <mich.talebzadeh@gmail.com>
>>> wrote:
>>>
>>> The issue is that Hive has indexes (not an index store) but they don't work,
>>> so there we go. Maybe in later releases we can make use of these indexes
>>> for faster queries. Hive even allows bitmap indexes on a fact table, but they
>>> are never used by the CBO.
>>>
>>> show indexes on sales;
>>>
>>> +-------------------+----------+------------+-----------------------------------------+----------+---------+
>>> | idx_name          | tab_name | col_names  | idx_tab_name                            | idx_type | comment |
>>> +-------------------+----------+------------+-----------------------------------------+----------+---------+
>>> | sales_cust_bix    | sales    | cust_id    | oraclehadoop__sales_sales_cust_bix__    | bitmap   |         |
>>> | sales_channel_bix | sales    | channel_id | oraclehadoop__sales_sales_channel_bix__ | bitmap   |         |
>>> | sales_prod_bix    | sales    | prod_id    | oraclehadoop__sales_sales_prod_bix__    | bitmap   |         |
>>> | sales_promo_bix   | sales    | promo_id   | oraclehadoop__sales_sales_promo_bix__   | bitmap   |         |
>>> | sales_time_bix    | sales    | time_id    | oraclehadoop__sales_sales_time_bix__    | bitmap   |         |
>>> +-------------------+----------+------------+-----------------------------------------+----------+---------+
>>>
>>> Dr Mich Talebzadeh
>>>
>>> On 18 April 2016 at 23:51, Marcin Tustin <mtustin@handybook.com> wrote:
>>>
>>>> We use a Hive-with-ORC setup now. Queries may take thousands of seconds
>>>> with joins, and potentially tens of seconds with selects on very large
>>>> tables.
>>>>
>>>> My understanding is that the goal of HBase is to provide much lower
>>>> latency for queries. Obviously, this comes at the cost of not being able
>>>> to perform joins. I don't actually use HBase, so I hesitate to say more
>>>> about it.
>>>>
>>>> On Mon, Apr 18, 2016 at 6:48 PM, Mich Talebzadeh <
>>>> mich.talebzadeh@gmail.com> wrote:
>>>>
>>>>> Thanks Marcin.
>>>>>
>>>>> What is the definition of low latency here? Are you referring to the
>>>>> performance of SQL against HBase tables compared to Hive? As I understand
>>>>> it, HBase is a columnar database. Would it be possible to use Hive against
>>>>> ORC to achieve the same?
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>> On 18 April 2016 at 23:43, Marcin Tustin <mtustin@handybook.com>
>>>>> wrote:
>>>>>
>>>>>> HBase has a different use case - it's for low-latency querying of big
>>>>>> tables. If you combined it with Hive, you might have something nice for
>>>>>> certain queries, but I wouldn't think of them as direct competitors.
>>>>>>
>>>>>> On Mon, Apr 18, 2016 at 6:34 PM, Mich Talebzadeh <
>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I notice that Impala is rarely mentioned these days. I may be
>>>>>>> missing something. However, I gather it is coming to an end now, as I
>>>>>>> don't recall many use cases for it (or customers asking for it). In
>>>>>>> contrast, Hive has held its ground with the new addition of Spark and Tez
>>>>>>> as execution engines, support for ACID and ORC, and new stuff in Hive 2.
>>>>>>> In addition, provided a good choice is made for its metastore, it scales
>>>>>>> well.
>>>>>>>
>>>>>>> If Hive had the (organic) ability to support local variables and stored
>>>>>>> procedures, then it would be a top-notch Data Warehouse. Given its
>>>>>>> metastore, I don't see any technical reason why it cannot support these
>>>>>>> constructs.
>>>>>>>
>>>>>>> I was recently asked to comment on migration from commercial DWs to
>>>>>>> Big Data (primarily for TCO reasons) and really could not recall any
>>>>>>> better candidate than Hive. Is HBase a viable alternative? Obviously,
>>>>>>> whatever one decides, there is still HDFS, a good engine for Hive (sounds
>>>>>>> like many prefer Tez, although I am a Spark fan) and the ubiquitous YARN.
>>>>>>>
>>>>>>> Let me know your thoughts.
>>>>>>>
>>>>>>>
>>>>>>> Dr Mich Talebzadeh
>>>>>>>
>>>>>>
>>>>>>
>>>>>> Want to work at Handy? Check out our culture deck and open roles
>>>>>> <http://www.handy.com/careers>
>>>>>> Latest news <http://www.handy.com/press> at Handy
>>>>>> Handy just raised $50m
>>>>>> <http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/>
>>>>>> led by Fidelity
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
