kudu-user mailing list archives

From Mauricio Aristizabal <mauri...@impactradius.com>
Subject Re: new Kudu benchmarks
Date Fri, 05 Jan 2018 23:28:12 GMT
Thanks very much Todd, perfectly clear on both counts.

Yeah, as a convention we will only be exposing views to
analysts/report-writers/BI tools (for several reasons), so storing values as
long in the underlying tables will only be a concern for pipeline developers.
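
For anyone following the thread, here is a rough sketch of the scaled-integer
idea discussed in the quoted messages below: keep the value as an integer in
the table and divide by a power of 10 only when presenting it. It is plain
Python rather than an Impala view, purely to illustrate the arithmetic. The
scale of 100 and the names SCALE, to_stored and from_stored are assumptions
made up for this example, not anything from Kudu or Impala.

```python
from decimal import Decimal

# Assumed scale: two decimal places, so amounts are stored as integer cents.
SCALE = 100


def to_stored(amount: str) -> int:
    """Turn a decimal string such as '12.34' into a scaled integer (1234)."""
    return int(Decimal(amount) * SCALE)


def from_stored(value: int) -> Decimal:
    """What a view would do on read: divide the stored integer by the scale."""
    return Decimal(value) / SCALE


# Aggregate on the raw integers, then apply the division once to the result.
rows = [to_stored(a) for a in ("12.34", "0.66", "100.00")]
total = from_stored(sum(rows))
print(rows, total)
```

Summing the integers first and dividing once keeps the per-row work to integer
addition, which is the cheap path described in the reply below.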

-m

On Fri, Jan 5, 2018 at 3:23 PM, Todd Lipcon <todd@cloudera.com> wrote:

> Hey Mauricio,
>
> Answers inline below
>
> On Fri, Jan 5, 2018 at 2:50 PM, Mauricio Aristizabal <
> mauricio@impactradius.com> wrote:
>
>> Todd, since you bring it up in this thread... what CDH version do you
>> expect DECIMAL support to make it into? I recently asked Icaro Vazquez
>> about it but still no news.  We're hoping it makes it into 5.14 otherwise
>> according to the roadmap there might not be another minor release and we'd
>> be waiting till Summer for CDH 6.
>>
>
> As this is an open source project mailing list, it would be inappropriate
> for me to comment on a vendor's release schedule. Please note that Kudu is
> a product of the Apache Software Foundation and the ASF doesn't have any
> influence on or knowledge of Cloudera's release plans.
>
> Of course it happens that I and many other contributors are also employees
> of Cloudera, but we participate in the ASF as individuals and not
> representatives of our employer, and so generally won't comment on
> questions like this in this forum. Please refer to Cloudera's forums for
> questions about CDH release plans, etc.
>
>
>>
>> And just in case we're forced to make do without DECIMAL initially, is
>> the recommendation really to store as string and convert?  I was thinking
>> of storing as int/long and dividing by 10 or 1000 as needed in an impala
>> view over the kudu table.  Wouldn't a division be way more performant than
>> a conversion from string, especially when aggregating over thousands of
>> records in a report query?
>>
>
> You're right -- using an integer type and division by a power of 10 is
> going to be much faster than casting from a string.  Division by a constant
> would be JITted by Impala into a pretty minimal sequence of assembly
> instructions (two bitshifts, an integer multiplication, and a subtraction)
> which likely take about 6 cycles total. In contrast, a cast from string to
> decimal probably takes many thousands of cycles.
>
> The only downside is that if you have end users using the data they might
> be confused by the integer representation whereas a string representation
> would be a little clearer.
>
> Thanks
> -Todd
>
>
>>
>> On Fri, Jan 5, 2018 at 11:13 AM, Todd Lipcon <todd@cloudera.com> wrote:
>>
>>> Oh, one other piece of feedback: maybe worth editing the title to say
>>> "vs Apache Parquet" instead of "vs Apache Impala" since in all cases you
>>> are using Impala as the query engine?
>>>
>>> -Todd
>>>
>>> On Fri, Jan 5, 2018 at 11:06 AM, Todd Lipcon <todd@cloudera.com> wrote:
>>>
>>>> Hey Boris,
>>>>
>>>> Thanks for publishing this. It's a great look at how an end user
>>>> evaluates Kudu. I appreciate that you cover both the pros and cons of the
>>>> technology, and glad to see that your conclusion leaves you excited about
>>>> Kudu :)
>>>>
>>>> One quick note is that I think you'll be even more pleased when you
>>>> upgrade to a later version (eg Kudu 1.5). We've improved performance in
>>>> several areas and also improved scalability compared to the version you're
>>>> testing. TIMESTAMP is also supported now, with DECIMAL soon to follow. It
>>>> might be worth noting this as an addendum to the blog post if you feel like
>>>> it.
>>>>
>>>> -Todd
>>>>
>>>> On Fri, Jan 5, 2018 at 10:51 AM, Boris Tyukin <boris@boristyukin.com>
>>>> wrote:
>>>>
>>>>> Hi guys,
>>>>>
>>>>> we just finished testing Kudu, mostly comparing Kudu to Impala on
>>>>> HDFS/parquet. I wanted to share my blog post and results. We used typical
>>>>> (and real) healthcare data for the test, not synthetic data, which I
>>>>> think makes it a bit more interesting.
>>>>>
>>>>> I welcome any feedback!
>>>>>
>>>>> http://boristyukin.com/benchmarking-apache-kudu-vs-apache-impala/
>>>>>
>>>>> We are really impressed with Kudu and I wanted to take an opportunity
>>>>> to thank Kudu developers for such an amazing and much-needed product.
>>>>>
>>>>> Boris
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Todd Lipcon
>>>> Software Engineer, Cloudera
>>>>
>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>>
>>
>>
>>
>> --
>> *MAURICIO ARISTIZABAL*
>> Architect - Business Intelligence + Data Science
>> mauricio@impactradius.com (m) +1 323 309 4260
>> 223 E. De La Guerra St. | Santa Barbara, CA 93101
>>
>> Overview <http://www.impactradius.com/?src=slsap> | Twitter
>> <https://twitter.com/impactradius> | Facebook
>> <https://www.facebook.com/pages/Impact-Radius/153376411365183> | LinkedIn
>> <https://www.linkedin.com/company/impact-radius-inc->
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
*MAURICIO ARISTIZABAL*
Architect - Business Intelligence + Data Science
mauricio@impactradius.com(m)+1 323 309 4260
223 E. De La Guerra St. | Santa Barbara, CA 93101

Overview <http://www.impactradius.com/?src=slsap> | Twitter
<https://twitter.com/impactradius> | Facebook
<https://www.facebook.com/pages/Impact-Radius/153376411365183> | LinkedIn
<https://www.linkedin.com/company/impact-radius-inc->
