kudu-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mauricio Aristizabal <mauri...@impactradius.com>
Subject Re: Time-travel reads via SQL query
Date Thu, 30 Nov 2017 22:36:57 GMT
Thanks Andrew.  However if I'm understanding the low level details in that
ticket I don't think it's what I'm asking either, at least not
entirely: KUDU-1702 is for exposing point-in-time reads to Impala
DEVELOPERS, i'm hoping it can be exposed to Impala USERS via a SQL
extension, so we can leverage it from user queries/reports. -m

On Tue, Nov 28, 2017 at 2:09 PM, Andrew Wong <awong@cloudera.com> wrote:

> Ah I see, looks like this is an open, tracked issue. See KUDU-1702
> <https://issues.apache.org/jira/browse/KUDU-1702?jql=project%20%3D%20KUDU%20AND%20resolution%20%3D%20Unresolved%20AND%20text%20~%20%22read_at_snapshot%22%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC>,
> although not sure what the status on it is.
>
> On Tue, Nov 28, 2017 at 12:17 PM, Mauricio Aristizabal <
> mauricio@impactradius.com> wrote:
>
>> Thanks very much David and Andrew.  Yes I'm aware this functionality is
>> available via the java and C++ clients, but actually what I'm asking is if
>> it could be made available via SQL/impala.  Something like "select X from Y
>> where snapshot_micros = 2343242423 <(234)%20324-2423>" (where
>> snapshot_micros is a virtual column that would need a better name), or
>> perhaps as part of the table name like "select X from Y@2343242423
>> <(234)%20324-2423>".  -m
>>
>> On Tue, Nov 28, 2017 at 12:05 PM, David Alves <davidralves@gmail.com>
>> wrote:
>>
>>> Hi Mauricio
>>>
>>>   Andrew is right. That feature already exists in some form. With
>>> READ_AT_SNAPSHOT you can provide a timestamp which will be the timepoint
>>> under which all the scans are performed.
>>>   Note that, while generally supported and functionally tested, we
>>> haven't focused a lot of resources into testing this, so your performance
>>> mileage may vary.
>>>   In order to enable this for time points more than 5 mins in the past
>>> you need to increase the "--tablet_history_max_age_sec" flag so that
>>> the history won't get garbage collected.
>>>
>>> HTH
>>> -david
>>>
>>> On Mon, Nov 27, 2017 at 9:42 PM, Andrew Wong <awong@cloudera.com> wrote:
>>>
>>>> Hi Mauricio,
>>>>
>>>> If you haven't already, take a look at the READ_AT_SNAPSHOT read mode
>>>> (more info here
>>>> <https://kudu.apache.org/docs/transaction_semantics.html#_read_operations_scans>).
>>>> IIUC, it seems similar to, if not exactly what you're looking for!
>>>>
>>>>
>>>> Andrew
>>>>
>>>> On Mon, Nov 27, 2017 at 5:02 PM, Mauricio Aristizabal <
>>>> mauricio@impactradius.com> wrote:
>>>>
>>>>> Hi all, has there been any talk of supporting this any time soon?
>>>>>
>>>>> Time travel reads are such a cool feature, but even more than in ETL
>>>>> jobs (via Java/Scala), they would be most useful via SQL to ensure
>>>>> consistency when reading.
>>>>>
>>>>> Specifically, for example our spark streaming job updates dozens of
>>>>> aggregation tables every 30 seconds.  To make the data fully consistent
we
>>>>> would love to have views over these aggs tagged with the exact timestamp
we
>>>>> want to expose.  When each batch is done and all tables updated, we would
>>>>> update all the views forward, effectively hiding the updates we're doing
>>>>> until they're all ready.
>>>>>
>>>>> -m
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> *MAURICIO ARISTIZABAL*
>>>>> Architect - Business Intelligence + Data Science
>>>>> mauricio@impactradius.com(m)+1 323 309 4260 <(323)%20309-4260>
>>>>> 223 E. De La Guerra St. | Santa Barbara, CA 93101
>>>>> <https://maps.google.com/?q=223+E.+De+La+Guerra+St.+%7C+Santa+Barbara,+CA+93101&entry=gmail&source=g>
>>>>>
>>>>> Overview <http://www.impactradius.com/?src=slsap> | Twitter
>>>>> <https://twitter.com/impactradius> | Facebook
>>>>> <https://www.facebook.com/pages/Impact-Radius/153376411365183>
|
>>>>> LinkedIn <https://www.linkedin.com/company/impact-radius-inc->
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Andrew Wong
>>>>
>>>
>>>
>>
>>
>> --
>> *MAURICIO ARISTIZABAL*
>> Architect - Business Intelligence + Data Science
>> mauricio@impactradius.com(m)+1 323 309 4260 <(323)%20309-4260>
>> 223 E. De La Guerra St. | Santa Barbara, CA 93101
>> <https://maps.google.com/?q=223+E.+De+La+Guerra+St.+%7C+Santa+Barbara,+CA+93101&entry=gmail&source=g>
>>
>> Overview <http://www.impactradius.com/?src=slsap> | Twitter
>> <https://twitter.com/impactradius> | Facebook
>> <https://www.facebook.com/pages/Impact-Radius/153376411365183> | LinkedIn
>> <https://www.linkedin.com/company/impact-radius-inc->
>>
>
>
>
> --
> Andrew Wong
>



-- 
*MAURICIO ARISTIZABAL*
Architect - Business Intelligence + Data Science
mauricio@impactradius.com(m)+1 323 309 4260
223 E. De La Guerra St. | Santa Barbara, CA 93101

Overview <http://www.impactradius.com/?src=slsap> | Twitter
<https://twitter.com/impactradius> | Facebook
<https://www.facebook.com/pages/Impact-Radius/153376411365183> | LinkedIn
<https://www.linkedin.com/company/impact-radius-inc->

Mime
View raw message