sqoop-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Attila Szabo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-3288) Incremental import's upper bound ignores session time zone in Oracle
Date Wed, 21 Feb 2018 15:43:00 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-3288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371561#comment-16371561

Attila Szabo commented on SQOOP-3288:


You're arguments are absolutely valid, maybe I've not phrased my concerns correctly:

Both of CURRENT_TIMESTAMP and LOCALTIMESTAMP fits us, I'm glad we agree in this, and I'm also
pretty much accepting that you've chosen the first one (BTW: fair enough arguments!).

The problem is the following:

Before your patch the behavior is the following:

Any timezone, and time is set in the Oracle's OS, we're using that. So regardless which node
of the cluster we're executing the import job, the result will be the same.

After your patch it's great we would not depend on the ORacle OS time anymore, though the
result would differ from machine to machine if the cluster's timezones are mixed (if I'm not
mistaken!). So if somehow we're not enforcing setting the session time, or converting and
comparing the times in a timezone aware way, the new solution could even result in "strange"

Do you see any way to ensure that e.g. everything (both last and current times) is compared
in a timezone aware way? Or are my concerns are "too paranoid"?


> Incremental import's upper bound ignores session time zone in Oracle
> --------------------------------------------------------------------
>                 Key: SQOOP-3288
>                 URL: https://issues.apache.org/jira/browse/SQOOP-3288
>             Project: Sqoop
>          Issue Type: Bug
>          Components: connectors/oracle
>    Affects Versions: 1.4.7
>            Reporter: Daniel Voros
>            Assignee: Daniel Voros
>            Priority: Major
>             Fix For: 1.5.0
>         Attachments: SQOOP-3288.1.patch
> At the moment we're using [{{SELECT SYSDATE FROM dual}}|https://github.com/apache/sqoop/blob/3153c3610da7e5db388bfb14f3681d308e9e89c6/src/java/org/apache/sqoop/manager/OracleManager.java#L652]
when getting current time from Oracle.
> SYSDATE returns the underlying operating system's current time, while CURRENT_TIMESTAMP
uses the session time zone. This could lead to problems during incremental imports *when Oracle's
time zone is different from the OS*.
> Consider the following scenario when Oracle is configured to {{+0:00}}, while the OS
is {{+5:00}}:
> ||Oracle time||OS time||Event||
> |2:00|7:00|{{sqoop import --last-value 1:00 ...}} => imports {{[1:00, 7:00)}}|
> |2:30|7:30|{{update ... set last_updated = current_timestamp ...}} => set to {{2:30}}
*Won't be imported!*|
> |3:00|8:00|{{sqoop import --last-value 7:00 ...}} => imports {{[7:00, 8:00)}}|
> This way records updated within 5 hours after the last sqoop import won't get imported.
> Please note, that the example above assumes, that the user/administrator who's updating
the Oracle table will use the current session time of Oracle when setting the "last updated"
column of the table.
> I think the solution is to use CURRENT_TIMESTAMP instead of SYSDATE. Other connection
managers, like MySQL or PostgreSQL are using that as well.

This message was sent by Atlassian JIRA

View raw message