hive-issues mailing list archives

From "Eugene Koifman (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-17547) MoveTask for Acid tables race condition
Date Mon, 02 Oct 2017 18:55:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-17547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-17547:
----------------------------------
    Description: 
Consider Hive.moveAcidFiles()
it starts out with something like
{noformat}
└── -ext-10000
    ├── 000000_0
    │   ├── _orc_acid_version
    │   └── delta_0000019_0000019
    │       └── bucket_00000
    └── 000000_1
        ├── _orc_acid_version
        └── delta_0000019_0000019
            └── bucket_00001
{noformat}
for a write to a bucketed table.
The "move" handles each 000000_N separately.  The first one creates delta_0000019_0000019 under
the table/partition dir; the others just add bucket_0000N there.
That means there is a small window where someone may "ls table/part/delta_0000019_0000019"
and not see all the buckets.
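
The window can be reproduced outside Hive with a small simulation. The sketch below is not Hive code: it only mimics the per-directory move described above on a local temporary directory, and the helper names (make_staging, move_one_at_a_time, simulate) are hypothetical. It leaves out the _orc_acid_version files for brevity and records what a reader listing the delta directory would see after each step.

```python
import os
import shutil
import tempfile

DELTA = "delta_0000019_0000019"


def make_staging(root, n_buckets):
    """Create -ext-10000/000000_N/delta_.../bucket_0000N (hypothetical helper)."""
    staging = os.path.join(root, "-ext-10000")
    for n in range(n_buckets):
        delta_dir = os.path.join(staging, f"000000_{n}", DELTA)
        os.makedirs(delta_dir)
        # Empty placeholder standing in for the real bucket file.
        open(os.path.join(delta_dir, f"bucket_{n:05d}"), "w").close()
    return staging


def move_one_at_a_time(staging, partition_dir):
    """Move each 000000_N separately and yield the reader-visible listing."""
    target = os.path.join(partition_dir, DELTA)
    for task_dir in sorted(os.listdir(staging)):
        src = os.path.join(staging, task_dir, DELTA)
        if not os.path.exists(target):
            # First task output: the delta directory itself is moved into place.
            shutil.move(src, target)
        else:
            # Later task outputs: only their bucket files are added.
            for bucket in os.listdir(src):
                shutil.move(os.path.join(src, bucket),
                            os.path.join(target, bucket))
        # This is what "ls table/part/delta_..." would show at this moment.
        yield sorted(os.listdir(target))


def simulate(n_buckets=2):
    with tempfile.TemporaryDirectory() as root:
        staging = make_staging(root, n_buckets)
        partition_dir = os.path.join(root, "table", "part")
        os.makedirs(partition_dir)
        return list(move_one_at_a_time(staging, partition_dir))


snapshots = simulate()
print(snapshots)
# [['bucket_00000'], ['bucket_00000', 'bucket_00001']]
```

The first snapshot is the problematic state: the delta directory already exists under the partition but contains only one of the two buckets.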

Once Acid writes directly to the final location (a la MM tables) this issue resolves automatically
since txn 19 is uncommitted until everything is written.
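
To illustrate why this removes the window, here is a deliberately simplified model of a reader that ignores any delta whose transaction range is not fully committed. It is an assumption-laden sketch, not Hive's actual snapshot logic: the function visible_deltas and the plain set of committed transaction ids are hypothetical stand-ins.

```python
import re

# Matches names like delta_0000019_0000019 and captures the txn range.
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)$")


def visible_deltas(dir_names, committed_txns):
    """Return the delta directories a reader should consider (hypothetical).

    A delta is visible only if every transaction id in its range is in
    committed_txns; anything else is skipped, however many files it holds.
    """
    visible = []
    for name in dir_names:
        match = DELTA_RE.match(name)
        if match is None:
            continue  # not a delta directory
        low, high = int(match.group(1)), int(match.group(2))
        if all(txn in committed_txns for txn in range(low, high + 1)):
            visible.append(name)
    return visible


listing = ["delta_0000018_0000018", "delta_0000019_0000019"]

# While txn 19 is still writing its buckets, its delta is ignored.
print(visible_deltas(listing, committed_txns={18}))
# ['delta_0000018_0000018']

# After txn 19 commits, the complete delta becomes visible at once.
print(visible_deltas(listing, committed_txns={18, 19}))
# ['delta_0000018_0000018', 'delta_0000019_0000019']
```

Under this model a partially populated delta_0000019_0000019 is harmless, because no reader looks inside it until the commit.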

  was:
Consider Hive.moveAcidFiles()
it starts out with something like
{noformat}
└── -ext-10000
    ├── 000000_0
    │   ├── _orc_acid_version
    │   └── delta_0000019_0000019
    │       └── bucket_00000
    └── 000000_1
        ├── _orc_acid_version
        └── delta_0000019_0000019
            └── bucket_00001
{noformat}
for a write to a bucketed table.
The "move" handles each 000000_N separately.  The first on creates delta_0000019_0000019 under
the table/partition dir, the others just add bucket_0000N there.
That means there is a small window where someone may "ls table/part/delta_0000019_0000019"
and not see all the buckets.


> MoveTask for Acid tables race condition
> ---------------------------------------
>
>                 Key: HIVE-17547
>                 URL: https://issues.apache.org/jira/browse/HIVE-17547
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 1.0.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>
> Consider Hive.moveAcidFiles()
> it starts out with something like
> {noformat}
> └── -ext-10000
>     ├── 000000_0
>     │   ├── _orc_acid_version
>     │   └── delta_0000019_0000019
>     │       └── bucket_00000
>     └── 000000_1
>         ├── _orc_acid_version
>         └── delta_0000019_0000019
>             └── bucket_00001
> {noformat}
> for a write to a bucketed table.
> The "move" handles each 000000_N separately.  The first one creates delta_0000019_0000019 under the table/partition dir; the others just add bucket_0000N there.
> That means there is a small window where someone may "ls table/part/delta_0000019_0000019" and not see all the buckets.
> Once Acid writes directly to the final location (a la MM tables) this issue resolves automatically since txn 19 is uncommitted until everything is written.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
