tajo-issues mailing list archives

From "Hyunsik Choi (JIRA)" <j...@apache.org>
Subject [jira] [Reopened] (TAJO-1952) Implement PartitionFileFragment
Date Mon, 04 Dec 2017 04:37:00 GMT

     [ https://issues.apache.org/jira/browse/TAJO-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Hyunsik Choi reopened TAJO-1952:

> Implement PartitionFileFragment
> -------------------------------
>                 Key: TAJO-1952
>                 URL: https://issues.apache.org/jira/browse/TAJO-1952
>             Project: Tajo
>          Issue Type: Sub-task
>          Components: Planner/Optimizer, Storage
>            Reporter: Jaehwa Jung
>            Assignee: Jaehwa Jung
>             Fix For: 0.12.0
>         Attachments: TAJO-1952.patch, TAJO-1952_2.patch
> Currently, PartitionedTableScanNode contains the list of partitions, and it seems to me
that this list causes the following problems:
> 1. Duplicate information: A Task contains a Fragment, which specifies the target directory or target
file to scan. The partition paths in the list are therefore already written to the Fragment.
> 2. Network resources: When scanning lots of partitions, the list occupies network resources,
for example, several hundred kilobytes or more. This looks like an unnecessary cost because
the Fragment already has the partition paths.
> I want to address the above problems by implementing a new Fragment called PartitionedFileFragment.
Currently, I'm planning the implementation as follows:
> * PartitionedFileFragment will borrow from FileFragment and contain the partition path
and the partition key values.
> * Remove the path array of partitions from PartitionedTableScanNode.
> * Implement a method for getting filtered partition directories in FileTableSpace.
> * Implement a method for making PartitionedFileFragment array.
> * Before making splits, call the above method and use its result to make the splits.
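
The plan above could be sketched roughly as follows. This is only an illustration and not the attached patch: the class name PartitionedFileFragmentSketch, its fields, and the three helper methods are hypothetical, no Tajo classes are used, and it assumes partition directories follow a "column=value" naming layout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of the proposed fragment; not Tajo's actual class. */
final class PartitionedFileFragmentSketch {
    final String path;           // file to scan
    final long startOffset;      // first byte of the split
    final long length;           // number of bytes in the split
    final String partitionKeys;  // e.g. "year=2015/month=10"

    PartitionedFileFragmentSketch(String path, long startOffset, long length,
                                  String partitionKeys) {
        this.path = path;
        this.startOffset = startOffset;
        this.length = length;
        this.partitionKeys = partitionKeys;
    }

    /** Collects the "column=value" segments of a partition directory (assumed layout). */
    static String extractPartitionKeys(String partitionDir) {
        List<String> keys = new ArrayList<>();
        for (String segment : partitionDir.split("/")) {
            if (segment.indexOf('=') > 0) {
                keys.add(segment);
            }
        }
        return String.join("/", keys);
    }

    /** Keeps only the partition directories whose key values satisfy the filter. */
    static List<String> filterPartitionDirs(List<String> partitionDirs,
                                            Predicate<String> keyFilter) {
        List<String> kept = new ArrayList<>();
        for (String dir : partitionDirs) {
            if (keyFilter.test(extractPartitionKeys(dir))) {
                kept.add(dir);
            }
        }
        return kept;
    }

    /** Builds one whole-file fragment per file in a partition directory. */
    static List<PartitionedFileFragmentSketch> makeFragments(String partitionDir,
                                                             String[] fileNames,
                                                             long[] lengths) {
        String keys = extractPartitionKeys(partitionDir);
        List<PartitionedFileFragmentSketch> fragments = new ArrayList<>();
        for (int i = 0; i < fileNames.length; i++) {
            fragments.add(new PartitionedFileFragmentSketch(
                partitionDir + "/" + fileNames[i], 0L, lengths[i], keys));
        }
        return fragments;
    }
}
```

In this sketch each fragment carries its own partition path and key values, so the scan node would no longer need to ship a separate array of partition paths.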

This message was sent by Atlassian JIRA
