tajo-dev mailing list archives

From "Min Zhou (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TAJO-475) Table partition catalog recap
Date Sun, 05 Jan 2014 00:05:50 GMT

    [ https://issues.apache.org/jira/browse/TAJO-475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862449#comment-13862449 ]

Min Zhou commented on TAJO-475:

Yes, you understood my meaning correctly. For the first step, we can build an in-memory data
structure. That's quite fast and straightforward. For the long-term goal, however, we should
think about the following aspects.
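A minimal sketch of what such a first-step in-memory structure could look like, holding the meta-table columns proposed in the issue description quoted below (partition_id, partition_value, ordinal_position, locations). The names `PartitionMeta` and `PartitionCatalog`, storing `locations` as a tuple of host names, and using `ordinal_position` only for ordering are assumptions made for illustration; this is not Tajo's catalog API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionMeta:
    """One row of the proposed meta table (hypothetical class)."""
    partition_id: int
    partition_value: str
    ordinal_position: int
    locations: tuple  # assumed: host names caching this partition


class PartitionCatalog:
    """Hypothetical in-memory catalog: table name -> {partition_id: PartitionMeta}."""

    def __init__(self):
        self._tables = defaultdict(dict)

    def register(self, table, meta):
        # Adding or re-registering a partition overwrites the previous row.
        self._tables[table][meta.partition_id] = meta

    def locate(self, table, partition_value):
        """Return the hosts caching the partition with this value, or ()."""
        for meta in self._tables.get(table, {}).values():
            if meta.partition_value == partition_value:
                return meta.locations
        return ()

    def partitions(self, table):
        """All partitions of a table, sorted by ordinal_position."""
        return sorted(self._tables.get(table, {}).values(),
                      key=lambda m: m.ordinal_position)


# Usage with a hypothetical table name.
catalog = PartitionCatalog()
catalog.register("lineitem", PartitionMeta(1, "2014-01-01", 0, ("host1", "host2")))
catalog.register("lineitem", PartitionMeta(2, "2014-01-02", 1, ("host3",)))
print(catalog.locate("lineitem", "2014-01-02"))  # ('host3',)
```

A plain dictionary per table keeps lookups cheap for a first step; a lookup by partition value is a linear scan here, which a second index could replace if tables have many partitions.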

1. Memory is expensive. As the proverb says, we should spend our limited funds where they do
the most good. We can cut down the memory footprint through compression, an LRU cache, and
multi-layer storage (memory -> SSD -> hard disk).
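The LRU cache and multi-layer storage idea in point 1 can be sketched as a chain of LRU tiers: an entry evicted from one tier is demoted to the next, the last tier evicts for good, and a hit in a lower tier promotes the entry back to memory. `TieredLRUCache` is a hypothetical name, capacities are counted in entries rather than bytes, and compression is left out; this only illustrates the eviction policy.

```python
from collections import OrderedDict


class TieredLRUCache:
    """Hypothetical memory -> SSD -> disk cache, each tier an LRU map.

    Capacities are entry counts (a simplification; a real cache would
    count bytes). get() returns None on a miss, so None values are not
    supported.
    """

    def __init__(self, capacities=(2, 4, 8), names=("memory", "ssd", "disk")):
        self.names = names
        self.capacities = capacities
        self.tiers = [OrderedDict() for _ in capacities]

    def put(self, key, value):
        # Drop any stale copy in any tier, then insert at the top tier.
        for tier in self.tiers:
            tier.pop(key, None)
        self._insert(0, key, value)

    def get(self, key):
        for tier in self.tiers:
            if key in tier:
                value = tier.pop(key)
                self._insert(0, key, value)  # promote back to memory
                return value
        return None

    def tier_of(self, key):
        """Name of the tier currently holding key, or None."""
        for name, tier in zip(self.names, self.tiers):
            if key in tier:
                return name
        return None

    def _insert(self, level, key, value):
        tier = self.tiers[level]
        tier[key] = value
        tier.move_to_end(key)  # mark as most recently used
        if len(tier) > self.capacities[level]:
            old_key, old_value = tier.popitem(last=False)  # least recently used
            if level + 1 < len(self.tiers):
                self._insert(level + 1, old_key, old_value)  # demote one tier
            # otherwise the entry falls off the last tier entirely
```

Promoting on every hit keeps hot partitions in memory; a production design might instead promote only after repeated hits, to avoid churning memory with one-off scans.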

2. I am not familiar with Tajo's YARN mode. As far as I know, workers there are spawned by the
NodeManager on demand. Since the workers are not always running, they can't keep data in memory
to share with subsequent queries. One solution is to run this cache manager as an auxiliary
service in the NodeManager, like the shuffle service in Hadoop MapReduce.
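To illustrate why a long-lived service helps, here is a toy model in which per-query workers come and go while a node-level cache outlives them, so a second query reuses data the first one loaded. `CacheAuxService` and `Worker` are hypothetical names; a real implementation would presumably be a Java class extending YARN's `AuxiliaryService` and registered through the `yarn.nodemanager.aux-services` property, the same mechanism the MapReduce shuffle service uses.

```python
class CacheAuxService:
    """Hypothetical per-node cache living as long as the NodeManager.

    Cached partitions survive after the worker that loaded them exits.
    """

    def __init__(self):
        self._partitions = {}

    def put(self, key, data):
        self._partitions[key] = data

    def get(self, key):
        return self._partitions.get(key)


class Worker:
    """Hypothetical short-lived worker spawned per query; keeps no cache itself."""

    def __init__(self, service):
        self.service = service

    def scan(self, key, load_from_storage):
        data = self.service.get(key)
        if data is None:
            # Cache miss: read from underlying storage, then share via the service.
            data = load_from_storage(key)
            self.service.put(key, data)
        return data


# Two queries, each with its own worker, sharing one node-level service.
service = CacheAuxService()
first = Worker(service).scan("lineitem/p1", lambda k: "rows of " + k)
second = Worker(service).scan("lineitem/p1", lambda k: "reloaded " + k)
print(second)  # rows of lineitem/p1
```

The second worker never calls its loader because the service, not the worker, owns the cached data.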

Thanks for the feedback, really encouraging.


> Table partition catalog recap
> -----------------------------
>                 Key: TAJO-475
>                 URL: https://issues.apache.org/jira/browse/TAJO-475
>             Project: Tajo
>          Issue Type: Sub-task
>          Components: catalog
>            Reporter: Min Zhou
>            Assignee: Min Zhou
> The query master needs to know where the partitions of a memory-cached table are located.
> At minimum, we need a meta table containing the following columns:
> |partition_id|
> |partition_value|
> |ordinal_position|
> |locations|
> Any suggestions?

This message was sent by Atlassian JIRA
