tajo-dev mailing list archives

From "Hyunsik Choi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TAJO-472) Umbrella ticket for accelerating query speed through memory cached table
Date Mon, 06 Jan 2014 06:27:53 GMT

    [ https://issues.apache.org/jira/browse/TAJO-472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862795#comment-13862795 ]

Hyunsik Choi commented on TAJO-472:

Min and Jihoon,

Thank you for the nice discussion. From it, I have fully understood the proposal.
Here is my comment.

This proposal looks very promising to me. The important point is that Tajo is flexible enough
to embrace these features without any performance degradation. Depending on their workloads,
users can simply choose to use these features, for example indexing some data sets, or caching
frequently used tables in memory on workers that have available memory.

As to the latest comment, my answer is the same. As you know, indexing techniques inherently
involve trade-offs, so whether to use indexes is simply the user's decision according to their
workloads. In addition, the question of whether data packs should hold the same number of rows
or the same number of bytes may be too detailed to discuss at this time. In my opinion, this
problem won't be a real challenge; we can solve it with some engineering decisions.

Thank you for the nice technical discussion.

> Umbrella ticket for accelerating query speed through memory cached table
> ------------------------------------------------------------------------
>                 Key: TAJO-472
>                 URL: https://issues.apache.org/jira/browse/TAJO-472
>             Project: Tajo
>          Issue Type: New Feature
>          Components: distributed query plan, physical operator
>            Reporter: Min Zhou
>            Assignee: Min Zhou
>         Attachments: TAJO-472 Proposal.pdf
> Previously, I was involved as a technical expert in an in-memory database for online
> businesses in Alibaba Group. It is an internal project that can perform group-by aggregation
> on billions of rows in less than 1 second.
> I'd like to apply this technology to Tajo and make it much faster than it is now. Based on
> some benchmarks, we believe that Spark/Shark is currently the fastest solution among the
> open-source interactive query systems, such as Impala, Presto, and Tajo. The main reason is
> that it benefits from in-memory data.
> I will take memory cached tables as my first step to accelerate Tajo's query speed.
> Actually, this is the reason why I was concerned with table partitioning during Christmas
> and New Year.
> Will submit a proposal soon.

This message was sent by Atlassian JIRA
