quickstep-dev mailing list archives

From Harshad Deshmukh <hars...@cs.wisc.edu>
Subject Re: List of potential work to do on Quickstep
Date Tue, 23 Aug 2016 18:32:30 GMT
Hi Jignesh,

Thanks for sending the list. I want to share an update on point 1.

I am currently working on partitioned aggregation, which builds on the
QUICKSTEP-28 and QUICKSTEP-29 JIRA issues. As a first step toward this
goal, I have created the QUICKSTEP-43 JIRA issue (and a corresponding GitHub
PR), which adds a new operator that destroys the aggregation state
(similar to the destroy hash table operator). This operator will be
useful when the finalize step of aggregation runs in parallel, because the
shared state can only be destroyed once the finalize phase is complete.
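
To illustrate the ordering constraint, here is a minimal standalone sketch. It
does not use Quickstep's actual operator or work-order classes; the names
PartitionedSumState, Build, and FinalizeAndDestroy are hypothetical, and plain
std::thread stands in for the scheduler. Each partition is finalized by its own
thread, and the shared state is released only after every finalize thread has
finished.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

using Row = std::pair<std::int64_t, std::int64_t>;  // (group key, value)

// Hypothetical shared aggregation state: one SUM hash table per partition.
struct PartitionedSumState {
  explicit PartitionedSumState(std::size_t num_partitions)
      : partitions(num_partitions) {}
  std::vector<std::unordered_map<std::int64_t, std::int64_t>> partitions;
};

// Build phase (single-threaded for brevity): route each row to a partition
// by hashing its group key, then accumulate the sum for that key.
void Build(PartitionedSumState *state, const std::vector<Row> &rows) {
  assert(state != nullptr && !state->partitions.empty());
  for (const Row &row : rows) {
    const std::size_t p =
        std::hash<std::int64_t>{}(row.first) % state->partitions.size();
    state->partitions[p][row.first] += row.second;
  }
}

// Finalize phase: one thread per partition, each writing to its own output
// slot. The state is destroyed only after all finalize threads have joined,
// which is the role a separate "destroy aggregation state" step would play.
std::vector<Row> FinalizeAndDestroy(std::unique_ptr<PartitionedSumState> state) {
  assert(state != nullptr);
  const std::size_t n = state->partitions.size();
  std::vector<std::vector<Row>> outputs(n);
  std::vector<std::thread> workers;
  workers.reserve(n);
  PartitionedSumState *shared = state.get();
  for (std::size_t i = 0; i < n; ++i) {
    workers.emplace_back([shared, &outputs, i] {
      outputs[i].assign(shared->partitions[i].begin(),
                        shared->partitions[i].end());
    });
  }
  for (std::thread &worker : workers) {
    worker.join();
  }
  state.reset();  // Safe now: no finalize thread can still touch the state.

  std::vector<Row> result;
  for (const std::vector<Row> &out : outputs) {
    result.insert(result.end(), out.begin(), out.end());
  }
  std::sort(result.begin(), result.end());  // Deterministic order for callers.
  return result;
}
```

Destroying the state inside any single finalize thread would be unsafe here,
since the other partitions may still be reading it; that is the reason for
making destruction a separate step that depends on the whole finalize phase.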

On 08/22/2016 08:23 PM, J Patel wrote:
> Hi folks,
>
> Here is a list of features that would be good for the community to work on.
> Feel free to add or comment on this list.
>
> 1: Improve handling of aggregation: Aggregate handling in Quickstep is slow
> as a separate hash table is being built for each aggregate. PR
> https://github.com/apache/incubator-quickstep/pull/90 is a step in fixing
> this, but there is more to be done, including increasing the space
> efficiency of the hash table, improving the finalize operation (which is
> single-threaded), and considering partitioning (so that finalize can be
> parallelized).
>
> 2: The use of ColumnVectors is very expensive as it involves a full extra
> read and write of data, and results in a bad memory access pattern. That
> design needs to be rethought/refactored. Nav has suggested using an
> iterator model vs. accessors, and that is a good idea. We can probably go
> beyond that and think of defining patterns for taking an input, applying a
> predicate, and applying a projection (copy). Any ideas here are welcome.
>
> 3: We have Bloom filters, and they need to be optimized to work with joins.
> Jianqiao is working on this.
>
> 4: Error handling in the system can be improved. Here we need to consider
> if we want to use error return codes or the C++ throw/catch mechanism. Right
> now we use a mix of both. I am starting to turn in favor of throw/catch as
> that way we at least have a way of catching the error at the top (rather
> than crashing). We can then refactor the code to add entire throw/catch
> chains. Right now the most serious error handling that is lacking, IMHO, is
> when we are loading a large file and there is a corrupted tuple near the
> end. The system crashes after making the user wait, and there is no
> cleanup.
>
> 5: Our type system also needs major surgery to make it easier to add new
> types. Clean UDF support is also missing.
>
> Other thoughts?
>
> Cheers,
> Jignesh
>
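
On point 4, the throw/catch approach for the bulk-load case could be sketched
roughly as follows. This is only an illustration under assumptions: the
MalformedTupleError type, the ParseTuple and LoadAll functions, and the
two-integer "a,b" tuple format are all hypothetical and are not Quickstep code.
Tuples are staged first and committed only if the whole input parses, so a
corrupted tuple near the end is caught at the top and leaves the table
untouched instead of crashing.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Tuple = std::pair<int, int>;

// Hypothetical exception type carrying the offending line number.
class MalformedTupleError : public std::runtime_error {
 public:
  explicit MalformedTupleError(std::size_t line_number)
      : std::runtime_error("malformed tuple at line " +
                           std::to_string(line_number)) {}
};

// Parses a line of the assumed form "<int>,<int>"; throws on bad input.
Tuple ParseTuple(const std::string &line, std::size_t line_number) {
  std::istringstream in(line);
  int first = 0;
  int second = 0;
  char separator = '\0';
  if (!(in >> first >> separator >> second) || separator != ',') {
    throw MalformedTupleError(line_number);
  }
  return Tuple(first, second);
}

// Top-level load: stage all tuples, commit only on success. On failure the
// staged tuples are discarded (the cleanup), *table is left unchanged, and
// the error message is reported to the caller.
bool LoadAll(const std::vector<std::string> &lines,
             std::vector<Tuple> *table,
             std::string *error) {
  assert(table != nullptr && error != nullptr);
  std::vector<Tuple> staged;
  try {
    for (std::size_t i = 0; i < lines.size(); ++i) {
      staged.push_back(ParseTuple(lines[i], i + 1));
    }
  } catch (const MalformedTupleError &e) {
    *error = e.what();
    return false;
  }
  table->insert(table->end(), staged.begin(), staged.end());
  return true;
}
```

Staging everything in memory is the simplest way to get all-or-nothing
behavior in a sketch; a real loader for large files would more likely need to
undo or drop partially written blocks instead.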

-- 
Thanks,
Harshad

