asterixdb-dev mailing list archives

From Sandeep Joshi <sanjos...@gmail.com>
Subject Re: Question on language translation for Algebricks
Date Mon, 15 Feb 2016 08:41:38 GMT
Thanks for all the helpful answers.  I will check out VXQuery next.

Algebricks seems like the equivalent of LLVM for query languages.  I am
wondering whether Algebricks is powerful enough to express any query
language, be it graph-based, relational or hierarchical (MRQL, SQL, Pregel).
Is there a formal proof of this expressive power?  Will there always be a
one-to-one correspondence between the plan trees of different languages, or
are there cases where one has to look at whole sub-trees while doing query
translation?
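
To make the last question concrete, here is a minimal Java sketch of what I
mean by a translation that is not one-to-one. Everything in it is
hypothetical: SourceNode, the node kinds and the operator labels are invented
for illustration and are not Algebricks classes. The only point is that a
single source node (TOPK) may have to expand into two target operators.

```java
import java.util.ArrayList;
import java.util.List;

public class PlanTranslationSketch {

    /** Hypothetical source-language plan node; not an Algebricks class. */
    static final class SourceNode {
        final String kind;      // "SCAN", "FILTER" or "TOPK"
        final String arg;       // dataset name, predicate, or "sortKey:limit"
        final SourceNode input; // null for a leaf

        SourceNode(String kind, String arg, SourceNode input) {
            this.kind = kind;
            this.arg = arg;
            this.input = input;
        }
    }

    /**
     * Translates the source plan bottom-up and returns the target operators
     * as illustrative string labels, ordered from leaf to root.
     */
    static List<String> translate(SourceNode node) {
        List<String> ops = (node.input == null)
                ? new ArrayList<String>()
                : translate(node.input);
        switch (node.kind) {
            case "SCAN":
                ops.add("DATASCAN(" + node.arg + ")");   // one-to-one
                break;
            case "FILTER":
                ops.add("SELECT(" + node.arg + ")");     // one-to-one
                break;
            case "TOPK":
                // Assumed to have no single target equivalent, so it
                // expands into an ORDER followed by a LIMIT.
                String[] parts = node.arg.split(":", 2);
                ops.add("ORDER(" + parts[0] + ")");
                ops.add("LIMIT(" + parts[1] + ")");
                break;
            default:
                throw new IllegalArgumentException("unknown node: " + node.kind);
        }
        return ops;
    }

    public static void main(String[] args) {
        SourceNode plan = new SourceNode("TOPK", "score:10",
                new SourceNode("FILTER", "stars > 3",
                        new SourceNode("SCAN", "reviews", null)));
        System.out.println(translate(plan));
    }
}
```

Three source nodes produce four target operators here, so a translator built
this way has to handle a whole node (or sub-tree) at a time rather than
assume a node-for-node mapping.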

-Sandeep


On Mon, Feb 15, 2016 at 4:21 AM, Mike Carey <dtabass@gmail.com> wrote:

> PS: There's an important point below that you shouldn't miss (Sandeep) if
> you look at the Hivesterix code - if you find its approach puzzling, note
> that it was designed to only add what was needed to run Hive queries on
> Hyracks - and so that it could potentially be kept in upper-level sync with
> Hive itself.  As a result, it was not done as a "Hive lookalike done right"
> - it was done as a "Hive lookalike that lets the existing Hive code do as
> much of the initial work as possible".
>
>
>
> On 2/14/16 2:48 PM, Yingyi Bu wrote:
>
>> Hi Sandeep,
>>
>> Here is the Hivesterix codebase in the Apache source tree:
>>
>> https://github.com/apache/incubator-asterixdb-hyracks/tree/fullstack-0.2.13
>>
>> We maintained Hivesterix up to hyracks-0.2.13, but stopped maintaining
>> it after that release. Mike has explained the reason.
>>
>>>> Furthermore, none of these rewrite rules seem to be SQL-specific.  Are
>>>> there any SQL-specific rewrite rules which were added?
>>
>> That's exactly the motivation of the Algebricks project --- most rules
>> that a typical SQL compiler implements are not SQL-specific :-)
>> However, there are indeed a few Hive-specific rules that I added in order
>> to get the Hive-on-Algebricks plan to work efficiently:
>>
>> https://github.com/apache/incubator-asterixdb-hyracks/tree/fullstack-0.2.13/hivesterix/hivesterix-optimizer/src/main/java/edu/uci/ics/hivesterix/optimizer/rules
>>
>> The Hivesterix implementation first translates a Hive-optimized MR plan
>> into an Algebricks logical plan, then lets Algebricks do further
>> optimizations, and finally executes the resulting Hyracks job on the
>> Hyracks runtime.
>>
>> Best,
>> Yingyi
>>
>>
>>
>> On Sun, Feb 14, 2016 at 2:26 PM, Mike Carey <dtabass@gmail.com> wrote:
>>
>>> Sandeep,
>>>
>>> Just to chime in as well:
>>>
>>>   - VXQuery is indeed the best example to look at, probably, to
>>> understand the AsterixDB/Algebricks separation.
>>>
>>>   - Hivesterix was built by Yingyi Bu (who'll see this) early on - it
>>> drove the separation idea, actually, but we made a decision not to try
>>> to maintain it.  It was intended to provide a third/different proof of
>>> separation and applicability of the approach, from a research standpoint,
>>> but doesn't have additional value to offer the world (since Hive itself
>>> is a moving target and Hive on Tez now provides the non-MapReduce-runtime
>>> value that Hivesterix initially offered).  Yingyi would probably be happy
>>> to share the code base with you if you wanted to look at it for any
>>> reason, but the only things in the Apache AsterixDB (incubating) project
>>> are things deemed worthy of engineering/maintenance work.
>>>
>>> Hope that helps too!
>>>
>>> Cheers,
>>> Mike
>>>
>>>
>>>
>>> On 2/14/16 11:47 AM, Till Westmann wrote:
>>>
>>>> Hi Sandeep,
>>>>
>>>> Apache VXQuery, the XQuery implementation mentioned in the SoCC paper,
>>>> is a separate project [1].
>>>>
>>>> Specifically to your questions:
>>>>
>>>> 1) There is no need to implement other projects that use Algebricks
>>>> inside of the AsterixDB source tree (as VXQuery shows).
>>>>
>>>> 2) It is clearly easier to combine a Java parser and plan tree generator
>>>> with Algebricks, but there's no reason why one couldn't connect to other
>>>> languages (e.g. by using a text-based intermediate format between the
>>>> parser and the optimizer and between the plan generator and the runtime).
>>>>
>>>> 3) The reason for the different sets of rules is that some are
>>>> language-agnostic and some are language-specific. As you can see in
>>>> figure 2 of the paper, a language implementation has to provide
>>>> language-specific rules to augment the language-agnostic rules provided
>>>> by Algebricks. Specifically, the rules in AsterixDB's asterix-algebra
>>>> project augment the rules in Algebricks to support AsterixDB's query
>>>> language AQL.
>>>>
>>>> Hope this helps,
>>>> Till
>>>>
>>>> [1] http://vxquery.apache.org
>>>>
>>>> On 14 Feb 2016, at 11:02, Sandeep Joshi wrote:
>>>>
>>>>> I had some questions about the process of mapping other query languages
>>>>> to Algebricks.  The SIGMOD SoCC 15 paper mentions two languages, XQuery
>>>>> and HiveQL, which have been mapped to Algebricks, but the
>>>>> implementations are not found in either of the two repositories
>>>>> released under Apache.
>>>>>
>>>>> I found Hivesterix and Pregelix under
>>>>>
>>>>> https://github.com/madhusudancs/hyracks/tree/master/fullstack/hivesterix
>>>>>
>>>>> I couldn't find the XQuery to Algebricks translator anywhere. Has this
>>>>> been released?
>>>>>
>>>>> What is the reason these language translators are not part of the
>>>>> Apache repository?
>>>>>
>>>>> The Apache repositories contain the language translators for AQL and
>>>>> SQL. After comparing the implementations for Hivesterix and SQL/AQL,
>>>>> here are some questions:
>>>>>
>>>>> 1) Does one have to integrate the parser for a new language within the
>>>>> Apache AsterixDB source tree, or can one build the Algebricks
>>>>> translator outside the Apache tree and invoke the Hyracks job execution
>>>>> engine directly, as is done in the Hivesterix implementation seen here?
>>>>>
>>>>>
>>>>>
>>>>> https://github.com/madhusudancs/hyracks/blob/36bb1021b17b736aa1648bd439e1246ae419aa89/fullstack/hivesterix/hivesterix-dist/src/main/java/edu/uci/ics/hivesterix/runtime/exec/HyracksExecutionEngine.java
>>>>>
>>>>> 2) When a query language is converted to Algebricks, the
>>>>> ICompilerFactory converts one plan tree to another by calling
>>>>> Visitor::visit() on each node of the source query.  Does this imply
>>>>> that the plan tree for the source language can only be constructed in
>>>>> Java?  Would it be difficult/impossible to integrate a parser and plan
>>>>> tree generator written in another language into Algebricks?
>>>>>
>>>>> 3) In the Apache repositories, the query rewrite rules which are used
>>>>> during optimization are found under two different repositories.
>>>>>
>>>>> One is in the main asterixdb repository
>>>>>
>>>>>
>>>>>
>>>>> https://github.com/apache/incubator-asterixdb/tree/master/asterix-algebra/src/main/java/org/apache/asterix/optimizer/rules
>>>>>
>>>>> and the other in the hyracks repository
>>>>>
>>>>>
>>>>>
>>>>> https://github.com/apache/incubator-asterixdb-hyracks/tree/master/algebricks/algebricks-rewriter/src/main/java/org/apache/hyracks/algebricks/rewriter/rules
>>>>>
>>>>> Are these two sets of rules characteristically different, or is this
>>>>> duplication just an artifact of rapid prototyping?
>>>>>
>>>>> Furthermore, none of these rewrite rules seem to be SQL-specific.  Are
>>>>> there any SQL-specific rewrite rules which were added?
>>>>>
>>>>> -Sandeep
>>>>>
>>>>>
>
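
PS: To check my understanding of Till's answer to question 3 in the quoted
thread, here is a toy Java sketch of how a language implementation might
combine shared rules with its own. It is not the Algebricks API: plans are
plain strings, and the rule names and the my_lang_between function are made
up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

public class RuleSetSketch {

    // Language-agnostic rule (hypothetical): a SELECT on a constant true
    // condition filters nothing, so drop it.
    static final UnaryOperator<String> REMOVE_TRIVIAL_SELECT =
            plan -> plan.replace("SELECT(true) <- ", "");

    // Language-specific rule (hypothetical): rewrite a function that only
    // this made-up frontend knows into plain comparisons.
    static final UnaryOperator<String> EXPAND_BETWEEN =
            plan -> plan.replace("my_lang_between(x, 1, 5)", "x >= 1 AND x <= 5");

    // Rules every language would share in this toy model.
    static final List<UnaryOperator<String>> SHARED_RULES =
            Collections.singletonList(REMOVE_TRIVIAL_SELECT);

    /** Shared rules first, then the rules supplied by one language. */
    static List<UnaryOperator<String>> rulesFor(List<UnaryOperator<String>> languageRules) {
        List<UnaryOperator<String>> all = new ArrayList<>(SHARED_RULES);
        all.addAll(languageRules);
        return all;
    }

    /** Applies the rules repeatedly until none of them changes the plan. */
    static String optimize(String plan, List<UnaryOperator<String>> rules) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (UnaryOperator<String> rule : rules) {
                String next = rule.apply(plan);
                if (!next.equals(plan)) {
                    plan = next;
                    changed = true;
                }
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        String plan = "SELECT(true) <- SELECT(my_lang_between(x, 1, 5)) <- SCAN(t)";
        List<UnaryOperator<String>> rules =
                rulesFor(Collections.singletonList(EXPAND_BETWEEN));
        System.out.println(optimize(plan, rules));
    }
}
```

In this toy model a second language would reuse SHARED_RULES unchanged and
pass a different list to rulesFor, which is how I read the split between the
two rule directories.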
