From: Mike Carey
Date: Thu, 04 Jun 2015 15:38:09 -0700
To: dev@asterixdb.incubator.apache.org
Subject: Re: Migration of git repository

At some point we really need to move to the once-discussed layered
testing approach that we had "back in my youth" when I was working on
DB2 at IBM. There was a tier of tests that had to be run before/during
any check-in, a tier that ran nightly, and a tier that ran weekly or
something like that.
The first tier was the "immune system" to avoid basic accidental Bad
Things that one component might do to another. The next tier was a more
substantial check of each component (taking too long, as a group of
tests, to be in all developers' paths during check-ins). The last tier
was "everything".

On 6/2/15 9:33 AM, Ian Maxon wrote:
> Hi Taewoo,
> It's really anything in
> hyracks-tests/hyracks-storage-am-lsm-invertedindex-test (besides the
> tokenizer test). All of the tests in that package alone take over 20
> minutes. Each one takes about 2 minutes.
>
> Thanks,
> - Ian
>
> On Tue, Jun 2, 2015 at 9:13 AM, Taewoo Kim wrote:
>
>> Hi Ian,
>>
>> Could you specify the exact class name of the index stress test? I
>> would like to look at it. Thanks.
>>
>> Best,
>> Taewoo
>>
>> On Tue, Jun 2, 2015 at 9:05 AM, Ian Maxon wrote:
>>
>>> I'm in favor of merging them as well. Keeping the git repositories
>>> separate doesn't enforce any kind of architectural separation, it
>>> just makes build + test more complex. Nearly every major change is
>>> using the topic field hack by this point.
>>> I think the only downside is that the tests will take longer, but
>>> that may need to be revisited anyway (in Hyracks, the index stress
>>> tests, especially for inverted indexes, take far too long).
>>>
>>> Another .02¢ :)
>>>
>>> - Ian
>>>
>>> On Mon, Jun 1, 2015 at 9:46 PM, Yingyi Bu wrote:
>>>
>>>> Chris,
>>>>
>>>> Thanks for the input!!
>>>>
>>>>>> 1. If we're serious about Hyracks being a re-usable component of
>>>>>> other products, it makes sense to dogfood that in Asterixdb. If
>>>>>> there are problems keeping Hyracks separate from Asterix or
>>>>>> keeping Hyracks with clean interfaces, this forces us to address
>>>>>> them.
>>>>
>>>> In my opinion, merging the repository doesn't break the separation
>>>> of hyracks and asterixdb, because the dependencies are controlled
>>>> by mvn pom files.
>>>> We just make the code physically live together under the root
>>>> directory: one is hyracks as it is and the other is asterixdb as it
>>>> is.
>>>> For example, Spark lives together with all the things on top of it
>>>> and that doesn't seem to prevent its reusability. Hadoop lived
>>>> together with Hive/Pig/Zookeeper in the same repo until 2010, when
>>>> it was very stable.
>>>> Currently almost all my changes span hyracks and asterixdb. I
>>>> believe many people also suffer from that. Merging them together
>>>> will have the following benefits:
>>>> 1) It forces hyracks-only changes to pass the asterixdb regression
>>>> tests. Currently hyracks-only changes are not verified by asterixdb
>>>> tests.
>>>> 2) On my local machine, I won't need to keep installing hyracks
>>>> before verifying asterixdb. Switching branches is especially
>>>> painful because the installed hyracks snapshot is overwritten from
>>>> time to time.
>>>> 3) I only need to make one code review request and one jenkins job.
>>>> Currently I need to manually change the topic of my asterixdb
>>>> gerrit CL every time before I update my hyracks CL, and then
>>>> manually schedule jenkins to run a new asterixdb job. If I forget
>>>> to schedule the jenkins job, the asterixdb CL is still shown as
>>>> "verified by jenkins".
>>>>
>>>>>> 2. We only just recently took the initiative to take Pregelix and
>>>>>> Hivesterix *out* of the same repository, and that was because
>>>>>> they were specifically causing us problems as components of the
>>>>>> same build. (There were issues of competing dependency versions
>>>>>> with Ian's YARN work, as well as several spurious pregelix test
>>>>>> failures, as I recall.) At a bare minimum, we cannot merge those
>>>>>> projects back in without re-researching and addressing those
>>>>>> problems.
>>>>
>>>> Those will definitely be fixed before Pregelix and IMRU are merged
>>>> back.
>>>> Hivesterix is dead and will not be merged. I'm not proposing that
>>>> we bring Pregelix and IMRU in now, but that we do so later when
>>>> they are ready.
>>>>
>>>> Best,
>>>> Yingyi
>>>>
>>>> On Mon, Jun 1, 2015 at 5:15 PM, Chris Hillery wrote:
>>>>> My $.02 - no, we shouldn't.
>>>>>
>>>>> Two main reasons:
>>>>>
>>>>> 1. If we're serious about Hyracks being a re-usable component of
>>>>> other products, it makes sense to dogfood that in Asterixdb. If
>>>>> there are problems keeping Hyracks separate from Asterix or
>>>>> keeping Hyracks with clean interfaces, this forces us to address
>>>>> them.
>>>>>
>>>>> 2. We only just recently took the initiative to take Pregelix and
>>>>> Hivesterix *out* of the same repository, and that was because they
>>>>> were specifically causing us problems as components of the same
>>>>> build. (There were issues of competing dependency versions with
>>>>> Ian's YARN work, as well as several spurious pregelix test
>>>>> failures, as I recall.) At a bare minimum, we cannot merge those
>>>>> projects back in without re-researching and addressing those
>>>>> problems.
>>>>>
>>>>> What benefits would we gain by merging them? I honestly don't
>>>>> agree with Yingyi's suggestion that it would make building,
>>>>> bug-fixing, and code review much simpler. At best it would help a
>>>>> bit on those occasions when a change spans Hyracks and Asterix,
>>>>> and again, IMHO that is something that *should* require additional
>>>>> thought and oversight. As for build and test, my feeling is that
>>>>> it will make it considerably harder, or at the very least slower,
>>>>> simply due to doubling the Maven overhead.
>>>>>
>>>>> I do not feel that merging the projects to either fit in better
>>>>> with Apache, or to game the Apache popularity indexes, is a good
>>>>> trade-off.
>>>>>
>>>>> Ceej
>>>>> aka Chris Hillery
>>>>>
>>>>> On Mon, Jun 1, 2015 at 12:02 PM, Yingyi Bu wrote:
>>>>>> Hi folks,
>>>>>>
>>>>>> Should we merge hyracks, asterixdb, and potentially pregelix/imru
>>>>>> into the same repository? It will make the build, fix, and code
>>>>>> review process much simpler.
>>>>>> An example is that everything built on top of Spark lives in the
>>>>>> same repository: https://github.com/apache/spark. That's also why
>>>>>> Spark is the most active Apache project now, due to its commit
>>>>>> frequency.
>>>>>> Does anyone have concerns about merging the hyracks and asterixdb
>>>>>> repositories?
>>>>>> Thanks!
>>>>>>
>>>>>> Best,
>>>>>> Yingyi
>>>>>>
>>>>>> On Wed, Apr 22, 2015 at 10:13 PM, Till Westmann wrote:
>>>>>>> Ok, let’s find out what is the “more work” part before we decide :)
>>>>>>>
>>>>>>> We should already have the SGA (as it’s part of the SGA that Mike
>>>>>>> sent in) and it seemed to me that all we’d need to do “later”
>>>>>>> (e.g. next week/month) would be to
>>>>>>> a) vote on bringing it into AsterixDB (that would be an incubator
>>>>>>> vote I assume) and
>>>>>>> b) ask infra for another git repository.
>>>>>>> So the extra work would be the vote on the incubator list.
>>>>>>> Is that right or is there something else we’d need to do?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Till
>>>>>>>
>>>>>>> On Apr 22, 2015, at 10:04 PM, Mattmann, Chris A (3980)
>>>>>>> <chris.a.mattmann@jpl.nasa.gov> wrote:
>>>>>>>
>>>>>>> Hey Mike and team,
>>>>>>>
>>>>>>> Thanks for bringing this to the list. I think these are precisely
>>>>>>> the type of conversations that we want to have here at the ASF and
>>>>>>> as part of our Incubating project. Having these discussions in the
>>>>>>> community here at the ASF (which is now the Apache AsterixDB
>>>>>>> community) is great.
>>>>>>>
>>>>>>> My opinion - it’s fine either way.
>>>>>>> I’m happy if you guys want to
>>>>>>> bring Pregelix into the code base here via AsterixDB. It’s easily
>>>>>>> reversible and incremental. If you want to spin out Pregelix later
>>>>>>> as its own TLP and it’s shown to have its own community we can
>>>>>>> file a board resolution to do that. Heck, nothing stops us from
>>>>>>> graduating 2 Incubator projects=>TLPs out of this effort even in
>>>>>>> the Incubator. That’s fine. If you want to wait and bring it in
>>>>>>> later, it will definitely be more work - so let’s call a spade a
>>>>>>> spade there. But if you want to do that that’s fine too.
>>>>>>>
>>>>>>> My personal recommendation - bring it in - won’t hurt and we can
>>>>>>> always pivot in the ways above later.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Chris
>>>>>>>
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>> Chris Mattmann, Ph.D.
>>>>>>> Chief Architect
>>>>>>> Instrument Software and Science Data Systems Section (398)
>>>>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>>>>> Office: 168-519, Mailstop: 168-527
>>>>>>> Email: chris.a.mattmann@nasa.gov
>>>>>>> WWW: http://sunset.usc.edu/~mattmann/
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>> Adjunct Associate Professor, Computer Science Department
>>>>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Michael Carey
>>>>>>> Date: Tuesday, April 21, 2015 at 11:49 AM
>>>>>>> To: Chris Mattmann, Till Westmann
>>>>>>> Cc: Chris Hillery, Ian Maxon, Yingyi Bu,
>>>>>>> "dev@asterixdb.incubator.apache.org"
>>>>>>> Subject: Re: Migration of git repository
>>>>>>>
>>>>>>> Sure! Let me clarify the issue for everyone (and broaden the
>>>>>>> question).
>>>>>>> One of the technical by-products of the AsterixDB project is a
>>>>>>> graph analytics package called Pregelix - as the name suggests,
>>>>>>> it is a "knock off" of Pregel, as are packages like Giraph.
>>>>>>> What's unique about Pregelix is that it actually scales without
>>>>>>> OOM'ing - under the covers it uses database join processing
>>>>>>> techniques. You can find out more about it by visiting
>>>>>>> http://pregelix.ics.uci.edu/ and/or by skimming the attached
>>>>>>> paper - check out the experimental results compared to other
>>>>>>> popular alternatives. Anyway, we have made it freely available
>>>>>>> (as we do all of our AsterixDB-related research products) and we
>>>>>>> were thinking that we should simply include it under the
>>>>>>> AsterixDB project - kind of like Spark has subprojects for SQL,
>>>>>>> streams, graphs, etc. As a result, I listed it on the list of
>>>>>>> transferred artifacts when I sent in the licensing form the other
>>>>>>> day. (So we at least have that step done.) Its code contributors
>>>>>>> have been a small subset of the AsterixDB team; it was a small
>>>>>>> sub-project, basically. (Mostly just Yingyi Bu!)
>>>>>>>
>>>>>>> Pregelix is kind of a sibling of Apache VXQuery in that its
>>>>>>> runtime is based on Hyracks but it hasn't otherwise been
>>>>>>> AsterixDB-dependent. However, we have just finished teaching it
>>>>>>> to read/write directly from AsterixDB native storage - instead of
>>>>>>> just HDFS - so now it has an AsterixDB dependency, and we are
>>>>>>> using it as a driving example of how to couple AsterixDB to other
>>>>>>> analytic engines. Rather than going through another exercise to
>>>>>>> open-source this separately, it seemed like we could take this
>>>>>>> approach.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Mike
>>>>>>>
>>>>>>> On 4/21/15 7:45 AM, Mattmann, Chris A (3980) wrote:
>>>>>>>
>>>>>>> Yes, in fact, this whole conversation should be happening on
>>>>>>> the dev list. OK for me to CC them on my reply?
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: "Michael J. Carey"
>>>>>>> Date: Tuesday, April 21, 2015 at 3:13 AM
>>>>>>> To: Till Westmann
>>>>>>> Cc: Chris Hillery, Ian Maxon, Yingyi Bu <buyingyi@gmail.com>,
>>>>>>> Chris Mattmann <Chris.A.Mattmann@jpl.nasa.gov>
>>>>>>> Subject: Re: Migration of git repository
>>>>>>>
>>>>>>> + Yingyi on the Pregelix Q. Should we also ask Chris M for advice
>>>>>>> on that?
>>>>>>>
>>>>>>> On Apr 20, 2015 4:23 PM, "Till Westmann" wrote:
>>>>>>>
>>>>>>> Hi Ian,
>>>>>>>
>>>>>>> That’s a good question - and I don’t know the answer.
>>>>>>> We’ve got 2 repos so far:
>>>>>>>
>>>>>>> https://issues.apache.org/jira/browse/INFRA-9212
>>>>>>> https://issues.apache.org/jira/browse/INFRA-9306
>>>>>>>
>>>>>>> so we should have space for Hyracks and AsterixDB.
>>>>>>>
>>>>>>> I think that there’s an open question about Pregelix, but maybe
>>>>>>> that shouldn’t keep us from going ahead.
>>>>>>>
>>>>>>> I further think that it would be great if you could send an
>>>>>>> e-mail to dev@asterixdb.incubator.apache.org and ask if it’s ok
>>>>>>> to import our git repo(s) or if something else needs to be done
>>>>>>> first. (I could send that e-mail as well, but it would be great
>>>>>>> if there were more non-Till e-mails on the list :) )
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Till
>>>>>>>
>>>>>>> On Apr 20, 2015, at 4:07 PM, Ian Maxon wrote:
>>>>>>>
>>>>>>> Hi Mike, Chris and Till,
>>>>>>>
>>>>>>> Since (I think?) the paperwork for the software grant is done
>>>>>>> now, should I copy our GC branches over to the ASF git
>>>>>>> repositories now (as well as making it a mirror in the Gerrit
>>>>>>> commit hook script)?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> - Ian