asterixdb-dev mailing list archives

From Mike Carey <dtab...@gmail.com>
Subject Re: Migration of git repository
Date Thu, 04 Jun 2015 22:38:09 GMT
At some point we really need to move to the once-discussed layered 
testing approach that we had "back in my youth" when I was working on 
DB2 at IBM.  There was a tier of tests that had to be run before/during 
any check-in, a tier that ran nightly, and a tier that ran weekly or 
something like that. The first tier was the "immune system" to avoid 
basic accidental Bad Things that one component might do to another.
The next tier was a more substantial check of each 
component (taking too long, as a group of tests, to be in all 
developers' paths during checkins).  The last tier was "everything".
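
A minimal sketch of how such a tiered selection could look, in Python.
Everything in it is hypothetical: the Tier and Suite names, the suite
names, and the durations are illustrative (only the roughly 20-minute
inverted-index figure comes from this thread), and it assumes that each
tier also runs the tiers below it, which the description above does not
state explicitly.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical tier labels; lower values run more often."""
    CHECKIN = 1   # fast "immune system" tests, run before/during a check-in
    NIGHTLY = 2   # more substantial per-component checks
    WEEKLY = 3    # "everything"


@dataclass(frozen=True)
class Suite:
    name: str
    tier: Tier
    minutes: float  # rough wall-clock cost, illustrative only


def suites_for(run: Tier, suites: list[Suite]) -> list[Suite]:
    """Select the suites for a run: the given tier plus all lower tiers.

    The cumulative behaviour is an assumption made for this sketch.
    """
    return [s for s in suites if s.tier <= run]


# Illustrative data; names and times are made up except the ~20 minutes
# quoted in this thread for the inverted-index tests.
SUITES = [
    Suite("smoke", Tier.CHECKIN, 5),
    Suite("lsm-invertedindex", Tier.NIGHTLY, 20),
    Suite("full-regression", Tier.WEEKLY, 120),
]

if __name__ == "__main__":
    for tier in Tier:
        selected = suites_for(tier, SUITES)
        total = sum(s.minutes for s in selected)
        print(tier.name, [s.name for s in selected], f"{total} min")
```

The point of the sketch is only that slow suites move out of the
check-in path without being dropped: they still run, just less often.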


On 6/2/15 9:33 AM, Ian Maxon wrote:
> Hi Taewoo,
> It's really anything
> in hyracks-tests/hyracks-storage-am-lsm-invertedindex-test (besides the
> tokenizer test).  All of the tests in that package alone take over 20
> minutes. Each one takes about 2 minutes.
>
> Thanks,
> - Ian
>
> On Tue, Jun 2, 2015 at 9:13 AM, Taewoo Kim <wangsaeu@gmail.com> wrote:
>
>> Hi Ian,
>>
>> Could you specify the exact class name of the index stress test? I would
>> like to look at it. Thanks.
>>
>> Best,
>> Taewoo
>>
>> On Tue, Jun 2, 2015 at 9:05 AM, Ian Maxon <imaxon@uci.edu> wrote:
>>
>>> I'm in favor of merging them as well. Keeping the git repositories separate
>>> doesn't enforce any kind of architectural separation, it just makes build +
>>> test more complex. Nearly every major change is using the topic field hack
>>> by this point.
>>> I think the only downside is that the tests will take longer, but that may
>>> need to be revisited anyway (in Hyracks, the index stress tests, especially
>>> for inverted indexes, take far too long).
>>>
>>> Another .02¢ :)
>>>
>>> - Ian
>>>
>>> On Mon, Jun 1, 2015 at 9:46 PM, Yingyi Bu <buyingyi@gmail.com> wrote:
>>>
>>>> Chris,
>>>>
>>>> Thanks for the input!!
>>>>
>>>> >> 1. If we're serious about Hyracks being a re-usable component of other
>>>> >> products, it makes sense to dogfood that in Asterixdb. If there are
>>>> >> problems keeping Hyracks separate from Asterix or keeping Hyracks with
>>>> >> clean interfaces, this forces us to address them.
>>>>
>>>> In my opinion, merging the repositories doesn't break the separation of
>>>> hyracks and asterixdb, because the dependencies are controlled by mvn pom
>>>> files. We just make the code physically live together under the root
>>>> directory, one is hyracks as it is and the other is asterixdb as it is.
>>>> For example, Spark lives together with all the things on top of it and that
>>>> doesn't seem to prevent its reusability. Hadoop lived together with
>>>> Hive/Pig/Zookeeper in the same repo until 2010, when it was very stable.
>>>> Currently almost all my changes span hyracks and asterixdb. I
>>>> believe many people also suffer from that. Merging them together will have
>>>> the following benefits:
>>>> 1) It forces hyracks-only changes to pass asterixdb regression
>>>> tests. Currently hyracks-only changes are not verified by asterixdb tests.
>>>> 2) On my local machine, I don't need to always install hyracks and then
>>>> verify asterixdb from time to time. Switching branches seems especially
>>>> painful because the installed hyracks snapshot is overwritten from time to
>>>> time.
>>>> 3) I only need to make one code review request and one jenkins job.
>>>> Currently I need to manually change the topic of my asterixdb gerrit CL
>>>> every time before I update my hyracks CL, and then manually schedule
>>>> jenkins to run a new asterixdb job. If I forget to schedule the jenkins
>>>> job, the asterixdb CL is still shown to be "verified by jenkins".
>>>>
>>>> >> 2. We only just recently took the initiative to take Pregelix and
>>>> >> Hivesterix *out* of the same repository, and that was because they were
>>>> >> specifically causing us problems as components of the same build. (There
>>>> >> were issues of competing dependency versions with Ian's YARN work, as well
>>>> >> as several spurious pregelix test failures, as I recall.) At a bare
>>>> >> minimum, we cannot merge those projects back in without re-researching and
>>>> >> addressing those problems.
>>>>
>>>> Those will definitely be fixed before Pregelix and IMRU are merged
>>>> back. Hivesterix is dead and will not be merged. I'm not proposing that we
>>>> should bring Pregelix and IMRU in now, but to do that later when they are
>>>> ready.
>>>>
>>>> Best,
>>>> Yingyi
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Jun 1, 2015 at 5:15 PM, Chris Hillery <chillery@lambda.nu> wrote:
>>>>> My $.02 - no, we shouldn't.
>>>>>
>>>>> Two main reasons:
>>>>>
>>>>> 1. If we're serious about Hyracks being a re-usable component of other
>>>>> products, it makes sense to dogfood that in Asterixdb. If there are
>>>>> problems keeping Hyracks separate from Asterix or keeping Hyracks with
>>>>> clean interfaces, this forces us to address them.
>>>>>
>>>>> 2. We only just recently took the initiative to take Pregelix and
>>>>> Hivesterix *out* of the same repository, and that was because they were
>>>>> specifically causing us problems as components of the same build. (There
>>>>> were issues of competing dependency versions with Ian's YARN work, as well
>>>>> as several spurious pregelix test failures, as I recall.) At a bare
>>>>> minimum, we cannot merge those projects back in without re-researching and
>>>>> addressing those problems.
>>>>>
>>>>> What benefits would we gain by merging them? I honestly don't agree with
>>>>> Yingyi's suggestion that it would make building, bug-fixing, and code
>>>>> review much simpler. At best it would help a bit on those occasions when a
>>>>> change spans Hyracks and Asterix, and again, IMHO that is something that
>>>>> *should* require additional thought and oversight. As for build and test,
>>>>> my feeling is that it will make it considerably harder, or at the very
>>>>> least slower, simply due to doubling the Maven overhead.
>>>>>
>>>>> I do not feel that merging the projects to either fit in better with
>>>>> Apache, or to game the Apache popularity indexes, is a good trade-off.
>>>>>
>>>>> Ceej
>>>>> aka Chris Hillery
>>>>>
>>>>> On Mon, Jun 1, 2015 at 12:02 PM, Yingyi Bu <buyingyi@gmail.com> wrote:
>>>>>> Hi folks,
>>>>>>
>>>>>>      Should we merge hyracks, asterixdb, and potentially pregelix/imru
>>>>>> into the same repository? It will make the build, fix, and code review
>>>>>> process much simpler.
>>>>>>      An example is that everything built on top of Spark lives in the same
>>>>>> repository: https://github.com/apache/spark. That's also why Spark is
>>>>>> the most active Apache project now, due to its commit frequency.
>>>>>>      Does anyone have concerns about merging the hyracks and asterixdb
>>>>>> repositories?
>>>>>>      Thanks!
>>>>>>
>>>>>> Best,
>>>>>> Yingyi
>>>>>>
>>>>>>
>>>>>> On Wed, Apr 22, 2015 at 10:13 PM, Till Westmann <tillw@apache.org> wrote:
>>>>>>> Ok, let’s find out what is the “more work” part before we decide :)
>>>>>>>
>>>>>>> We should already have the SGA (as it’s part of the SGA that Mike sent
>>>>>>> in) and it seemed to me that all we’d need to do “later” (e.g. next
>>>>>>> week/month) would be to
>>>>>>> a) vote on bringing it into AsterixDB (that would be an incubator vote, I
>>>>>>> assume) and
>>>>>>> b) ask infra for another git repository.
>>>>>>> So the extra work would be the vote on the incubator list.
>>>>>>> Is that right or is there something else we’d need to do?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Till
>>>>>>>
>>>>>>> On Apr 22, 2015, at 10:04 PM, Mattmann, Chris A (3980) <
>>>>>>> chris.a.mattmann@jpl.nasa.gov> wrote:
>>>>>>>
>>>>>>> Hey Mike and team,
>>>>>>>
>>>>>>> Thanks for bringing this to the list. I think these are precisely
>>>>>>> the type of conversations that we want to have here at the ASF and
>>>>>>> as part of our Incubating project. Having these discussions in the
>>>>>>> community here at the ASF (which is now the Apache AsterixDB community)
>>>>>>> is great.
>>>>>>>
>>>>>>> My opinion - it’s fine either way. I’m happy if you guys want to
>>>>>>> bring Pregelix into the code base here via AsterixDB. It’s easily
>>>>>>> reversible and incremental. If you want to spin out Pregelix later
>>>>>>> as its own TLP and it’s shown to have its own community we can
>>>>>>> file a board resolution to do that. Heck, nothing stops us from
>>>>>>> graduating 2 Incubator projects=>TLPs out of this effort even in
>>>>>>> the Incubator. That’s fine. If you want to wait and bring it in
>>>>>>> later, it will definitely be more work - so let’s call a spade a
>>>>>>> spade there. But if you want to do that that’s fine too.
>>>>>>>
>>>>>>> My personal recommendation - bring it in - won’t hurt and we can
>>>>>>> always pivot in the ways above later.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Chris
>>>>>>>
>>>>>>>
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>> Chris Mattmann, Ph.D.
>>>>>>> Chief Architect
>>>>>>> Instrument Software and Science Data Systems Section (398)
>>>>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>>>>> Office: 168-519, Mailstop: 168-527
>>>>>>> Email: chris.a.mattmann@nasa.gov
>>>>>>> WWW:  http://sunset.usc.edu/~mattmann/
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>> Adjunct Associate Professor, Computer Science Department
>>>>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Michael Carey <mjcarey@ics.uci.edu>
>>>>>>> Date: Tuesday, April 21, 2015 at 11:49 AM
>>>>>>> To: Chris Mattmann <Chris.A.Mattmann@jpl.nasa.gov>, Till Westmann
>>>>>>> <till@westmann.org>
>>>>>>> Cc: Chris Hillery <chillery@lambda.nu>, Ian Maxon <imaxon@uci.edu>,
>>>>>>> Yingyi Bu <buyingyi@gmail.com>, "dev@asterixdb.incubator.apache.org"
>>>>>>> <dev@asterixdb.incubator.apache.org>
>>>>>>> Subject: Re: Migration of git repository
>>>>>>>
>>>>>>> Sure!  Let me clarify the issue for everyone (and broaden the question).
>>>>>>> One of the technical by-products of the AsterixDB project is a graph
>>>>>>> analytics package called Pregelix - as the name suggests, it is a
>>>>>>> "knock off" of Pregel, as are packages like Giraph.  What's unique
>>>>>>> about Pregelix is that it actually scales without OOM'ing - under the
>>>>>>> covers it uses database join processing techniques.  You can find out
>>>>>>> more about it by visiting http://pregelix.ics.uci.edu/ and/or by
>>>>>>> skimming the attached paper - check out the experimental results
>>>>>>> compared to other popular alternatives.  Anyway, we have made it
>>>>>>> freely available (as we do all of our AsterixDB-related research
>>>>>>> products) and we were thinking that we should simply include it under
>>>>>>> the AsterixDB project - kind of like Spark has subprojects for SQL,
>>>>>>> streams, graphs, etc.  As a result, I listed it on the list of
>>>>>>> transferred artifacts when I sent in the licensing form the other day.
>>>>>>> (So we at least have that step done.)  Its code contributors have been
>>>>>>> a small subset of the AsterixDB team; it was a small sub-project,
>>>>>>> basically.  (Mostly just Yingyi Bu!)
>>>>>>>
>>>>>>> Pregelix is kind of a sibling of Apache VXQuery in that its runtime is
>>>>>>> based on Hyracks but it hasn't otherwise been AsterixDB-dependent.
>>>>>>> However, we have just finished teaching it to read/write directly from
>>>>>>> AsterixDB native storage - instead of just HDFS - so now it has an
>>>>>>> AsterixDB dependency, and we are using it as a driving example of how
>>>>>>> to couple AsterixDB to other analytic engines.
>>>>>>> Rather than going through another exercise to open-source this
>>>>>>> separately, it seemed like we could take this approach.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>> Cheers,
>>>>>>> Mike
>>>>>>>
>>>>>>>
>>>>>>> On 4/21/15 7:45 AM, Mattmann, Chris A (3980) wrote:
>>>>>>>
>>>>>>>
>>>>>>> Yes, in fact, this whole conversation should be happening on
>>>>>>> the dev list. OK for me to CC them on my reply?
>>>>>>>
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>> Chris Mattmann, Ph.D.
>>>>>>> Chief Architect
>>>>>>> Instrument Software and Science Data Systems Section (398)
>>>>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>>>>> Office: 168-519, Mailstop: 168-527
>>>>>>> Email: chris.a.mattmann@nasa.gov
>>>>>>> WWW:  http://sunset.usc.edu/~mattmann/
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>> Adjunct Associate Professor, Computer Science Department
>>>>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: "Michael J. Carey" <mjcarey@ics.uci.edu>
>>>>>>> Date: Tuesday, April 21, 2015 at 3:13 AM
>>>>>>> To: Till Westmann <till@westmann.org>
>>>>>>> Cc: Chris Hillery <chillery@lambda.nu>, Ian Maxon <imaxon@uci.edu>,
>>>>>>> Yingyi Bu <buyingyi@gmail.com>, Chris Mattmann
>>>>>>> <Chris.A.Mattmann@jpl.nasa.gov>
>>>>>>> Subject: Re: Migration of git repository
>>>>>>>
>>>>>>> + Yingyi on the Pregelix Q.  Should we also ask Chris M for advice on
>>>>>>> that?
>>>>>>> On Apr 20, 2015 4:23 PM, "Till Westmann" <till@westmann.org> wrote:
>>>>>>>
>>>>>>> Hi Ian,
>>>>>>>
>>>>>>>
>>>>>>> That’s a good question - and I don’t know the answer.
>>>>>>> We’ve got 2 repos so far:
>>>>>>>
>>>>>>> https://issues.apache.org/jira/browse/INFRA-9212
>>>>>>> https://issues.apache.org/jira/browse/INFRA-9306
>>>>>>>
>>>>>>> so we should have space for Hyracks and AsterixDB.
>>>>>>>
>>>>>>>
>>>>>>> I think that there’s an open question about Pregelix, but maybe that
>>>>>>> shouldn’t keep us from going ahead.
>>>>>>>
>>>>>>>
>>>>>>> I further think that it would be great if you could send an e-mail to
>>>>>>> dev@asterixdb.incubator.apache.org and ask if it’s ok to import
>>>>>>> our git repo(s) or if something else needs to be done first. (I could
>>>>>>> send that e-mail as well, but it would be great if there were more
>>>>>>> non-Till e-mails on the list :) )
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Till
>>>>>>>
>>>>>>>
>>>>>>> On Apr 20, 2015, at 4:07 PM, Ian Maxon <imaxon@uci.edu> wrote:
>>>>>>>
>>>>>>> Hi Mike, Chris and Till,
>>>>>>>
>>>>>>>
>>>>>>> Since (I think?) the paperwork for the software grant is done now, should
>>>>>>> I copy our GC branches over to the ASF git repositories now (as well as
>>>>>>> making it a mirror in the Gerrit commit hook script)?
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> - Ian
>>>>>>>
>>>>>>>
>>>>>>>

