asterixdb-dev mailing list archives

From Ian Maxon <ima...@uci.edu>
Subject Re: Migration of git repository
Date Tue, 02 Jun 2015 16:33:18 GMT
Hi Taewoo,
It's really anything
in hyracks-tests/hyracks-storage-am-lsm-invertedindex-test (besides the
tokenizer test).  All of the tests in that package alone take over 20
minutes. Each one takes about 2 minutes.

Thanks,
- Ian

On Tue, Jun 2, 2015 at 9:13 AM, Taewoo Kim <wangsaeu@gmail.com> wrote:

> Hi Ian,
>
> Could you specify the exact class name of the index stress test? I would
> like to look at it. Thanks.
>
> Best,
> Taewoo
>
> On Tue, Jun 2, 2015 at 9:05 AM, Ian Maxon <imaxon@uci.edu> wrote:
>
> > I'm in favor of merging them as well. Keeping the git repositories separate
> > doesn't enforce any kind of architectural separation, it just makes build +
> > test more complex. Nearly every major change is using the topic field hack
> > by this point.
> > I think the only downside is that the tests will take longer, but that may
> > need to be revisited anyway (in Hyracks, the index stress tests, especially
> > for inverted indexes, take far too long).
> >
> > Another .02¢ :)
> >
> > - Ian
> >
> > On Mon, Jun 1, 2015 at 9:46 PM, Yingyi Bu <buyingyi@gmail.com> wrote:
> >
> > > Chris,
> > >
> > > Thanks for the input!!
> > >
> > > >>1. If we're serious about Hyracks being a re-usable component of other
> > > >>products, it makes sense to dogfood that in Asterixdb. If there are
> > > >>problems keeping Hyracks separate from Asterix or keeping Hyracks with
> > > >>clean interfaces, this forces us to address them.
> > >
> > > In my opinion, merging the repositories doesn't break the separation of
> > > hyracks and asterixdb, because the dependencies are controlled by mvn pom
> > > files. We just make the code physically live together under the root
> > > directory: one is hyracks as it is and the other is asterixdb as it is.
> > > For example, Spark lives together with all the things on top of it and that
> > > doesn't seem to prevent its reusability. Hadoop lived together with
> > > Hive/Pig/Zookeeper in the same repo until 2010, when it was very stable.
> > >
> > > Currently almost all my changes span hyracks and asterixdb.  I
> > > believe many people also suffer from that.  Merging them together will have
> > > the following benefits:
> > > 1) It forces hyracks-only changes to pass asterixdb regression
> > > tests.  Currently hyracks-only changes are not verified by asterixdb tests.
> > > 2) On my local machine, I don't need to install hyracks and then
> > > verify asterixdb from time to time.  In particular, switching branches is
> > > painful because the installed hyracks snapshot is overwritten from time to
> > > time.
> > > 3) I only need to make one code review request and one jenkins job.
> > > Currently I need to manually change the topic of my asterixdb gerrit CL
> > > every time before I update my hyracks CL, and then manually schedule
> > > jenkins to run a new asterixdb job.  If I forget to schedule the jenkins
> > > job, the asterixdb CL is still shown as "verified by jenkins".
> > >
> > > >>2. We only just recently took the initiative to take Pregelix and
> > > >>Hivesterix *out* of the same repository, and that was because they were
> > > >>specifically causing us problems as components of the same build. (There
> > > >>were issues of competing dependency versions with Ian's YARN work, as well
> > > >>as several spurious pregelix test failures, as I recall.) At a bare
> > > >>minimum, we cannot merge those projects back in without re-researching and
> > > >>addressing those problems.
> > >
> > > Those will definitely be fixed before Pregelix and IMRU are merged
> > > back.  Hivesterix is dead and will not be merged. I'm not proposing that we
> > > should bring Pregelix and IMRU in now, but rather later, when they are
> > > ready.
> > >
> > > Best,
> > > Yingyi
> > >
> > >
> > >
> > >
> > > On Mon, Jun 1, 2015 at 5:15 PM, Chris Hillery <chillery@lambda.nu>
> > wrote:
> > >
> > > > My $.02 - no, we shouldn't.
> > > >
> > > > Two main reasons:
> > > >
> > > > 1. If we're serious about Hyracks being a re-usable component of other
> > > > products, it makes sense to dogfood that in Asterixdb. If there are
> > > > problems keeping Hyracks separate from Asterix or keeping Hyracks with
> > > > clean interfaces, this forces us to address them.
> > > >
> > > > 2. We only just recently took the initiative to take Pregelix and
> > > > Hivesterix *out* of the same repository, and that was because they were
> > > > specifically causing us problems as components of the same build. (There
> > > > were issues of competing dependency versions with Ian's YARN work, as well
> > > > as several spurious pregelix test failures, as I recall.) At a bare
> > > > minimum, we cannot merge those projects back in without re-researching and
> > > > addressing those problems.
> > > >
> > > > What benefits would we gain by merging them? I honestly don't agree with
> > > > Yingyi's suggestion that it would make building, bug-fixing, and code
> > > > review much simpler. At best it would help a bit on those occasions when a
> > > > change spans Hyracks and Asterix, and again, IMHO that is something that
> > > > *should* require additional thought and oversight. As for build and test,
> > > > my feeling is that it will make it considerably harder, or at the very
> > > > least slower, simply due to doubling the Maven overhead.
> > > >
> > > > I do not feel that merging the projects to either fit in better with
> > > > Apache, or to game the Apache popularity indexes, is a good trade-off.
> > > >
> > > > Ceej
> > > > aka Chris Hillery
> > > >
> > > > On Mon, Jun 1, 2015 at 12:02 PM, Yingyi Bu <buyingyi@gmail.com>
> wrote:
> > > >
> > > >> Hi folks,
> > > >>
> > > > >>     Should we merge hyracks, asterixdb, and potentially pregelix/imru
> > > > >> into the same repository?   It will make the build, fix, and code review
> > > > >> process much simpler.
> > > > >>     An example is that everything built on top of Spark lives in the same
> > > > >> repository:  https://github.com/apache/spark.   That's also why Spark is
> > > > >> the most active Apache project now, due to its commit frequency.
> > > > >>     Does anyone have concerns about merging the hyracks and asterixdb
> > > > >> repositories?
> > > >>     Thanks!
> > > >>
> > > >> Best,
> > > >> Yingyi
> > > >>
> > > >>
> > > >> On Wed, Apr 22, 2015 at 10:13 PM, Till Westmann <tillw@apache.org>
> > > wrote:
> > > >>
> > > > >>> Ok, let’s find out what is the “more work” part before we decide :)
> > > > >>>
> > > > >>> We should already have the SGA (as it’s part of the SGA that Mike sent
> > > > >>> in) and it seemed to me that all we’d need to do “later” (e.g. next
> > > > >>> week/month) would be to
> > > > >>> a) vote on bringing it into AsterixDB (that would be an incubator vote, I
> > > > >>> assume) and
> > > > >>> b) ask infra for another git repository.
> > > > >>> So the extra work would be the vote on the incubator list.
> > > > >>> Is that right or is there something else we’d need to do?
> > > >>>
> > > >>> Cheers,
> > > >>> Till
> > > >>>
> > > >>> On Apr 22, 2015, at 10:04 PM, Mattmann, Chris A (3980) <
> > > >>> chris.a.mattmann@jpl.nasa.gov> wrote:
> > > >>>
> > > >>> Hey Mike and team,
> > > >>>
> > > > >>> Thanks for bringing this to the list. I think these are precisely
> > > > >>> the type of conversations that we want to have here at the ASF and
> > > > >>> as part of our Incubating project. Having these discussions in the
> > > > >>> community here at the ASF (which is now the Apache AsterixDB community)
> > > > >>> is great.
> > > > >>>
> > > > >>> My opinion - it’s fine either way. I’m happy if you guys want to
> > > > >>> bring Pregelix into the code base here via AsterixDB. It’s easily
> > > > >>> reversible and incremental. If you want to spin out Pregelix later
> > > > >>> as its own TLP and it’s shown to have its own community we can
> > > > >>> file a board resolution to do that. Heck, nothing stops us from
> > > > >>> graduating 2 Incubator projects=>TLPs out of this effort even in
> > > > >>> the Incubator. That’s fine. If you want to wait and bring it in
> > > > >>> later, it will definitely be more work - so let’s call a spade a
> > > > >>> spade there. But if you want to do that that’s fine too.
> > > > >>>
> > > > >>> My personal recommendation - bring it in - won’t hurt and we can
> > > > >>> always pivot in the ways above later.
> > > >>>
> > > >>> Cheers,
> > > >>> Chris
> > > >>>
> > > >>>
> > > >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >>> Chris Mattmann, Ph.D.
> > > >>> Chief Architect
> > > >>> Instrument Software and Science Data Systems Section (398)
> > > >>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > > >>> Office: 168-519, Mailstop: 168-527
> > > >>> Email: chris.a.mattmann@nasa.gov
> > > >>> WWW:  http://sunset.usc.edu/~mattmann/
> > > >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >>> Adjunct Associate Professor, Computer Science Department
> > > >>> University of Southern California, Los Angeles, CA 90089 USA
> > > >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> -----Original Message-----
> > > >>> From: Michael Carey <mjcarey@ics.uci.edu>
> > > >>> Date: Tuesday, April 21, 2015 at 11:49 AM
> > > > >>> To: Chris Mattmann <Chris.A.Mattmann@jpl.nasa.gov>, Till Westmann
> > > > >>> <till@westmann.org>
> > > > >>> Cc: Chris Hillery <chillery@lambda.nu>, Ian Maxon <imaxon@uci.edu>,
> > > > >>> Yingyi Bu <buyingyi@gmail.com>, "dev@asterixdb.incubator.apache.org"
> > > > >>> <dev@asterixdb.incubator.apache.org>
> > > >>> Subject: Re: Migration of git repository
> > > >>>
> > > > >>> Sure!  Let me clarify the issue for everyone (and broaden the question).
> > > > >>>
> > > > >>> One of the technical by-products of the AsterixDB project is a graph
> > > > >>> analytics package called Pregelix - as the name suggests, it is a "knock
> > > > >>> off" of Pregel, as are packages like Giraph.  What's unique about
> > > > >>> Pregelix is that it actually scales without OOM'ing - under the covers
> > > > >>> it uses database join processing techniques.  You can find out more
> > > > >>> about it by visiting http://pregelix.ics.uci.edu/ and/or by skimming the
> > > > >>> attached paper - check out the experimental results compared to other
> > > > >>> popular alternatives.  Anyway, we have made it freely available (as we do
> > > > >>> all of our AsterixDB-related research products) and we were thinking that
> > > > >>> we should simply include it under the AsterixDB project - kind of like
> > > > >>> Spark has subprojects for SQL, streams, graphs, etc.  As a result, I
> > > > >>> listed it on the list of transferred artifacts when I sent in the
> > > > >>> licensing form the other day.  (So we at least have that step done.)  Its
> > > > >>> code contributors have been a small subset of the AsterixDB team; it was
> > > > >>> a small sub-project, basically.  (Mostly just Yingyi Bu!)
> > > >>>
> > > > >>> Pregelix is kind of a sibling of Apache VXQuery in that its runtime is
> > > > >>> based on Hyracks but it hasn't otherwise been AsterixDB-dependent.
> > > > >>> However, we have just finished teaching it to read/write directly from
> > > > >>> AsterixDB native storage - instead of just HDFS - so now it has an
> > > > >>> AsterixDB dependency, and we are using it as a driving example of how to
> > > > >>> couple AsterixDB to other analytic engines.
> > > >>>
> > > >>> Rather than going through another exercise to open-source this
> > > >>> separately, it seemed like we could take this approach.
> > > >>>
> > > >>> Thoughts?
> > > >>> Cheers,
> > > >>> Mike
> > > >>>
> > > >>>
> > > >>> On 4/21/15 7:45 AM, Mattmann, Chris A (3980) wrote:
> > > >>>
> > > >>>
> > > > >>> Yes, in fact, this whole conversation should be happening on
> > > > >>> the dev list. OK for me to CC them on my reply?
> > > >>>
> > > >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >>> Chris Mattmann, Ph.D.
> > > >>> Chief Architect
> > > >>> Instrument Software and Science Data Systems Section (398)
> > > >>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > > >>> Office: 168-519, Mailstop: 168-527
> > > >>> Email: chris.a.mattmann@nasa.gov
> > > >>> WWW:  http://sunset.usc.edu/~mattmann/
> > > >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >>> Adjunct Associate Professor, Computer Science Department
> > > >>> University of Southern California, Los Angeles, CA 90089 USA
> > > >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> -----Original Message-----
> > > > >>> From: "Michael J. Carey" <mjcarey@ics.uci.edu>
> > > > >>> Date: Tuesday, April 21, 2015 at 3:13 AM
> > > > >>> To: Till Westmann <till@westmann.org>
> > > > >>> Cc: Chris Hillery <chillery@lambda.nu>, Ian Maxon <imaxon@uci.edu>,
> > > > >>> Yingyi Bu <buyingyi@gmail.com>, Chris Mattmann
> > > > >>> <Chris.A.Mattmann@jpl.nasa.gov>
> > > >>> Subject: Re: Migration of git repository
> > > >>>
> > > > >>> + Yingyi on the Pregelix Q.  Should we also ask Chris M for advice on
> > > > >>> that?
> > > > >>> On Apr 20, 2015 4:23 PM, "Till Westmann" <till@westmann.org> wrote:
> > > >>>
> > > >>> Hi Ian,
> > > >>>
> > > >>>
> > > > >>> That’s a good question - and I don’t know the answer.
> > > > >>> We’ve got 2 repos so far:
> > > > >>>
> > > > >>> https://issues.apache.org/jira/browse/INFRA-9212
> > > > >>> https://issues.apache.org/jira/browse/INFRA-9306
> > > > >>> so we should have space for Hyracks and AsterixDB.
> > > >>>
> > > >>>
> > > > >>> I think that there’s an open question about Pregelix, but maybe that
> > > > >>> shouldn’t keep us from going ahead.
> > > >>>
> > > >>>
> > > > >>> I further think that it would be great if you could send an e-mail to
> > > > >>> dev@asterixdb.incubator.apache.org and ask if it’s ok to import
> > > > >>> our git repo(s) or if something else needs to be done first. (I could
> > > > >>> send that e-mail as well, but it would be great if there were more
> > > > >>> non-Till e-mails on the list :) )
> > > >>>
> > > >>>
> > > >>> Cheers,
> > > >>> Till
> > > >>>
> > > >>>
> > > > >>> On Apr 20, 2015, at 4:07 PM, Ian Maxon <imaxon@uci.edu> wrote:
> > > >>>
> > > >>> Hi Mike, Chris and Till,
> > > >>>
> > > >>>
> > > > >>> Since (I think?) the paperwork for the software grant is done now, should
> > > > >>> I copy our GC branches over to the ASF git repositories now (as well as
> > > > >>> making it a mirror in the Gerrit commit hook script)?
> > > >>>
> > > >>>
> > > >>> Thanks,
> > > >>> - Ian
> > > >>>
> > > >>>
> > > >>>
> > > >>
> > > >
> > >
> >
>
