archiva-dev mailing list archives

From Olivier Lamy <ol...@apache.org>
Subject Re: maven-indexer / Lucene
Date Tue, 15 Aug 2017 09:30:04 GMT
Hi
It took a bit of time, but I finally got the branch working :-)
Branch: feature/jcr_oak
Let me know what you think.
I guess there are still some optimisations to do for JCR Oak;
I can see warnings like these in the logs:
21:02:39.559 [1071] [main] WARN  oak.query.QueryImpl - Traversal query
(query without index): SELECT * FROM [nt:base] WHERE [jcr:uuid] = $id /*
oak-internal */; consider creating an index
21:02:39.563 [328] [main] WARN  plugins.index.Cursors$TraversingCursor -
Traversed 1000 nodes with filter Filter(query=SELECT * FROM [nt:base] WHERE
[jcr:uuid] = $id /* oak-internal */, path=*,
property=[jcr:uuid=[21232f29-7a57-35a7-8389-4a0e4a801fc3]]); consider
creating an index or changing the query





On 8 July 2017 at 06:22, Martin <martin_s@apache.org> wrote:

> Hi Olivier,
>
> great!
> For my understanding: the dependency on Lucene in the pom of indexer-core
> is still there, but the Lucene packages are moved to the
> ...maven.index.shaded... package? You develop indexer-core with the
> standard Lucene packages, and the shading is executed during the build
> of the indexer package?
>
> I think that may solve our dependency problem.
>
> I still get errors in the maven-indexer module, but I think the status is
> still "work in progress". I don't want to interfere too much with your
> changes.
>
> I'm not sure whether we should keep JCR Oak as the metadata implementation.
> I think OrientDB may be a feasible alternative: embeddable, graph database,
> Lucene index optional and may be omitted, Apache License. And with JCR Oak
> we also have to convert the existing metadata index.
>
> But one step after the other. If we agree that the shaded indexer works, we
> should merge only the maven-indexer changes to the master branch, without
> the JCR/Lucene update, and change the JCR and/or Lucene afterwards.
>
> Greetings
>
> Martin
>
> On Friday, 7 July 2017, 09:23:24 CEST, Olivier Lamy wrote:
> > So the repo contains a branch feature/jar_shaded_lucene here:
> > https://git1-us-west.apache.org/repos/asf?p=maven-indexer.git;a=summary
> > and I pushed what I started for Archiva in the branch called
> > feature/jcr_oak.
> > So in order to test it, you first need to build maven-indexer from the
> > branch feature/jar_shaded_lucene.
> >
> > On 6 July 2017 at 22:31, Olivier Lamy <olamy@apache.org> wrote:
> > > I will try to share the work I did tomorrow in a branch
> > >
> > > On Thu, 6 Jul 2017 at 7:48 pm, Martin Stockhammer <martin_s@apache.org>
> > > wrote:
> > >> We have different (incompatible) Lucene dependencies that prevent us
> > >> from updating the maven indexer and/or Jackrabbit. And this will happen
> > >> again with each upgrade of one of these two packages in the future.
> > >> So it would be really good if we could find a solution that removes one
> > >> of the Lucene dependencies.
> > >>
> > >> Greetings
> > >>
> > >> Martin
> > >>
> > >>
> > >> On 6 July 2017 at 09:36:06 CEST, Chris Graham <chrisgwarp@gmail.com> wrote:
> > >>
> > >> >Can I please ask an obvious/stupid question?
> > >> >
> > >> >What is driving this need for change?
> > >> >
> > >> >From a quick read of the thread above, all of the options appear to
> > >> >introduce a lot of breaking changes, and a whole lot more uncertainty.
> > >> >
> > >> >So, what is so broken that it is driving these changes?
> > >> >
> > >> >Sent from my iPhone
> > >> >
> > >> >> On 6 Jul 2017, at 12:39 pm, Olivier Lamy <olamy@apache.org> wrote:
> > >> >>
> > >> >> Yup.
> > >> >> The idea is to have an extra jar produced by maven-indexer with a
> > >> >> shaded Lucene version.
> > >> >> So the Lucene classes (the version used by Maven Indexer) will be
> > >> >> relocated into a package called org.apache.maven.index.shaded.lucene
> > >> >> (such as org.apache.maven.index.shaded.lucene.search.BooleanClause).
> > >> >> Then you exclude the Lucene dependencies used by Maven Indexer, and
> > >> >> voila.
> > >> >> The voila is a bit optimistic and not so easy, but anyway I'm working
> > >> >> on it ATM.
> > >> >
> > >> >>> On 6 July 2017 at 07:08, Martin <martin_s@apache.org> wrote:
> > >> >>>
> > >> >>> What do you mean exactly by shading? Moving to another package name?
> > >> >>>
> > >> >>> On Wednesday, 5 July 2017, 01:19:17 CEST, Olivier Lamy wrote:
> > >> >>>> Maybe an option is to use some shading?
> > >> >>>> I'm thinking of shading the Lucene packages used by Maven Indexer.
> > >> >>>> I can easily provide a build for that.
> > >> >>>> WDYT?
> > >> >>>>
> > >> >>>>> On 26 June 2017 at 11:49, Olivier Lamy <olamy@apache.org> wrote:
> > >> >>>>> Hi
> > >> >>>>> Graph/document storage could be convenient (but not possible with
> > >> >>>>> neo4j, as it's GPL-licensed [1]).
> > >> >>>>> Well, we can add Solr as an additional webapp with our Jetty
> > >> >>>>> distribution, but this will be a pain for users who want to use
> > >> >>>>> Tomcat or any other servlet container...
> > >> >>>>> We still need to investigate a new storage model :-)
> > >> >>>>>
> > >> >>>>> Olivier
> > >> >>>>> [1] https://neo4j.com/licensing/
> > >> >>>>>
> > >> >>>>>> On 25 June 2017 at 06:26, Martin <martin_s@apache.org> wrote:
> > >> >>>>>> Yes, you are right. The Lucene dependency causes a lot of trouble
> > >> >>>>>> and will cause headaches with each version change of one of the
> > >> >>>>>> dependencies.
> > >> >>>>>> What are the requirements for a replacement?
> > >> >>>>>> - We want to store hierarchical data?
> > >> >>>>>> - We want to store metadata for nodes?
> > >> >>>>>> - Fulltext search (only metadata or for artifacts too?)
> > >> >>>>>> - Blob / artifact storage (I don't think so, but I'm not so
> > >> >>>>>>   familiar with the Archiva artifact model)?
> > >> >>>>>>
> > >> >>>>>> Maybe some graph database may be an alternative. I don't know if
> > >> >>>>>> the license of neo4j is compatible with the Apache License, and I
> > >> >>>>>> think it brings Lucene as a dependency too. I will have a look.
> > >> >>>>>> The problem is, if fulltext search is needed, I think for most of
> > >> >>>>>> the frameworks we get a Lucene dependency if it's embedded.
> > >> >>>>>>
> > >> >>>>>> Other alternatives:
> > >> >>>>>> - Implement fulltext search on our own (index of the metadata
> > >> >>>>>>   stored via the Archiva API) and use the Lucene dependency that
> > >> >>>>>>   comes from the maven-indexer
> > >> >>>>>> - JCR Oak with Solr. Solr is not embedded; it must run as its own
> > >> >>>>>>   application (war).
> > >> >>>>>>
> > >> >>>>>> Greetings
> > >> >>>>>>
> > >> >>>>>> Martin
> > >> >>>>>>
> > >> >>>>>> On Saturday, 24 June 2017, 14:05:26 CEST, Olivier Lamy wrote:
> > >> >>>>>>> Well, this is going to be a pain.
> > >> >>>>>>> IMHO we need to find a new alternative to JCR Oak.
> > >> >>>>>>> And something not using Lucene, as it's a real pain to have
> > >> >>>>>>> different libraries using Lucene when they do not update at the
> > >> >>>>>>> same time (and Lucene breaks backward compat so quickly...)
> > >> >>>>>>> Any ideas? I'd like to have something embedded (but with a
> > >> >>>>>>> possible external server configuration).
> > >> >>>>>>> There is currently a Cassandra implementation. I was not
> > >> >>>>>>> satisfied with its performance, but I guess I did that 4 years
> > >> >>>>>>> ago, so it can be improved for sure :-)
> > >> >>>>>>> Maybe OrientDB?
> > >> >>>>>>> What else?
> > >> >>>>>>>
> > >> >>>>>>>> On 24 June 2017 at 09:50, Olivier Lamy <olamy@apache.org> wrote:
> > >> >>>>>>>> Well, the issue is incompatible versions of Lucene for Maven
> > >> >>>>>>>> Indexer and Oak (well, I can try to push a patch to Oak for
> > >> >>>>>>>> upgrading...)
> > >> >>>>>>>>
> > >> >>>>>>>>> On 24 June 2017 at 08:41, Olivier Lamy <olamy@apache.org> wrote:
> > >> >>>>>>>>> Hi
> > >> >>>>>>>>> Maven Indexer 6.0-SNAPSHOT no longer needs the Plexus bridge.
> > >> >>>>>>>>> I'm working on it in the branch (feature/jcr_oak).
> > >> >>>>>>>>> Not sure why, but I have intermittent failures with the
> > >> >>>>>>>>> store-jcr module.
> > >> >>>>>>>>> I definitely agree on the upgrade.
> > >> >>>>>>>>> Well, we can simply detect that it's not Oak compatible and
> > >> >>>>>>>>> schedule a full reindex (maybe with a message in logs and UI?)
> > >> >>>>>>>>> But we need to be sure we can still read the central index, and
> > >> >>>>>>>>> I'm not sure about a possible Lucene conflict between Oak and
> > >> >>>>>>>>> Maven Indexer.
> > >> >>>>>>>>> Can we work on this branch? (I created a Jenkins job for it:
> > >> >>>>>>>>> https://builds.apache.org/view/A-D/view/Archiva/job/archiva-jcr-oak-branch/)
> > >> >>>>>>>>> If you prefer master, I would say no worries either.
> > >> >>>>>>>>> Something else to look at is upgrading maven-core etc...
> > >> >>>>>>>>> Anyway
> > >> >>>>>>>>> Cheers
> > >> >>>>>>>>> Olivier
> > >> >>>>>>>>>
> > >> >>>>>>>>>> On 22 June 2017 at 19:16, Martin <martin_s@apache.org> wrote:
> > >> >>>>>>>>>> Hi,
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> upgrading the maven indexer leads to some major changes.
> > >> >>>>>>>>>> Lucene is used by maven-indexer and also by Jackrabbit.
> > >> >>>>>>>>>> Jackrabbit sticks to the old 3.x version and, as I see it, they
> > >> >>>>>>>>>> will not move to a newer version.
> > >> >>>>>>>>>> There is Jackrabbit Oak as an alternative.
> > >> >>>>>>>>>> I tried a proof of concept and could replace the Jackrabbit
> > >> >>>>>>>>>> implementation of metadata-store-jcr with an Oak
> > >> >>>>>>>>>> implementation. At least I got all the unit tests of this
> > >> >>>>>>>>>> module to pass.
> > >> >>>>>>>>>> But switching to Oak has some drawbacks:
> > >> >>>>>>>>>> - The repository format changed and we must provide a way to
> > >> >>>>>>>>>>   migrate (either migrate the existing repository or create a
> > >> >>>>>>>>>>   new one by reindexing)
> > >> >>>>>>>>>> - The Lucene version used is newer but does not match the
> > >> >>>>>>>>>>   version from the maven-indexer dependencies. Some
> > >> >>>>>>>>>>   incompatibilities may come up that are not solvable without
> > >> >>>>>>>>>>   using a modified version of one of the two. Or there may be
> > >> >>>>>>>>>>   the possibility to switch to Solr (as a separate component)
> > >> >>>>>>>>>>   and get rid of the Lucene dependencies for JCR inside the
> > >> >>>>>>>>>>   Archiva project.
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> Switching to maven-indexer 6.0-SNAPSHOT means some changes too:
> > >> >>>>>>>>>> - The Plexus-Sisu-Bridge does not work as before.
> > >> >>>>>>>>>> - We must migrate from the NexusIndexer to the indexer API.
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> So switching to the new indexer and Oak means more work than
> > >> >>>>>>>>>> expected and some risks regarding new incompatibility problems.
> > >> >>>>>>>>>> And I think this cannot be done without broken master builds
> > >> >>>>>>>>>> for some time period.
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> So, what should we do? I think maven indexer is one of the core
> > >> >>>>>>>>>> components of Archiva, and we should utilize the 3.x version
> > >> >>>>>>>>>> to migrate to the new indexer version, even if this means
> > >> >>>>>>>>>> switching to JCR Oak. Otherwise it would mean sticking to the
> > >> >>>>>>>>>> old version for the next years.
> > >> >>>>>>>>>> @Olivier, regarding the maven-indexer / Sisu-Bridge API
> > >> >>>>>>>>>> changes, I hope you can provide useful help.
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> I committed the PoC to the branch feature/jcr_oak. There are
> > >> >>>>>>>>>> some modules where the tests do not pass (mainly because of the
> > >> >>>>>>>>>> indexer API changes).
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> Any comments?
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> Cheers
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> Martin
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> On Tuesday, 13 June 2017, 09:07:35 CEST, Olivier Lamy wrote:
> > >> >>>>>>>>>>> Forget it, but we need to ensure we can read Maven index
> > >> >>>>>>>>>>> files....
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> On 13 June 2017 at 17:06, Olivier Lamy <olamy@apache.org> wrote:
> > >> >>>>>>>>>>>> Hi,
> > >> >>>>>>>>>>>> Remember Jackrabbit depends on Lucene as well, so upgrading
> > >> >>>>>>>>>>>> Lucene can be a problem here.
> > >> >>>>>>>>>>>> Regarding maven-indexer, yes, we can depend on a snapshot
> > >> >>>>>>>>>>>> until the release.
> > >> >>>>>>>>>>>> I can release it ;-)
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> On 13 June 2017 at 06:06, Martin <martin_s@apache.org> wrote:
> > >> >>>>>>>>>>>>> Hi,
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>> the Lucene version depends on the maven indexer. But I'm not
> > >> >>>>>>>>>>>>> sure about the current state of maven-indexer. The version
> > >> >>>>>>>>>>>>> has not changed since some time in 2013.
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>> There are commits on the master branch since then, and the
> > >> >>>>>>>>>>>>> Lucene version has been changed too, but no releases were
> > >> >>>>>>>>>>>>> tagged.
> > >> >>>>>>>>>>>>> Does it make sense to switch to maven-indexer 6.0-SNAPSHOT?
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>> As far as I know, there are new compact index formats with
> > >> >>>>>>>>>>>>> new Lucene versions, but I'm not sure if this is relevant
> > >> >>>>>>>>>>>>> for the Maven indexes.
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>> Cheers
> > >> >>>>>>>>>>>>>
> > >> >>>>>>>>>>>>> Martin
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> --
> > >> >>>>>>>>>>>> Olivier Lamy
> > >> >>>>>>>>>>>> http://twitter.com/olamy | http://linkedin.com/in/olamy
> > >> >>>>>>>>>
> > >> >>>>>>>>> --
> > >> >>>>>>>>> Olivier Lamy
> > >> >>>>>>>>> http://twitter.com/olamy | http://linkedin.com/in/olamy
> > >> >>>>>>>>
> > >> >>>>>>>> --
> > >> >>>>>>>> Olivier Lamy
> > >> >>>>>>>> http://twitter.com/olamy | http://linkedin.com/in/olamy
> > >> >>>>>
> > >> >>>>> --
> > >> >>>>> Olivier Lamy
> > >> >>>>> http://twitter.com/olamy | http://linkedin.com/in/olamy
> > >> >>
> > >> >> --
> > >> >> Olivier Lamy
> > >> >> http://twitter.com/olamy | http://linkedin.com/in/olamy
> > >>
> > >> --
> > >> This message was sent from my Android device with K-9 Mail.
> > >
> > > --
> > > Olivier Lamy
> > > http://twitter.com/olamy | http://linkedin.com/in/olamy
>
>
>


-- 
Olivier Lamy
http://twitter.com/olamy | http://linkedin.com/in/olamy
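
To make the relocation discussed in the quoted thread concrete: shading
rewrites the package prefix of the Lucene classes bundled with maven-indexer,
so that a class such as org.apache.lucene.search.BooleanClause would end up as
org.apache.maven.index.shaded.lucene.search.BooleanClause. Below is a minimal
Java sketch of that name mapping only. The class ShadedNames and its relocate
helper are hypothetical and exist purely for illustration; the sketch assumes
the original prefix is org.apache.lucene, and the real relocation is applied
to the bytecode at build time rather than by application code.

```java
import java.util.List;

// Hypothetical helper, not part of maven-indexer or Archiva.
final class ShadedNames {
    // Assumption: the unshaded Lucene classes live under this prefix.
    static final String ORIGINAL_PREFIX = "org.apache.lucene";
    // Target package named in the thread.
    static final String SHADED_PREFIX = "org.apache.maven.index.shaded.lucene";

    /** Returns the relocated name for a Lucene class, or the name unchanged otherwise. */
    static String relocate(String className) {
        boolean isLucene = className.equals(ORIGINAL_PREFIX)
                || className.startsWith(ORIGINAL_PREFIX + ".");
        if (isLucene) {
            // Swap only the leading package prefix; keep the rest of the name.
            return SHADED_PREFIX + className.substring(ORIGINAL_PREFIX.length());
        }
        return className;
    }

    public static void main(String[] args) {
        // org.example.Unrelated is a made-up name that must stay untouched.
        for (String name : List.of(
                "org.apache.lucene.search.BooleanClause",
                "org.example.Unrelated")) {
            System.out.println(name + " -> " + relocate(name));
        }
    }
}
```

Because only the shaded copy moves to the new prefix, a second Lucene version
(for example the one pulled in by Oak) could keep its original package names
on the same classpath, which is the point of the approach described above.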
