archiva-dev mailing list archives

From Olivier Lamy <ol...@apache.org>
Subject Re: maven-indexer / Lucene
Date Thu, 06 Jul 2017 12:31:13 GMT
I will try to share the work I have done in a branch tomorrow.

On Thu, 6 Jul 2017 at 7:48 pm, Martin Stockhammer <martin_s@apache.org>
wrote:

> We have different, mutually incompatible lucene dependencies that prevent us
> from updating the maven indexer and/or jackrabbit. And this will happen again
> with each upgrade of either of these two packages in the future.
> So it would be really good if we could find a solution that removes one of the
> lucene dependencies.
>
> Greetings
>
> Martin
>
>
> On 6 July 2017 09:36:06 CEST, Chris Graham <chrisgwarp@gmail.com> wrote:
> >Can I please ask an obvious/stupid question?
> >
> >What is driving this need for change?
> >
> >From a quick read of the thread above, all of the options appear to
> >introduce a lot of breaking changes, and a whole lot more uncertainty.
> >
> >So, what is so broken that it is driving these changes?
> >
> >Sent from my iPhone
> >
> >> On 6 Jul 2017, at 12:39 pm, Olivier Lamy <olamy@apache.org> wrote:
> >>
> >> Yup.
> >> The idea is to have an extra jar produced by the maven-indexer with a
> >> shaded lucene version.
> >> So the lucene classes (the version used by Maven indexer) will be relocated
> >> into a package called org.apache.maven.index.shaded.lucene (such as
> >> org.apache.maven.index.shaded.lucene.search.BooleanClause).
> >> Then you exclude the lucene dependencies used by maven indexer and voila.
> >> The voila is a bit optimistic and it's not so easy, but anyway I'm working
> >> on it ATM.
> >>
> >>
> >>> On 6 July 2017 at 07:08, Martin <martin_s@apache.org> wrote:
> >>>
> >>> What do you mean exactly by shading? Moving to another package name?
> >>>
> >>> On Wednesday, 5 July 2017, 01:19:17 CEST, Olivier Lamy wrote:
> >>>> maybe an option is to use some shading?
> >>>> I'm thinking of shading the lucene packages used by maven indexer. I can
> >>>> easily provide a build for that.
> >>>> WDYT?
> >>>>
> >>>>> On 26 June 2017 at 11:49, Olivier Lamy <olamy@apache.org> wrote:
> >>>>> Hi
> >>>>> graph/document storage could be convenient (but not possible with neo4j
> >>>>> as it's GPL licensed [1])
> >>>>> well we can add solr as an additional webapp with our jetty distribution
> >>>>> but this will be a pain for users who want to use tomcat or any other
> >>>>> servlet container...
> >>>>> we still need to investigate a new storage model :-)
> >>>>>
> >>>>> Olivier
> >>>>> [1] https://neo4j.com/licensing/
> >>>>>
> >>>>>> On 25 June 2017 at 06:26, Martin <martin_s@apache.org> wrote:
> >>>>>> Yes, you are right. The lucene dependency causes a lot of trouble and
> >>>>>> will cause headaches with each version change of one of the dependencies.
> >>>>>> What are the requirements for a replacement?
> >>>>>> - Do we want to store hierarchical data?
> >>>>>> - Do we want to store metadata for nodes?
> >>>>>> - Fulltext search (only for metadata, or for artifacts too?)
> >>>>>> - Blob / artifact storage (I don't think so, but I'm not so familiar with
> >>>>>> the archiva artifact model)?
> >>>>>>
> >>>>>> Maybe a graph database could be an alternative. I don't know if the
> >>>>>> license of neo4j is compatible with the apache license, and I think it
> >>>>>> brings lucene as a dependency too. I will have a look.
> >>>>>> The problem is that if fulltext search is needed, I think we get a
> >>>>>> lucene dependency with most of the frameworks if they are embedded.
> >>>>>>
> >>>>>> Other alternatives:
> >>>>>> - Implement fulltext search on our own (an index of the metadata stored
> >>>>>> via the archiva api) and use the lucene dependency that comes from the
> >>>>>> maven-indexer
> >>>>>> - Jcr Oak with Solr. Solr is not embedded; it must run as its own
> >>>>>> application (war).
> >>>>>>
> >>>>>> Greetings
> >>>>>>
> >>>>>> Martin
> >>>>>>
> >>>>>> On Saturday, 24 June 2017, 14:05:26 CEST, Olivier Lamy wrote:
> >>>>>>> well this is gonna be a pain.
> >>>>>>> IMHO we need to find a new alternative to jcr oak.
> >>>>>>> And something not using Lucene, as it's a real pain to have different
> >>>>>>> libraries using lucene when they do not update at the same time (and
> >>>>>>> Lucene breaks backward compat so quickly...)
> >>>>>>> Any ideas? I'd like to have something embedded (but with a possible
> >>>>>>> external server configuration).
> >>>>>>> There is currently a Cassandra implementation. I was not satisfied with
> >>>>>>> the performance, but I guess I did that 4 years ago so it can be improved
> >>>>>>> for sure :-)
> >>>>>>> Maybe orientdb?
> >>>>>>> What else?
> >>>>>>>
> >>>>>>>> On 24 June 2017 at 09:50, Olivier Lamy <olamy@apache.org> wrote:
> >>>>>>>> well the issue is incompatible versions of Lucene for Maven Indexer and
> >>>>>>>> Oak (well I can try to push a patch to Oak for upgrading...)
> >>>>>>>>
> >>>>>>>>> On 24 June 2017 at 08:41, Olivier Lamy <olamy@apache.org> wrote:
> >>>>>>>>> Hi
> >>>>>>>>> Maven Indexer 6.0-SNAPSHOT doesn't need the plexus bridge anymore.
> >>>>>>>>> I'm working on it in the branch (feature/jcr_oak)
> >>>>>>>>> Not sure why but I have intermittent failures with the store-jcr module.
> >>>>>>>>> I definitely agree on the upgrade.
> >>>>>>>>> Well we can simply detect that it's not oak compatible and schedule a
> >>>>>>>>> full reindex (maybe with a message in logs and ui?)
> >>>>>>>>> But we need to be sure we can still read the central index, and I'm not
> >>>>>>>>> sure about a possible lucene conflict between oak and maven indexer.
> >>>>>>>>> We can work on this branch? (I created a Jenkins job for it:
> >>>>>>>>> https://builds.apache.org/view/A-D/view/Archiva/job/archiva-jcr-oak-branch/)
> >>>>>>>>> If you prefer master I would say no worries either.
> >>>>>>>>> Something else to look at is upgrading maven-core etc...
> >>>>>>>>> Anyway
> >>>>>>>>> Cheers
> >>>>>>>>> Olivier
> >>>>>>>>>
> >>>>>>>>>> On 22 June 2017 at 19:16, Martin <martin_s@apache.org> wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> upgrading the maven indexer leads to some major changes.
> >>>>>>>>>> Lucene is used by maven-indexer and also by jackrabbit. Jackrabbit
> >>>>>>>>>> sticks to the old 3.x version and, as I see it, they will not move to a
> >>>>>>>>>> newer version.
> >>>>>>>>>> There is Jackrabbit Oak as an alternative.
> >>>>>>>>>> I tried a proof of concept and could replace the jackrabbit
> >>>>>>>>>> implementation of metadata-store-jcr with an oak implementation. At
> >>>>>>>>>> least I got all the unit tests of this module to pass.
> >>>>>>>>>> But switching to Oak has some drawbacks:
> >>>>>>>>>> - The repository format changed and we must provide a way to migrate
> >>>>>>>>>> (either migrate the existing repository or create a new one by
> >>>>>>>>>> reindexing)
> >>>>>>>>>> - The lucene version used is newer but does not match the version from
> >>>>>>>>>> the maven-indexer dependencies. Some incompatibilities may come up that
> >>>>>>>>>> are not solvable without using a modified version of one of the two. Or
> >>>>>>>>>> there may be the possibility to switch to solr (as a separate component)
> >>>>>>>>>> and get rid of the lucene dependencies for jcr inside the archiva
> >>>>>>>>>> project.
> >>>>>>>>>>
> >>>>>>>>>> Switching to maven-indexer 6.0-SNAPSHOT means some changes too:
> >>>>>>>>>> - The Plexus-Sisu-Bridge does not work as before.
> >>>>>>>>>> - We must migrate from the NexusIndexer to the indexer API.
> >>>>>>>>>>
> >>>>>>>>>> So switching to the new indexer and oak means more work than expected
> >>>>>>>>>> and some risks regarding new incompatibility problems. And I think this
> >>>>>>>>>> cannot be done without broken master builds for some time.
> >>>>>>>>>>
> >>>>>>>>>> So, what should we do? I think maven indexer is one of the core
> >>>>>>>>>> components of archiva, and we should use the 3.x version to migrate to
> >>>>>>>>>> the new indexer version, even if this means switching to jcr oak.
> >>>>>>>>>> Otherwise it would mean sticking to the old version for the next years.
> >>>>>>>>>> @Olivier, regarding the maven-indexer / sisu-Bridge API changes, I hope
> >>>>>>>>>> you can provide useful help.
> >>>>>>>>>>
> >>>>>>>>>> I committed the PoC to the branch feature/jcr_oak. There are some
> >>>>>>>>>> modules where the tests do not pass (mainly because of the indexer API
> >>>>>>>>>> changes).
> >>>>>>
> >>>>>>>>>> Any comments?
> >>>>>>>>>>
> >>>>>>>>>> Cheers
> >>>>>>>>>>
> >>>>>>>>>> Martin
> >>>>>>>>>>
> >>>>>>>>>> On Tuesday, 13 June 2017, 09:07:35 CEST, Olivier Lamy wrote:
> >>>>>>>>>>> forget it but we need to ensure we can read maven index files....
> >>>>>>>>>>>
> >>>>>>>>>>> On 13 June 2017 at 17:06, Olivier Lamy <olamy@apache.org> wrote:
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>> Remember jackrabbit depends on Lucene as well, so upgrading Lucene can
> >>>>>>>>>>>> be a problem here.
> >>>>>>>>>>>> Regarding maven-indexer, yes we can depend on a snapshot until the
> >>>>>>>>>>>> release.
> >>>>>>>>>>>> I can release it ;-)
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 13 June 2017 at 06:06, Martin <martin_s@apache.org> wrote:
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> the lucene version depends on the maven indexer. But I'm not sure about
> >>>>>>>>>>>>> the current state of maven-indexer. The version has not changed since
> >>>>>>>>>>>>> some time in 2013.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> There have been commits on the master branch since then, and the lucene
> >>>>>>>>>>>>> version has been changed too, but no releases were tagged.
> >>>>>>>>>>>>> Does it make sense to switch to the maven-indexer 6.0-SNAPSHOT?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> As far as I know there are new compact index formats with new lucene
> >>>>>>>>>>>>> versions, but I'm not sure if this is relevant for the maven indexes.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Cheers
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Martin
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> Olivier Lamy
> >>>>>>>>>>>> http://twitter.com/olamy | http://linkedin.com/in/olamy
>
> --
> This message was sent from my Android device with K-9 Mail.

-- 
Olivier Lamy
http://twitter.com/olamy | http://linkedin.com/in/olamy
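
The relocation described earlier in the thread (Lucene classes moved under org.apache.maven.index.shaded.lucene) amounts to a package-prefix rewrite on class names. The following is only a minimal sketch of that name mapping, not the shading build itself: the class ShadedNames, its relocate method, and the com.example.Unrelated name are hypothetical and used for illustration, while the two package prefixes are the ones named in the thread.

```java
/** Hypothetical illustration of the package-prefix rewrite that class relocation performs. */
final class ShadedNames {
    // Original Lucene package prefix and the relocated prefix mentioned in the thread.
    static final String ORIGINAL_PREFIX = "org.apache.lucene";
    static final String SHADED_PREFIX = "org.apache.maven.index.shaded.lucene";

    /** Returns the relocated name for Lucene classes; any other name passes through unchanged. */
    static String relocate(String className) {
        boolean isLucene = className.equals(ORIGINAL_PREFIX)
                || className.startsWith(ORIGINAL_PREFIX + ".");
        if (isLucene) {
            // Swap the prefix and keep the rest of the qualified name as it is.
            return SHADED_PREFIX + className.substring(ORIGINAL_PREFIX.length());
        }
        return className;
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.lucene.search.BooleanClause",
            "com.example.Unrelated" // hypothetical non-Lucene class
        };
        for (String name : names) {
            System.out.println(name + " -> " + relocate(name));
        }
    }
}
```

With such a mapping, code that uses the shaded indexer jar would refer to the relocated names, while another library that brings its own Lucene keeps using org.apache.lucene, so the two versions no longer collide on the classpath.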
