archiva-dev mailing list archives

From Olivier Lamy <ol...@apache.org>
Subject Re: maven-indexer / Lucene
Date Fri, 07 Jul 2017 07:23:24 GMT
The maven-indexer repo now contains a branch feature/jar_shaded_lucene:
https://git1-us-west.apache.org/repos/asf?p=maven-indexer.git;a=summary
I also pushed what I started for Archiva to the branch called feature/jcr_oak.
To test it, you first need to build maven-indexer from the
branch feature/jar_shaded_lucene.
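
For readers new to shading, here is a minimal sketch of the class renaming it performs, assuming the relocation target org.apache.maven.index.shaded.lucene quoted further down in this thread. The class ShadedLuceneNames and its relocate method are hypothetical and only illustrate the name mapping; they are not part of maven-indexer, and a real shading step also rewrites the references inside the bytecode.

```java
/** Illustration only: mimics the package renaming a shading step applies to Lucene class names. */
final class ShadedLuceneNames {
    // Root package of the unshaded Lucene classes.
    static final String ORIGINAL_PREFIX = "org.apache.lucene.";
    // Relocated root package, as named in the thread.
    static final String SHADED_PREFIX = "org.apache.maven.index.shaded.lucene.";

    private ShadedLuceneNames() {
    }

    /** Returns the relocated name for a Lucene class, or the input unchanged otherwise. */
    static String relocate(String className) {
        if (className.startsWith(ORIGINAL_PREFIX)) {
            return SHADED_PREFIX + className.substring(ORIGINAL_PREFIX.length());
        }
        return className;
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.lucene.search.BooleanClause", // Lucene class: gets relocated
            "java.util.List"                          // unrelated class: left alone
        };
        for (String name : names) {
            System.out.println(name + " -> " + relocate(name));
        }
    }
}
```

Because the relocated classes live under a different package, they should no longer clash with whatever Lucene version another library brings in under org.apache.lucene.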



On 6 July 2017 at 22:31, Olivier Lamy <olamy@apache.org> wrote:

> I will try to share the work I did in a branch tomorrow
>
> On Thu, 6 Jul 2017 at 7:48 pm, Martin Stockhammer <martin_s@apache.org>
> wrote:
>
>> We have different, incompatible lucene dependencies that prevent us from
>> updating the maven indexer and/or jackrabbit. And this will happen again with
>> each upgrade of either of these two packages in the future.
>> So it would be really good if we could find a solution that removes one of the
>> lucene dependencies.
>>
>> Greetings
>>
>> Martin
>>
>>
>> On 6 July 2017 09:36:06 CEST, Chris Graham <chrisgwarp@gmail.com> wrote:
>> >Can I please ask an obvious/stupid question?
>> >
>> >What is driving this need for change?
>> >
>> >From a quick read of the thread above, all of the options appear to
>> >introduce a lot of breaking changes, and a whole lot more uncertainty.
>> >
>> >So, what is so broken that it is driving these changes?
>> >
>> >Sent from my iPhone
>> >
>> >> On 6 Jul 2017, at 12:39 pm, Olivier Lamy <olamy@apache.org> wrote:
>> >>
>> >> Yup.
>> >> The idea is to have an extra jar produced by maven-indexer with a shaded
>> >> lucene version.
>> >> So the lucene classes (the version used by Maven indexer) will be relocated to
>> >> a package called org.apache.maven.index.shaded.lucene (such as
>> >> org.apache.maven.index.shaded.lucene.search.BooleanClause).
>> >> Then you exclude the lucene dependencies used by maven indexer and voila.
>> >> The voila is a bit optimistic and not so easy, but anyway I'm working on it ATM.
>> >>
>> >>
>> >>> On 6 July 2017 at 07:08, Martin <martin_s@apache.org> wrote:
>> >>>
>> >>> What do you mean exactly by shading? Moving to another package name?
>> >>>
>> >>> On Wednesday, 5 July 2017, 01:19:17 CEST, Olivier Lamy wrote:
>> >>>> maybe an option is to use some shading?
>> >>>> I'm thinking of shading the lucene packages used by maven indexer. I can easily
>> >>>> provide a build for that.
>> >>>> WDYT?
>> >>>>
>> >>>>> On 26 June 2017 at 11:49, Olivier Lamy <olamy@apache.org> wrote:
>> >>>>> Hi
>> >>>>> graph/document storage could be convenient (but not possible with neo4j as
>> >>>>> it's GPL license [1])
>> >>>>> well we can add solr as an additional webapp with our jetty distribution
>> >>>>> but this will be a pain for users who want to use tomcat or any other
>> >>>>> servlet container...
>> >>>>> we still need to investigate a new storage model :-)
>> >>>>>
>> >>>>> Olivier
>> >>>>> [1] https://neo4j.com/licensing/
>> >>>>>
>> >>>>>> On 25 June 2017 at 06:26, Martin <martin_s@apache.org> wrote:
>> >>>>>> Yes, you are right. The lucene dependency causes a lot of trouble and will
>> >>>>>> cause headaches with each version change of one of the dependencies.
>> >>>>>> What are the requirements for a replacement?
>> >>>>>> - We want to store hierarchical data?
>> >>>>>> - We want to store metadata for nodes?
>> >>>>>> - Fulltext search (only metadata or for artifacts too?)
>> >>>>>> - Blob / Artifact storage (I don't think so, but not so familiar with the
>> >>>>>> archiva artifact model)?
>> >>>>>>
>> >>>>>> Maybe some graph database may be an alternative. Don't know if the license of
>> >>>>>> neo4j is compatible with the apache license, and I think it brings lucene as a
>> >>>>>> dependency too. I will have a look.
>> >>>>>> Problem is, if there is fulltext search needed, I think, for most of the
>> >>>>>> frameworks we get a lucene dependency, if it's embedded.
>> >>>>>>
>> >>>>>> Other alternatives:
>> >>>>>> - Implement fulltext search on our own (index of the metadata stored via the
>> >>>>>> archiva api) and use the lucene dependency that comes from the maven-indexer
>> >>>>>> - Jcr Oak with Solr. Solr is not embedded, must run as its own application
>> >>>>>> (war).
>> >>>>>>
>> >>>>>> Greetings
>> >>>>>>
>> >>>>>> Martin
>> >>>>>>
>> >>>>>> On Saturday, 24 June 2017, 14:05:26 CEST, Olivier Lamy wrote:
>> >>>>>>> well this is gonna be a pain.
>> >>>>>>> IMHO we need to find a new alternative to jcr oak.
>> >>>>>>> And something not using Lucene as it's a real pain to have different
>> >>>>>>> libraries using lucene as they do not update at the same time (and Lucene
>> >>>>>>> breaks backward compat so quickly...)
>> >>>>>>> Any ideas? I'd like to have something embedded (but with a possible
>> >>>>>>> external server configuration).
>> >>>>>>> There is currently a Cassandra implementation. I was not satisfied with the
>> >>>>>>> performance but I guess I did that 4 years ago so it can be improved for sure :-)
>> >>>>>>> Maybe orientdb?
>> >>>>>>> What else?
>> >>>>>>>
>> >>>>>>>> On 24 June 2017 at 09:50, Olivier Lamy <olamy@apache.org> wrote:
>> >>>>>>>> well the issue is non compatible versions of Lucene for Maven Indexer and
>> >>>>>>>> Oak (well I can try to push a patch to Oak for upgrading...)
>> >>>>>>>>
>> >>>>>>>>> On 24 June 2017 at 08:41, Olivier Lamy <olamy@apache.org> wrote:
>> >>>>>>>>> Hi
>> >>>>>>>>> Maven Indexer 6.0-SNAPSHOT doesn't need the plexus bridge anymore.
>> >>>>>>>>> I'm working on it in the branch (feature/jcr_oak)
>> >>>>>>>>> Not sure why but I have intermittent failures with the store-jcr module.
>> >>>>>>>>> I definitely agree on the upgrade.
>> >>>>>>>>> Well we can simply detect it's not oak compatible and schedule a full
>> >>>>>>>>> reindex (maybe with a message in logs and ui?)
>> >>>>>>>>> But we need to be sure we can still read the central index, and I'm not sure
>> >>>>>>>>> about a possible lucene conflict with oak and maven indexer.
>> >>>>>>>>> We can work on this branch? (I created a Jenkins job for it
>> >>>>>>>>> https://builds.apache.org/view/A-D/view/Archiva/job/archiva-jcr-oak-branch/)
>> >>>>>>>>> If you prefer master I would say no worries either.
>> >>>>>>>>> Something else to look at is upgrading maven-core etc...
>> >>>>>>>>> Anyway
>> >>>>>>>>> Cheers
>> >>>>>>>>> Olivier
>> >>>>>>>>>
>> >>>>>>>>>> On 22 June 2017 at 19:16, Martin <martin_s@apache.org> wrote:
>> >>>>>>>>>> Hi,
>> >>>>>>>>>>
>> >>>>>>>>>> upgrading the maven indexer leads to some major changes.
>> >>>>>>>>>> Lucene is used by maven-indexer and also by jackrabbit. Jackrabbit sticks to
>> >>>>>>>>>> the old 3.x version and, as I see it, they will not move to a newer version.
>> >>>>>>>>>> There is Jackrabbit Oak as an alternative.
>> >>>>>>>>>> I tried a proof of concept and could replace the jackrabbit implementation of
>> >>>>>>>>>> metadata-store-jcr with an oak implementation. At least I got the unit tests of
>> >>>>>>>>>> this module all to pass.
>> >>>>>>>>>> But switching to Oak has some drawbacks:
>> >>>>>>>>>> - The repository format changed and we must provide a way to migrate (either
>> >>>>>>>>>> migrate the existing repository or create a new one by reindexing)
>> >>>>>>>>>> - The lucene version used is newer but does not match the version from the
>> >>>>>>>>>> maven-indexer dependencies. Some incompatibilities may come up that are
>> >>>>>>>>>> not solvable without using a modified version of one of the two. Or there may
>> >>>>>>>>>> be the possibility to switch to solr (as a separate component) and get rid of
>> >>>>>>>>>> the lucene dependencies for jcr inside the archiva project.
>> >>>>>>>>>>
>> >>>>>>>>>> Switching to maven-indexer 6.0-SNAPSHOT means some changes too:
>> >>>>>>>>>> - The Plexus-Sisu-Bridge does not work as before.
>> >>>>>>>>>> - We must migrate from the NexusIndexer to the indexer API.
>> >>>>>>>>>>
>> >>>>>>>>>> So switching to the new indexer and oak means more work than expected and some
>> >>>>>>>>>> risks regarding new incompatibility problems. And I think this cannot be done
>> >>>>>>>>>> without broken master builds for some period of time.
>> >>>>>>>>>>
>> >>>>>>>>>> So, what should we do? I think maven indexer is one of the core components of
>> >>>>>>>>>> archiva, and we should utilize the 3.x-version to migrate to the new indexer
>> >>>>>>>>>> version, even if this means switching to jcr oak. Otherwise it would mean
>> >>>>>>>>>> sticking to the old version for the next years.
>> >>>>>>>>>> @Olivier, regarding the maven-indexer / sisu-Bridge API changes, I hope you
>> >>>>>>>>>> can provide useful help.
>> >>>>>>>>>>
>> >>>>>>>>>> I committed the PoC to the branch feature/jcr_oak. There are some modules
>> >>>>>>>>>> where the tests do not pass (mainly because of the indexer API changes).
>> >>>>>>
>> >>>>>>>>>> Any comments?
>> >>>>>>>>>>
>> >>>>>>>>>> Cheers
>> >>>>>>>>>>
>> >>>>>>>>>> Martin
>> >>>>>>>>>>
>> >>>>>>>>>> On Tuesday, 13 June 2017, 09:07:35 CEST, Olivier Lamy wrote:
>> >>>>>>>>>>> forget it but we need to ensure we can read maven index files....
>> >>>>>>>>>>>
>> >>>>>>>>>>> On 13 June 2017 at 17:06, Olivier Lamy <olamy@apache.org> wrote:
>> >>>>>>>>>>>> Hi,
>> >>>>>>>>>>>> Remember jackrabbit depends on Lucene as well so upgrading Lucene can be a
>> >>>>>>>>>>>> problem here.
>> >>>>>>>>>>>> Regarding maven-indexer yes we can depend on a snapshot until the release.
>> >>>>>>>>>>>> I can release it ;-)
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On 13 June 2017 at 06:06, Martin <martin_s@apache.org> wrote:
>> >>>>>>>>>>>>> Hi,
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> the lucene version depends on the maven indexer. But I'm not sure about the
>> >>>>>>>>>>>>> current state of maven-indexer. The version has not changed since sometime in
>> >>>>>>>>>>>>> 2013.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> There are commits on the master branch since then, and the lucene version has
>> >>>>>>>>>>>>> been changed too, but no releases were tagged.
>> >>>>>>>>>>>>> Does it make sense to switch to the maven-indexer 6.0-SNAPSHOT?
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> As far as I know there are new compact index formats with new lucene versions
>> >>>>>>>>>>>>> but I'm not sure if this is relevant for the maven indexes.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Cheers
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Martin
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> --
>> >>>>>>>>>>>> Olivier Lamy
>> >>>>>>>>>>>> http://twitter.com/olamy | http://linkedin.com/in/olamy
>>
>> --
>> This message was sent from my Android device with K-9 Mail.



-- 
Olivier Lamy
http://twitter.com/olamy | http://linkedin.com/in/olamy
