archiva-dev mailing list archives

From Chris Graham <chrisgw...@gmail.com>
Subject Re: maven-indexer / Lucene
Date Thu, 06 Jul 2017 07:36:06 GMT
Can I please ask an obvious/stupid question?

What is driving this need for change?

From a quick read of the thread above, all of the options appear to introduce a lot of breaking
changes, and a whole lot more uncertainty.

So, what is so broken that it is driving these changes?

Sent from my iPhone

> On 6 Jul 2017, at 12:39 pm, Olivier Lamy <olamy@apache.org> wrote:
> 
> Yup.
> The idea is to have an extra jar produced by the maven-indexer build with a
> shaded lucene version.
> So the lucene classes (the version used by Maven indexer) will be relocated
> to a package called org.apache.maven.index.shaded.lucene (such as
> org.apache.maven.index.shaded.lucene.search.BooleanClause).
> Then you exclude the lucene dependencies used by maven indexer and voila.
> The voila is a bit optimistic and not so easy, but anyway I'm working on it ATM.
> 
> 
>> On 6 July 2017 at 07:08, Martin <martin_s@apache.org> wrote:
>> 
>> What do you mean exactly by shading? Moving to another package name?
>> 
>> On Wednesday, 5 July 2017, 01:19:17 CEST, Olivier Lamy wrote:
>>> maybe an option is to use some shading?
>>> I'm thinking of shading lucene packages used by maven indexer. I can
>>> easily provide a build for that.
>>> WDYT?
>>> 
>>>> On 26 June 2017 at 11:49, Olivier Lamy <olamy@apache.org> wrote:
>>>> Hi
>>>> graph/document storage could be convenient (but not possible with neo4j
>>>> as it's GPL licensed [1])
>>>> well, we can add solr as an additional webapp with our jetty distribution
>>>> but this will be a pain for users who want to use tomcat or any other
>>>> servlet container...
>>>> we still need to investigate a new storage model :-)
>>>> 
>>>> Olivier
>>>> [1] https://neo4j.com/licensing/
>>>> 
>>>>> On 25 June 2017 at 06:26, Martin <martin_s@apache.org> wrote:
>>>>> Yes, you are right. The lucene dependency causes a lot of trouble and
>>>>> will cause headaches with each version change of one of the
>>>>> dependencies.
>>>>> What are the requirements for a replacement?
>>>>> - Do we want to store hierarchical data?
>>>>> - Do we want to store metadata for nodes?
>>>>> - Fulltext search (only for metadata, or for artifacts too)?
>>>>> - Blob / artifact storage (I don't think so, but I'm not so familiar
>>>>> with the archiva artifact model)?
>>>>> 
>>>>> Maybe a graph database would be an alternative. I don't know if the
>>>>> license of neo4j is compatible with the Apache license, and I think it
>>>>> brings in lucene as a dependency too. I will have a look.
>>>>> The problem is that, if fulltext search is needed, I think most of the
>>>>> frameworks bring a lucene dependency if they are embedded.
>>>>> 
>>>>> Other alternatives:
>>>>> - Implement fulltext search on our own (an index of the metadata stored
>>>>> via the archiva api) and use the lucene dependency that comes from the
>>>>> maven-indexer
>>>>> - JCR Oak with Solr. Solr is not embedded and must run as its own
>>>>> application (war).
>>>>> 
>>>>> Greetings
>>>>> 
>>>>> Martin
>>>>> 
>>>>> On Saturday, 24 June 2017, 14:05:26 CEST, Olivier Lamy wrote:
>>>>>> well this is going to be a pain.
>>>>>> IMHO we need to find a new alternative to jcr oak.
>>>>>> And something not using Lucene, as it's a real pain to have different
>>>>>> libraries using lucene when they do not update at the same time (and
>>>>>> Lucene breaks backward compat so quickly...)
>>>>>> Any ideas? I'd like to have something embedded (but with a possible
>>>>>> external server configuration).
>>>>>> There is currently a Cassandra implementation. I was not satisfied
>>>>>> with the performance, but I guess I did that 4 years ago so it can be
>>>>>> improved for sure :-)
>>>>>> Maybe orientdb?
>>>>>> What else?
>>>>>> 
>>>>>>> On 24 June 2017 at 09:50, Olivier Lamy <olamy@apache.org> wrote:
>>>>>>> well the issue is incompatible versions of Lucene for Maven Indexer
>>>>>>> and Oak (well I can try to push a patch to Oak for upgrading...)
>>>>>>> 
>>>>>>>> On 24 June 2017 at 08:41, Olivier Lamy <olamy@apache.org> wrote:
>>>>>>>> Hi
>>>>>>>> Maven Indexer 6.0-SNAPSHOT doesn't need the plexus bridge anymore.
>>>>>>>> I'm working on it in the branch ( feature/jcr_oak )
>>>>>>>> Not sure why but I have intermittent failures with the store-jcr
>>>>>>>> module.
>>>>>>>> I definitely agree on the upgrade.
>>>>>>>> Well we can simply detect it's not oak compatible and schedule a
>>>>>>>> full reindex (maybe with a message in logs and ui?)
>>>>>>>> But we need to be sure we can still read the central index, and I'm
>>>>>>>> not sure about a possible lucene conflict between oak and maven
>>>>>>>> indexer.
>>>>>>>> We can work on this branch? (I created a Jenkins job for it
>>>>>>>> https://builds.apache.org/view/A-D/view/Archiva/job/archiva-jcr-oak-branch/)
>>>>>>>> If you prefer master I would say no worries either.
>>>>>>>> Something else to look at is upgrading maven-core etc...
>>>>>>>> Anyway
>>>>>>>> Cheers
>>>>>>>> Olivier
>>>>>>>> 
>>>>>>>>> On 22 June 2017 at 19:16, Martin <martin_s@apache.org> wrote:
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> upgrading the maven indexer leads to some major changes.
>>>>>>>>> Lucene is used by maven-indexer and also by jackrabbit. Jackrabbit
>>>>>>>>> sticks to the old 3.x version and, as I see it, they will not move
>>>>>>>>> to a newer version.
>>>>>>>>> There is Jackrabbit Oak as an alternative.
>>>>>>>>> I tried a proof of concept and could replace the jackrabbit
>>>>>>>>> implementation of metadata-store-jcr with an oak implementation. At
>>>>>>>>> least I got all the unit tests of this module to pass.
>>>>>>>>> But switching to Oak has some drawbacks:
>>>>>>>>> - The repository format changed and we must provide a way to
>>>>>>>>> migrate (either migrate the existing repository or create a new one
>>>>>>>>> by reindexing)
>>>>>>>>> - The lucene version used is newer but does not match the version
>>>>>>>>> from the maven-indexer dependencies. Some incompatibilities may come
>>>>>>>>> up that are not solvable without using a modified version of one of
>>>>>>>>> the two. Or there may be the possibility to switch to solr (as a
>>>>>>>>> separate component) and get rid of the lucene dependencies for jcr
>>>>>>>>> inside the archiva project.
>>>>>>>>> 
>>>>>>>>> Switching to maven-indexer 6.0-SNAPSHOT means some changes too:
>>>>>>>>> - The Plexus-Sisu-Bridge does not work as before.
>>>>>>>>> - We must migrate from the NexusIndexer to the indexer API.
>>>>>>>>> 
>>>>>>>>> So switching to the new indexer and oak means more work than
>>>>>>>>> expected and some risks regarding new incompatibility problems. And
>>>>>>>>> I think this cannot be done without broken master builds for some
>>>>>>>>> time period.
>>>>>>>>> 
>>>>>>>>> So, what should we do? I think maven indexer is one of the core
>>>>>>>>> components of archiva, and we should use the 3.x version to migrate
>>>>>>>>> to the new indexer version, even if this means switching to jcr oak.
>>>>>>>>> Otherwise it would mean sticking to the old version for the next
>>>>>>>>> years.
>>>>>>>>> @Olivier, regarding the maven-indexer / sisu-Bridge API changes, I
>>>>>>>>> hope you can provide useful help.
>>>>>>>>> 
>>>>>>>>> I committed the PoC to the branch feature/jcr_oak. There are some
>>>>>>>>> modules where the tests do not pass (mainly because of the indexer
>>>>>>>>> API changes).
>>>>>>>>> 
>>>>>>>>> Any comments?
>>>>>>>>> 
>>>>>>>>> Cheers
>>>>>>>>> 
>>>>>>>>> Martin
>>>>>>>>> 
>>>>>>>>> On Tuesday, 13 June 2017, 09:07:35 CEST, Olivier Lamy wrote:
>>>>>>>>>> forget it but we need to ensure we can read maven index files....
>>>>>>>>>> 
>>>>>>>>>> On 13 June 2017 at 17:06, Olivier Lamy <olamy@apache.org> wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>> Remember jackrabbit depends on Lucene as well, so upgrading Lucene
>>>>>>>>>>> can be a problem here.
>>>>>>>>>>> Regarding maven-indexer, yes we can depend on a snapshot until the
>>>>>>>>>>> release.
>>>>>>>>>>> I can release it ;-)
>>>>>>>>>>> 
>>>>>>>>>>> On 13 June 2017 at 06:06, Martin <martin_s@apache.org> wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> 
>>>>>>>>>>>> the lucene version depends on the maven indexer. But I'm not sure
>>>>>>>>>>>> about the current state of maven-indexer. The version has not
>>>>>>>>>>>> changed since sometime in 2013.
>>>>>>>>>>>> 
>>>>>>>>>>>> There are commits on the master branch since then, and the lucene
>>>>>>>>>>>> version has been changed too, but no releases were tagged.
>>>>>>>>>>>> Does it make sense to switch to the maven-indexer 6.0-SNAPSHOT?
>>>>>>>>>>>> 
>>>>>>>>>>>> As far as I know, there are new compact index formats with new
>>>>>>>>>>>> lucene versions, but I'm not sure if this is relevant for the
>>>>>>>>>>>> maven indexes.
>>>>>>>>>>>> 
>>>>>>>>>>>> Cheers
>>>>>>>>>>>> 
>>>>>>>>>>>> Martin
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> Olivier Lamy
>>>>>>>>>>> http://twitter.com/olamy | http://linkedin.com/in/olamy
> 
> 
> -- 
> Olivier Lamy
> http://twitter.com/olamy | http://linkedin.com/in/olamy
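
For reference, the relocation described in the quoted thread can be sketched roughly as a package-prefix rewrite. This is only an illustration of the naming scheme, not the maven-indexer build itself: the class RelocationSketch and its relocate method are hypothetical names, and the only values taken from the thread are the shaded package org.apache.maven.index.shaded.lucene and the BooleanClause example. The source prefix org.apache.lucene is Lucene's usual root package.

```java
import java.util.List;

// Hypothetical illustration of what "shading" does to class names.
// A real build would apply this rewrite to bytecode at packaging time;
// here we only model the name mapping.
public class RelocationSketch {
    // Lucene's root package (the prefix to be relocated).
    public static final String ORIGINAL = "org.apache.lucene";
    // Target package named in the thread.
    public static final String SHADED = "org.apache.maven.index.shaded.lucene";

    /** Returns the relocated class name, or the input unchanged if it is outside ORIGINAL. */
    public static String relocate(String className) {
        // Match the package itself or anything below it, but not look-alike
        // prefixes such as "org.apache.lucenefoo".
        if (className.equals(ORIGINAL) || className.startsWith(ORIGINAL + ".")) {
            return SHADED + className.substring(ORIGINAL.length());
        }
        return className;
    }

    public static void main(String[] args) {
        // The second name is just an example of a class outside the Lucene package.
        for (String name : List.of(
                "org.apache.lucene.search.BooleanClause",
                "org.apache.maven.index.Indexer")) {
            System.out.println(name + " -> " + relocate(name));
        }
    }
}
```

Because the relocated classes live under a different package, a second Lucene version (for example the one pulled in by Oak) could sit on the same classpath without name clashes, which is presumably the point of excluding the unshaded Lucene dependencies afterwards.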
