archiva-dev mailing list archives

From Martin Stockhammer <marti...@apache.org>
Subject Re: maven-indexer / Lucene
Date Thu, 06 Jul 2017 09:48:28 GMT
We have different, mutually incompatible Lucene dependencies that prevent us from updating the maven-indexer
and/or Jackrabbit. And this will happen again with each future upgrade of either of these two packages.
So it would be really good if we could find a solution that removes one of the Lucene dependencies.
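
To make the shading idea from the quoted discussion below concrete: relocation renames every class under Lucene's root package into a private package, so that two Lucene versions can sit on the same classpath without clashing. The following is only a minimal Java sketch of that name mapping. `LuceneRelocation` and `relocate` are hypothetical names made up for illustration, and a real build would rewrite the bytecode with a build tool rather than run application code like this. The target package is the one Olivier proposes.

```java
// Hypothetical helper for illustration only; it is not part of maven-indexer
// or of any shading tool. It models the package rename that relocation performs.
final class LuceneRelocation {
    // Root package of the original Lucene classes.
    static final String ORIGINAL_PREFIX = "org.apache.lucene";
    // Target package proposed in the quoted message.
    static final String SHADED_PREFIX = "org.apache.maven.index.shaded.lucene";

    // Returns the relocated name for classes under the Lucene root package
    // and leaves every other class name unchanged.
    static String relocate(String className) {
        if (className.equals(ORIGINAL_PREFIX)
                || className.startsWith(ORIGINAL_PREFIX + ".")) {
            return SHADED_PREFIX + className.substring(ORIGINAL_PREFIX.length());
        }
        return className;
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.lucene.search.BooleanClause",    // used by the indexer: relocated
            "org.apache.jackrabbit.core.RepositoryImpl"  // unrelated class: left untouched
        };
        for (String name : names) {
            System.out.println(name + " -> " + relocate(name));
        }
    }
}
```

Because only the indexer's copy of Lucene would be renamed, Jackrabbit or Oak could keep resolving the plain `org.apache.lucene` classes in whatever version they need.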

Greetings

Martin


On 6 July 2017 09:36:06 CEST, Chris Graham <chrisgwarp@gmail.com> wrote:
>Can I please ask an obvious/stupid question?
>
>What is driving this need for change?
>
>From a quick read of the thread above, all of the options appear to
>introduce a lot of breaking changes, and a whole lot more uncertainty.
>
>So, what is so broken that it is driving these changes?
>
>Sent from my iPhone
>
>> On 6 Jul 2017, at 12:39 pm, Olivier Lamy <olamy@apache.org> wrote:
>> 
>> Yup.
>> The idea is to have an extra jar produced by the maven-indexer with a shaded
>> Lucene version.
>> So the Lucene classes (the version used by the Maven indexer) will be relocated into
>> a package called org.apache.maven.index.shaded.lucene (such as
>> org.apache.maven.index.shaded.lucene.search.BooleanClause).
>> Then you exclude the Lucene dependencies used by the Maven indexer and voilà.
>> The voilà is a bit optimistic and not so easy, but anyway I'm working on it ATM.
>> 
>> 
>>> On 6 July 2017 at 07:08, Martin <martin_s@apache.org> wrote:
>>> 
>>> What do you mean exactly by shading? Moving to another package name?
>>> 
>>> On Wednesday, 5 July 2017, 01:19:17 CEST, Olivier Lamy wrote:
>>>> Maybe an option is to use some shading?
>>>> I'm thinking of shading the Lucene packages used by the Maven indexer. I can easily
>>>> provide a build for that.
>>>> WDYT?
>>>> 
>>>>> On 26 June 2017 at 11:49, Olivier Lamy <olamy@apache.org> wrote:
>>>>> Hi
>>>>> graph/document storage could be convenient (but not possible with Neo4j as
>>>>> it's GPL-licensed [1]).
>>>>> Well, we could add Solr as an additional webapp with our Jetty distribution,
>>>>> but this will be a pain for users who want to use Tomcat or any other
>>>>> servlet container...
>>>>> We still need to investigate a new storage model :-)
>>>>> 
>>>>> Olivier
>>>>> [1] https://neo4j.com/licensing/
>>>>> 
>>>>>> On 25 June 2017 at 06:26, Martin <martin_s@apache.org> wrote:
>>>>>> Yes, you are right. The Lucene dependency causes a lot of trouble and will
>>>>>> cause headaches with each version change of one of the dependencies.
>>>>>> What are the requirements for a replacement?
>>>>>> - We want to store hierarchical data?
>>>>>> - We want to store metadata for nodes?
>>>>>> - Fulltext search (only metadata or for artifacts too?)
>>>>>> - Blob / artifact storage (I don't think so, but I'm not so familiar with the
>>>>>> Archiva artifact model)?
>>>>>> 
>>>>>> Maybe some graph database could be an alternative. I don't know if the license of
>>>>>> Neo4j is compatible with the Apache license, and I think it brings Lucene as a
>>>>>> dependency too. I will have a look.
>>>>>> The problem is, if fulltext search is needed, I think most of the
>>>>>> frameworks bring in a Lucene dependency if it's embedded.
>>>>>> 
>>>>>> Other alternatives:
>>>>>> - Implement fulltext search on our own (an index of the metadata stored via the
>>>>>> Archiva API) and use the Lucene dependency that comes from the maven-indexer
>>>>>> - JCR Oak with Solr. Solr is not embedded; it must run as its own application
>>>>>> (war).
>>>>>> 
>>>>>> Greetings
>>>>>> 
>>>>>> Martin
>>>>>> 
>>>>>> On Saturday, 24 June 2017, 14:05:26 CEST, Olivier Lamy wrote:
>>>>>>> Well, this is going to be a pain.
>>>>>>> IMHO we need to find a new alternative to JCR Oak.
>>>>>>> And something not using Lucene, as it's a real pain to have different
>>>>>>> libraries using Lucene when they do not update at the same time (and Lucene
>>>>>>> breaks backward compatibility so quickly...)
>>>>>>> Any ideas? I'd like to have something embedded (but with a possible
>>>>>>> external server configuration).
>>>>>>> There is currently a Cassandra implementation. I was not satisfied with the
>>>>>>> performance, but I guess I did that 4 years ago so it can be improved for sure :-)
>>>>>>> Maybe OrientDB?
>>>>>>> What else?
>>>>>>> 
>>>>>>>> On 24 June 2017 at 09:50, Olivier Lamy <olamy@apache.org> wrote:
>>>>>>>> Well, the issue is incompatible versions of Lucene for Maven Indexer and
>>>>>>>> Oak (well, I can try to push a patch to Oak for upgrading...)
>>>>>>>> 
>>>>>>>>> On 24 June 2017 at 08:41, Olivier Lamy <olamy@apache.org> wrote:
>>>>>>>>> Hi
>>>>>>>>> Maven Indexer 6.0-SNAPSHOT no longer needs the Plexus bridge.
>>>>>>>>> I'm working on it in the branch (feature/jcr_oak).
>>>>>>>>> Not sure why, but I have intermittent failures with the store-jcr module.
>>>>>>>>> I definitely agree on the upgrade.
>>>>>>>>> Well, we can simply detect that it's not Oak compatible and schedule a full
>>>>>>>>> reindex (maybe with a message in the logs and UI?)
>>>>>>>>> But we need to be sure we can still read the central index, and I'm not sure about
>>>>>>>>> possible Lucene conflicts between Oak and Maven Indexer.
>>>>>>>>> We can work on this branch? (I created a Jenkins job for it:
>>>>>>>>> https://builds.apache.org/view/A-D/view/Archiva/job/archiva-jcr-oak-branch/)
>>>>>>>>> If you prefer master I would say no worries either.
>>>>>>>>> Something else to look at is upgrading maven-core etc...
>>>>>>>>> Anyway
>>>>>>>>> Cheers
>>>>>>>>> Olivier
>>>>>>>>> 
>>>>>>>>>> On 22 June 2017 at 19:16, Martin <martin_s@apache.org> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> upgrading the maven indexer leads to some major changes.
>>>>>>>>>> Lucene is used by maven-indexer and also by Jackrabbit. Jackrabbit sticks to
>>>>>>>>>> the old 3.x version and, as I see it, they will not move to a newer version.
>>>>>>>>>> There is Jackrabbit Oak as an alternative.
>>>>>>>>>> I tried a proof of concept and could replace the Jackrabbit implementation of
>>>>>>>>>> metadata-store-jcr with an Oak implementation. At least I got all the unit tests of
>>>>>>>>>> this module to pass.
>>>>>>>>>> But switching to Oak has some drawbacks:
>>>>>>>>>> - The repository format changed and we must provide a way to migrate (either
>>>>>>>>>> migrate the existing repository or create a new one by reindexing)
>>>>>>>>>> - The Lucene version used is newer but does not match the version from the
>>>>>>>>>> maven-indexer dependencies. Some incompatibilities may come up that are
>>>>>>>>>> not solvable without using a modified version of one of the two. Or there may
>>>>>>>>>> be the possibility to switch to Solr (as a separate component) and get rid of
>>>>>>>>>> the Lucene dependencies for JCR inside the Archiva project.
>>>>>>>>>> 
>>>>>>>>>> Switching to maven-indexer 6.0-SNAPSHOT means some changes too:
>>>>>>>>>> - The Plexus-Sisu-Bridge does not work as before.
>>>>>>>>>> - We must migrate from the NexusIndexer to the indexer API.
>>>>>>>>>> 
>>>>>>>>>> So switching to the new indexer and Oak means more work than expected and some
>>>>>>>>>> risks regarding new incompatibility problems. And I think this cannot be done
>>>>>>>>>> without broken master builds for some time period.
>>>>>>>>>> 
>>>>>>>>>> So, what should we do? I think maven indexer is one of the core components of
>>>>>>>>>> Archiva, and we should utilize the 3.x version to migrate to the new indexer
>>>>>>>>>> version, even if this means switching to JCR Oak. Otherwise it would mean
>>>>>>>>>> sticking to the old version for the next years.
>>>>>>>>>> @Olivier, regarding the maven-indexer / Sisu-Bridge API changes, I hope you
>>>>>>>>>> can provide useful help.
>>>>>>>>>> 
>>>>>>>>>> I committed the PoC to the branch feature/jcr_oak. There are some modules
>>>>>>>>>> where the tests do not pass (mainly because of the indexer API changes).
>>>>>>>>>> Any comments?
>>>>>>>>>> 
>>>>>>>>>> Cheers
>>>>>>>>>> 
>>>>>>>>>> Martin
>>>>>>>>>> 
>>>>>>>>>> On Tuesday, 13 June 2017, 09:07:35 CEST, Olivier Lamy wrote:
>>>>>>>>>>> Forget it, but we need to ensure we can read Maven index files....
>>>>>>>>>>> 
>>>>>>>>>>> On 13 June 2017 at 17:06, Olivier Lamy <olamy@apache.org> wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> Remember Jackrabbit depends on Lucene as well, so upgrading Lucene can be a
>>>>>>>>>>>> problem here.
>>>>>>>>>>>> Regarding maven-indexer: yes, we can depend on a snapshot until the release.
>>>>>>>>>>>> I can release it ;-)
>>>>>>>>>>>> 
>>>>>>>>>>>> On 13 June 2017 at 06:06, Martin <martin_s@apache.org> wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> the Lucene version depends on the maven indexer. But I'm not sure about the
>>>>>>>>>>>>> current state of maven-indexer. The version has not changed since some time in
>>>>>>>>>>>>> 2013.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> There have been commits on the master branch since then, and the Lucene version has
>>>>>>>>>>>>> been changed too, but no releases were tagged.
>>>>>>>>>>>>> Does it make sense to switch to maven-indexer 6.0-SNAPSHOT?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> As far as I know there are new compact index formats with new Lucene versions, but I'm
>>>>>>>>>>>>> not sure if this is relevant for the Maven indexes.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Cheers
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Martin
>>>>>>>>>>>> 
>>>>>>>>>>>> --
>>>>>>>>>>>> Olivier Lamy
>>>>>>>>>>>> http://twitter.com/olamy | http://linkedin.com/in/olamy

-- 
This message was sent from my Android device with K-9 Mail.