archiva-dev mailing list archives

From Martin <marti...@apache.org>
Subject Re: maven-indexer / Lucene
Date Fri, 07 Jul 2017 20:22:44 GMT
Hi Olivier,

great! 
Just to check my understanding: the dependency on Lucene in the POM of
indexer-core is still there, but the Lucene packages are moved to the
...maven.index.shaded... package? You develop indexer-core against the
standard Lucene packages, and the shading is executed during the build of
the indexer package?

I think that may solve our dependency problem.
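
To make the relocation concrete, here is a minimal Java sketch of the name
mapping as I understand it. It is only an illustration, not code from
maven-indexer: the class ShadedNames and the method relocate are hypothetical
names, and the real relocation would rewrite the bytecode during the build
(for example with a tool such as the maven-shade-plugin, which is my
assumption; the exact tooling has not been stated here).

```java
// Hypothetical helper, for illustration only: maps a standard Lucene class
// name to the relocated (shaded) name discussed in this thread.
final class ShadedNames {
    // Package prefix of the standard Lucene classes.
    static final String ORIGINAL_PREFIX = "org.apache.lucene.";
    // Relocated prefix, as named in the thread.
    static final String SHADED_PREFIX = "org.apache.maven.index.shaded.lucene.";

    private ShadedNames() {}

    /** Returns the relocated name, or the input unchanged if it is not a Lucene class. */
    static String relocate(String className) {
        if (className.startsWith(ORIGINAL_PREFIX)) {
            return SHADED_PREFIX + className.substring(ORIGINAL_PREFIX.length());
        }
        return className;
    }

    public static void main(String[] args) {
        // A Lucene class is moved under the shaded package.
        System.out.println(relocate("org.apache.lucene.search.BooleanClause"));
        // Classes outside Lucene keep their names.
        System.out.println(relocate("java.lang.String"));
    }
}
```

With such a mapping, the shaded indexer classes and the Lucene version pulled
in by JCR would no longer share class names, which is why the two versions
could coexist on one classpath.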

I still get errors in the maven-indexer module, but I assume its status is
still "work in progress". I don't want to interfere too much with your
changes.

I'm not sure whether we should keep JCR Oak as the metadata implementation. I
think OrientDB may be a feasible alternative: it is embeddable, it is a graph
database, its Lucene index is optional and may be omitted, and it is under the
Apache License. And with JCR Oak we would also have to convert the existing
metadata index.

But one step after the other. If we agree that the shaded indexer works, we
should merge only the maven indexer changes to the master branch, without the
JCR/Lucene update, and change JCR and/or Lucene afterwards.

Greetings

Martin

On Friday, 7 July 2017, 09:23:24 CEST, Olivier Lamy wrote:
> So the repo contains a branch feature/jar_shaded_lucene here:
> https://git1-us-west.apache.org/repos/asf?p=maven-indexer.git;a=summary
> and I pushed what I started for Archiva to the branch called feature/jcr_oak.
> So in order to test it, you first need to build maven-indexer from the
> branch feature/jar_shaded_lucene.
> 
> On 6 July 2017 at 22:31, Olivier Lamy <olamy@apache.org> wrote:
> > I will try to share the work I did tomorrow in a branch
> > 
> > On Thu, 6 Jul 2017 at 7:48 pm, Martin Stockhammer <martin_s@apache.org> wrote:
> >> We have different, incompatible Lucene dependencies that prevent us from
> >> updating the maven indexer and/or jackrabbit. And this will happen again
> >> with each upgrade of one of these two packages in the future.
> >> So it would be really good if we could find a solution that removes one
> >> of the Lucene dependencies.
> >> 
> >> Greetings
> >> 
> >> Martin
> >> 
> >> 
> >> On 6 July 2017, 09:36:06 CEST, Chris Graham <chrisgwarp@gmail.com> wrote:
> >> 
> >> >Can I please ask an obvious/stupid question?
> >> >
> >> >What is driving this need for change?
> >> >
> >> >From a quick read of the thread above, all of the options appear to
> >> >introduce a lot of breaking changes, and a whole lot more uncertainty.
> >> >
> >> >So, what is so broken that it is driving these changes?
> >> >
> >> >Sent from my iPhone
> >> >
> >> >> On 6 Jul 2017, at 12:39 pm, Olivier Lamy <olamy@apache.org> wrote:
> >> >> 
> >> >> Yup.
> >> >> The idea is to have an extra jar produced by the maven-indexer with a
> >> >> shaded lucene version.
> >> >> So the lucene classes (version used by Maven indexer) will be relocated
> >> >> in a package called org.apache.maven.index.shaded.lucene (such as
> >> >> org.apache.maven.index.shaded.lucene.search.BooleanClause).
> >> >> Then you exclude the lucene dependencies used by maven indexer and voila.
> >> >> The voila is a bit optimistic and not so easy, but anyway I'm working on
> >> >> it ATM.
> >> >
> >> >
> >> >>> On 6 July 2017 at 07:08, Martin <martin_s@apache.org> wrote:
> >> >>> 
> >> >>> What do you mean exactly by shading? Moving to another package name?
> >> >>> 
> >> >>> On Wednesday, 5 July 2017, 01:19:17 CEST, Olivier Lamy wrote:
> >> >>>> maybe an option is to use some shading?
> >> >>>> I'm thinking of shading lucene packages used by maven indexer. I can
> >> >>>> easily provide a build for that.
> >> >>>> WDYT?
> >> >>>> 
> >> >>>>> On 26 June 2017 at 11:49, Olivier Lamy <olamy@apache.org> wrote:
> >> >>>>> Hi
> >> >>>>> graph/document storage could be convenient (but not possible with
> >> >>>>> neo4j as it's GPL license [1])
> >> >>>>> well we can add solr as an additional webapp with our jetty
> >> >>>>> distribution but this will be a pain for users who want to use tomcat
> >> >>>>> or any other servlet container...
> >> >>>>> we still need to investigate a new storage model :-)
> >> >>>>> 
> >> >>>>> Olivier
> >> >>>>> [1] https://neo4j.com/licensing/
> >> >>>>> 
> >> >>>>>> On 25 June 2017 at 06:26, Martin <martin_s@apache.org> wrote:
> >> >>>>>> Yes, you are right. The lucene dependency causes a lot of trouble
> >> >>>>>> and will cause headaches with each version change of one of the
> >> >>>>>> dependencies.
> >> >>>>>> What are the requirements for a replacement?
> >> >>>>>> - We want to store hierarchical data?
> >> >>>>>> - We want to store metadata for nodes?
> >> >>>>>> - Fulltext search (only metadata or for artifacts too?)
> >> >>>>>> - Blob / Artifact storage (I don't think so, but I'm not so familiar
> >> >>>>>> with the archiva artifact model)?
> >> >>>>>> 
> >> >>>>>> Maybe some graph database may be an alternative. I don't know if the
> >> >>>>>> license of neo4j is compatible with the apache license, and I think
> >> >>>>>> it brings lucene as a dependency too. I will have a look.
> >> >>>>>> The problem is, if fulltext search is needed, I think that with most
> >> >>>>>> of the frameworks we get a lucene dependency, if it's embedded.
> >> >>>>>> 
> >> >>>>>> Other alternatives:
> >> >>>>>> - Implement fulltext search on our own (index of the metadata stored
> >> >>>>>> via the archiva api) and use the lucene dependency that comes from
> >> >>>>>> the maven-indexer
> >> >>>>>> - Jcr Oak with Solr. Solr is not embedded, it must run as its own
> >> >>>>>> application (war).
> >> >>>>>> 
> >> >>>>>> Greetings
> >> >>>>>> 
> >> >>>>>> Martin
> >> >>>>>> 
> >> >>>>>> On Saturday, 24 June 2017, 14:05:26 CEST, Olivier Lamy wrote:
> >> >>>>>>> well this gonna be a pain.
> >> >>>>>>> IMHO we need to find a new alternative to jcr oak.
> >> >>>>>>> And something not using Lucene as it's a real pain to have
> >> >>>>>>> different libraries using lucene as they do not update at the same
> >> >>>>>>> time (and Lucene breaks backward compat so quickly...)
> >> >>>>>>> Any ideas? I'd like to have something embedded (but with a possible
> >> >>>>>>> external server configuration).
> >> >>>>>>> There is currently a Cassandra implementation. I was not satisfied
> >> >>>>>>> with the performance but I guess I did that 4 years ago so it can
> >> >>>>>>> be improved for sure :-)
> >> >>>>>>> Maybe orientdb?
> >> >>>>>>> What else?
> >> >>>>>>> 
> >> >>>>>>>> On 24 June 2017 at 09:50, Olivier Lamy <olamy@apache.org> wrote:
> >> >>>>>>>> well the issue is incompatible versions of Lucene for Maven
> >> >>>>>>>> Indexer and Oak (well I can try to push a patch to Oak for
> >> >>>>>>>> upgrading...)
> >> >>>>>>>> 
> >> >>>>>>>>> On 24 June 2017 at 08:41, Olivier Lamy <olamy@apache.org> wrote:
> >> >>>>>>>>> Hi
> >> >>>>>>>>> Maven Indexer 6.0-SNAPSHOT doesn't need the plexus bridge anymore.
> >> >>>>>>>>> I'm working on it in the branch ( feature/jcr_oak )
> >> >>>>>>>>> Not sure why but I have intermittent failures with the store-jcr
> >> >>>>>>>>> module.
> >> >>>>>>>>> I definitely agree on the upgrade.
> >> >>>>>>>>> Well we can simply detect it's not oak compatible and schedule a
> >> >>>>>>>>> full reindex (maybe with a message in logs and ui?)
> >> >>>>>>>>> But we need to be sure we can still read the central index, and
> >> >>>>>>>>> I'm not sure about a possible lucene conflict between oak and
> >> >>>>>>>>> maven indexer.
> >> >>>>>>>>> We can work on this branch? (I created a Jenkins job for it
> >> >>>>>>>>> https://builds.apache.org/view/A-D/view/Archiva/job/archiva-jcr-oak-branch/)
> >> >>>>>>>>> If you prefer master I would say no worries either.
> >> >>>>>>>>> Something else to look at is upgrading maven-core etc...
> >> >>>>>>>>> Anyway
> >> >>>>>>>>> Cheers
> >> >>>>>>>>> Olivier
> >> >>>>>>>>> 
> >> >>>>>>>>>> On 22 June 2017 at 19:16, Martin <martin_s@apache.org> wrote:
> >> >>>>>>>>>> Hi,
> >> >>>>>>>>>> 
> >> >>>>>>>>>> upgrading the maven indexer leads to some major changes.
> >> >>>>>>>>>> Lucene is used by maven-indexer and also by jackrabbit.
> >> >>>>>>>>>> Jackrabbit sticks to the old 3.x version and, as I see it, they
> >> >>>>>>>>>> will not move to a newer version.
> >> >>>>>>>>>> There is Jackrabbit Oak as an alternative.
> >> >>>>>>>>>> I tried a proof of concept and could replace the jackrabbit
> >> >>>>>>>>>> implementation of metadata-store-jcr with an oak implementation.
> >> >>>>>>>>>> At least I got all the unit tests of this module to pass.
> >> >>>>>>>>>> But switching to Oak has some drawbacks:
> >> >>>>>>>>>> - The repository format changed and we must provide a way to
> >> >>>>>>>>>> migrate (either migrate the existing repository or create a new
> >> >>>>>>>>>> one by reindexing)
> >> >>>>>>>>>> - The lucene version used is newer but does not match the
> >> >>>>>>>>>> version from the maven-indexer dependencies. Some
> >> >>>>>>>>>> incompatibilities may come up that are not solvable without
> >> >>>>>>>>>> using a modified version of one of the two. Or there may be the
> >> >>>>>>>>>> possibility to switch to solr (as a separate component) and get
> >> >>>>>>>>>> rid of the lucene dependencies for jcr inside the archiva
> >> >>>>>>>>>> project.
> >> >>>>>>>>>> 
> >> >>>>>>>>>> Switching to maven-indexer 6.0-SNAPSHOT means some changes too:
> >> >>>>>>>>>> - The Plexus-Sisu-Bridge does not work as before.
> >> >>>>>>>>>> - We must migrate from the NexusIndexer to the indexer API.
> >> >>>>>>>>>> 
> >> >>>>>>>>>> So switching to the new indexer and oak means more work than
> >> >>>>>>>>>> expected and some risks regarding new incompatibility problems.
> >> >>>>>>>>>> And I think this cannot be done without broken master builds
> >> >>>>>>>>>> for some time period.
> >> >>>>>>>>>> 
> >> >>>>>>>>>> So, what should we do? I think maven indexer is one of the core
> >> >>>>>>>>>> components of archiva, and we should utilize the 3.x-version to
> >> >>>>>>>>>> migrate to the new indexer version, even if this means
> >> >>>>>>>>>> switching to jcr oak. Otherwise it would mean sticking to the
> >> >>>>>>>>>> old version for the next years.
> >> >>>>>>>>>> @Olivier, regarding the maven-indexer / sisu-Bridge API
> >> >>>>>>>>>> changes, I hope you can provide useful help.
> >> >>>>>>>>>> 
> >> >>>>>>>>>> I committed the PoC to the branch feature/jcr_oak. There are
> >> >>>>>>>>>> some modules where the tests do not pass (mainly because of the
> >> >>>>>>>>>> indexer API changes).
> >> >>>>>>>>>> 
> >> >>>>>>>>>> Any comments?
> >> >>>>>>>>>> 
> >> >>>>>>>>>> Cheers
> >> >>>>>>>>>> 
> >> >>>>>>>>>> Martin
> >> >>>>>>>>>> 
> >> >>>>>>>>>> On Tuesday, 13 June 2017, 09:07:35 CEST, Olivier Lamy wrote:
> >> >>>>>>>>>>> forget it but we need to ensure we can read maven index
> >> >>>>>>>>>>> files....
> >> >>>>>>>>>>> 
> >> >>>>>>>>>>> On 13 June 2017 at 17:06, Olivier Lamy <olamy@apache.org> wrote:
> >> >>>>>>>>>>>> Hi,
> >> >>>>>>>>>>>> Remember jackrabbit depends on Lucene as well so upgrading
> >> >>>>>>>>>>>> Lucene can be a problem here.
> >> >>>>>>>>>>>> Regarding maven-indexer yes we can depend on a snapshot
> >> >>>>>>>>>>>> until the release.
> >> >>>>>>>>>>>> I can release it ;-)
> >> >>>>>>>>>>>> 
> >> >>>>>>>>>>>> On 13 June 2017 at 06:06, Martin <martin_s@apache.org> wrote:
> >> >>>>>>>>>>>>> Hi,
> >> >>>>>>>>>>>>> 
> >> >>>>>>>>>>>>> the lucene version depends on the maven indexer. But I'm
> >> >>>>>>>>>>>>> not sure about the current state of maven-indexer. The
> >> >>>>>>>>>>>>> version has not changed since sometime in 2013.
> >> >>>>>>>>>>>>> 
> >> >>>>>>>>>>>>> There are commits on the master branch since then, and the
> >> >>>>>>>>>>>>> lucene version has been changed too, but no releases were
> >> >>>>>>>>>>>>> tagged.
> >> >>>>>>>>>>>>> Does it make sense to switch to the maven-indexer
> >> >>>>>>>>>>>>> 6.0-SNAPSHOT?
> >> >>>>>>>>>>>>> 
> >> >>>>>>>>>>>>> As far as I know there are new compact index formats with
> >> >>>>>>>>>>>>> new lucene versions, but I'm not sure if this is relevant
> >> >>>>>>>>>>>>> for the maven indexes.
> >> >>>>>>>>>>>>> 
> >> >>>>>>>>>>>>> Cheers
> >> >>>>>>>>>>>>> 
> >> >>>>>>>>>>>>> Martin
> >> >>>>>>>>>>>> 
> >> >>>>>>>>>>>> --
> >> >>>>>>>>>>>> Olivier Lamy
> >> >>>>>>>>>>>> http://twitter.com/olamy | http://linkedin.com/in/olamy
> >> >>>>>>>>> 
> >> 
> >> --
> >> This message was sent from my Android device with K-9 Mail.
> > 


