lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Lucene 9.0 Java module system support
Date Sat, 11 Apr 2020 11:14:13 GMT
Hi,

 

I am currently on Easter vacation, but just wanted to give some feedback:

They wy you are doing it works at the moment, because you don’t have anything to do with
service providers. You don’t use custom codecs (outside lucene-core.jar) and you just ignored
the META-IF/services folder. Your example works, as Lucene finds everything that is needed
in the core.jar, so the “legacy” service loading works (lucene-core.jar find its own services
in lucene-core.jar). As soon as you would add lucene-backwards-codecs.jar, it would break
unfixable, unless you copy the whole backwards codecs into core. This is completely unfixable
in Lucene 8.x, as the new service-loader interface requires Java 9+.

 

But you may not yet have noticed that I already started to do preparatory work and migrated
in master (Lucene 9.0) the service provider to use the Java Runtime classes and we have thrown
away our own service loader (which cannot cross module boundaries). The fix was merged not
long ago:  <https://issues.apache.org/jira/browse/LUCENE-9281> https://issues.apache.org/jira/browse/LUCENE-9281
- this cannot be backported to Java 8 / Lucene 8, as the reason for the home-made SPIClassIterator
was exactly the missing features (and some classpath ordering issues in 3rd party JVMs like
IBM J9). Those issues are fixed in Java 11 (also allowing to instantiate the classes on your
own). When you are testing your “hack patch” with Lucene’s master branch you may succeed
with also loading classes from the backwards codecs – no guarantees yet!)

 

About StandardAnalyzer: Unfortunately I aggressively complained a while back when Mike McCandless
wanted to move standard analyzer out of the analysis package into core (“for convenience”).
This was a bad step, and IMHO we should revert that or completely rename the packages and
everything. The problem here is: As the analysis services are only part of lucene-analyzers,
we had to leave the factory classes there, but move the implementation classes in core. The
package has to be the same. The only way around that is to move the analysis factory framework
also to core (I would not be against that). This would include all factory base classes and
the service loading stuff. Then we can move standard analyzer and some of the filters/tokenizers
including their factories to core an that problem would be solved.

 

My plan to move to modules is the following:

 

(1) Do preparatory work:

*	Retire SPIClassIterator (DONE).
*	Add some preparatory issues to cleanup class hierarchy: Move Analysis SPI to core / remove
StandardAnalyzer and related classes out of core back to anaysis
*	Fix Codec API to no rely on package-private classes, so we can have a completely public
API with abstract classes for codecs, so stuff in backwards-codecs does not need to have access
to package private stuff in core.
*	Cleanup sandbox to prefix all classes there with “sandbox” package and where needed
remove package-private access. If it’s needed for internal access, WTF: Just move the stuff
to core! We have a new version 9.0, so either retire/delete Sandbox stuff or make it part
of Lucene core.

(2) Wait for the Gradle Build to be finalized, because including Module stuff into the current
Ant build won’t work

*	Ant build must be retired!

(3) Make Lucene real modules

*	As we are on Java 11 already, add module-info.java everywhere
*	Fix gradle build to create and test modules (Latest Gradle needed)
*	Migrate all META-INF/services/* to module-info.java (before doing this, figure out of the
META-INF files must stays for non-module usage, or if Java is clever enough to also look into
module descriptor to find services). We may need all services at both locations (for module
or classpath usage; we need a build helper to check that it’s in-line)

 

I don’t want Lucene work with automodules in Java 11, it should be fully modularized.

 

If you want to help and express interest: Please open an issue for the preparatory work listing
the cases of same-package in different jar files. I would split this up as described above:
Analysis issues with standardanalyzer, codecs pkg-private apis, sandbox.

 

Uwe

 

-----

Uwe Schindler

Achterdiek 19, D-28357 Bremen

https://www.thetaphi.de

eMail: uwe@thetaphi.de

 

From: David Ryan <david@livemedia.com.au> 
Sent: Saturday, April 11, 2020 9:30 AM
To: dev@lucene.apache.org
Subject: Re: Lucene 9.0 Java module system support

 

 

Thanks, I had probably missed some of what Uwe was saying in regards to the limitation of
what would be possible even if the suggested changes were made. Given your points and the
fact that it would take a while to have any of the changes filter into a release version,
I decided to develop a patch-lucene.sh script to validate if the changes would allow me to
access the required parts of Lucene.  While it may not provide the full functionality provided
by SPI including analysis chains and codecs, I worked out it will allow the use of the basics.

 

The patch script below adds Automatic module names to the four Lucene libraries I needed (not
strictly required, but I was interested to check if it would work).  For now, any duplicate
packages I identified have been moved into the core library (as identified in suggested changes).
Given, I'm not using the SPI functionality, I simply deleted the standard analysers from the
services list (required or the module system complains they are missing).  Thankfully Gradle
doesn't validate jars in the cache so the patch script only needs to be run once.

 

Once the patch script was applied, I was able to use Ngram analysers and the query parser
without issue. This will provide the essentials of what we needed for an embedded solution
(location indexing, short text indexing with ngrams tokenizer and lower case filter, searching
by location with nearest, bounding box and radius search, and partial text searches using
the query parser). Of course, I've only scratched the surface of the APIs and right now it
isn't clear if/when the SPI functionality might cause further issues or be required.  

 

So, I've validated that the suggested changes would allow basic functionality to work under
the Java module system. If there's anything else I can do to help progress the suggested changes
please let me know.

 

Regards,

David.

 

 

--------- patch-lucene.sh -- works on osx ------

mkdir -p patch-lucene/core
mkdir -p patch-lucene/analyzers
mkdir -p patch-lucene/sandbox
mkdir -p patch-lucene/queryparser
cd patch-lucene

# copy the jars from the gradle cache.
cp ~/.gradle/caches/modules-2/files-2.1/org.apache.lucene/lucene-core/8.5.0/3f9ea85fff4fc3f7c83869dddb9b0ef7818c0cae/lucene-core-8.5.0.jar
.
cp ~/.gradle/caches/modules-2/files-2.1/org.apache.lucene/lucene-analyzers-common/8.5.0/7156f2e545fd6e7faaee4781d15eb60cf5f07646/lucene-analyzers-common-8.5.0.jar
.
cp ~/.gradle/caches/modules-2/files-2.1/org.apache.lucene/lucene-sandbox/8.5.0/2b275921f2fd92b15b4f1a2a565467c3fa221ef9/lucene-sandbox-8.5.0.jar
.
cp ~/.gradle/caches/modules-2/files-2.1/org.apache.lucene/lucene-queryparser/8.5.0/13c38f39b1a7d10c4749ba789fa95da5868d4885/lucene-queryparser-8.5.0.jar
.

# Add automatic module name to core.
cd core
jar -xf ../lucene-core-8.5.0.jar
sed -i '' -e '/Created-By:/a\'$'\n''Automatic-Module-Name: org.apache.lucene.core'$'\n' META-INF/MANIFEST.MF

# Add automiatc module name, move standard analysis classes into core. Remove any classes
in standard from service lists. 
cd ../analyzers
jar -xf ../lucene-analyzers-common-8.5.0.jar
sed -i '' -e '/Created-By:/a\'$'\n''Automatic-Module-Name: org.apache.lucene.analyzers.common'$'\n'
META-INF/MANIFEST.MF
sed -i '' -e '/standard/d' META-INF/services/*
mv org/apache/lucene/analysis/standard/* ../core/org/apache/lucene/analysis/standard
rmdir org/apache/lucene/analysis/standard
jar -cfm ../lucene-analyzers-common-8.5.0-fix.jar META-INF/MANIFEST.MF .

# Add automatic module name, move search and document packages into core. move Java 9 specific
search into core. 
cd ../sandbox
jar -xf ../lucene-sandbox-8.5.0.jar
sed -i '' -e '/Created-By:/a\'$'\n''Automatic-Module-Name: org.apache.lucene.sandbox'$'\n'
META-INF/MANIFEST.MF
mv org/apache/lucene/search/* ../core/org/apache/lucene/search
rmdir org/apache/lucene/search
mv org/apache/lucene/document/* ../core/org/apache/lucene/document
rmdir org/apache/lucene/document
mv META-INF/versions/9/org/apache/lucene/search/* ../core/META-INF/versions/9/org/apache/lucene/search
rm -rf META-INF/versions
jar -cfm ../lucene-sandbox-8.5.0-fix.jar META-INF/MANIFEST.MF .

# Package up core with changes.
cd ../core
jar -cfm ../lucene-core-8.5.0-fix.jar META-INF/MANIFEST.MF .

# Add automatic module name.
cd ../queryparser
jar -xf ../lucene-queryparser-8.5.0.jar
sed -i '' -e '/Created-By:/a\'$'\n''Automatic-Module-Name: org.apache.lucene.queryparser'$'\n'
META-INF/MANIFEST.MF
jar -cfm ../lucene-queryparser-8.5.0-fix.jar META-INF/MANIFEST.MF .

# Copy the fixed versions back into gradle cache.
cd ..
cp lucene-core-8.5.0-fix.jar ~/.gradle/caches/modules-2/files-2.1/org.apache.lucene/lucene-core/8.5.0/3f9ea85fff4fc3f7c83869dddb9b0ef7818c0cae/lucene-core-8.5.0.jar
cp lucene-analyzers-common-8.5.0-fix.jar ~/.gradle/caches/modules-2/files-2.1/org.apache.lucene/lucene-analyzers-common/8.5.0/7156f2e545fd6e7faaee4781d15eb60cf5f07646/lucene-analyzers-common-8.5.0.jar
cp lucene-sandbox-8.5.0-fix.jar ~/.gradle/caches/modules-2/files-2.1/org.apache.lucene/lucene-sandbox/8.5.0/2b275921f2fd92b15b4f1a2a565467c3fa221ef9/lucene-sandbox-8.5.0.jar
cp lucene-queryparser-8.5.0-fix.jar ~/.gradle/caches/modules-2/files-2.1/org.apache.lucene/lucene-queryparser/8.5.0/13c38f39b1a7d10c4749ba789fa95da5868d4885/lucene-queryparser-8.5.0.jar

------------

 

On Sat, Apr 11, 2020 at 11:21 AM Chris Hostetter <hossman_lucene@fucit.org <mailto:hossman_lucene@fucit.org>
> wrote:


: If the changes I proposed are still viewed as having too many downstream
: impacts, my fallback position will be to patch the jars. This involves
: using the gradle import system to get the jars from Maven, then using a
: patch script to manually unzip the jars, move the offending classes into
: other jars which share the same package name and rezip. So far, I've been

I'm no expert here, but i trust that Uwe is, and i feel like your followup 
questions/suggests have still avoided his primary point about *why* 
Lucene/Solr hasn't attempted jigsaw modulariation...

: >>> There is currently some preparatory things to move forward with modules,
: >>> so although you might be able to actually compile Lucene with module system
: >>> (by limiting to a subset of JAR files), it currently won’t work
: >>> cross-module due to the way how it handles ServiceLoader interfaces
: >>> (codecs, postings formats, analyzers, see
: >>> https://issues.apache.org/jira/browse/LUCENE-9281). The only way to
: >>> make it work at runtime is to put all of Lucene into one module.

...so, IIUC, even if you patch the *current* jars, any Lucene code you use 
that depends on SPI (like analysis chains, codecs, etc...) isn't going to 
work unless follow Uwe's primary suggestion for folks who care about 
modules...

: >>> Th general recommendation is to combine all required Lucene libraries
: >>> into a separate JAR file during the maven / gradle build (e.g. using the
: >>> Maven Shade plugin). Keep in mind that Lucene is also not suitable for use


-Hoss
http://www.lucidworks.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org <mailto:dev-unsubscribe@lucene.apache.org>

For additional commands, e-mail: dev-help@lucene.apache.org <mailto:dev-help@lucene.apache.org>



Mime
View raw message