lucene-dev mailing list archives

From "Gil Tene (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (LUCENE-4482) Likely Zing JVM bug causes failures in TestPayloadNearQuery
Date Sat, 13 Oct 2012 16:10:04 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475643#comment-13475643 ]

Gil Tene edited comment on LUCENE-4482 at 10/13/12 4:09 PM:
------------------------------------------------------------

We're looking into this bug report. Will hopefully report back / resolve it soon. [But Michael,
please go ahead and report it on our bugzilla as well per the above].

[Uwe Schindler wrote:]
> I would run Zing tests, too, but before doing that they should:
> Not rely on strange binary kernel modules that are outdated on
> Ubuntu 12.04.1 LTS. The Jenkins server is running in DMZ so I
> will never ever run it with outdated kernels. They should (if
> they really need a kernel module, which is in my opinion a no-go,
> too) use DKMS and make the kernel module open source, so my kernel
> is also not tainted. Without that I will not support Zing, sorry.
> But I doubt if the kernel module is really needed! Without a
> clear explanation why this is needed on their homepage I don't agree.

This has two parts: one questioning why our loadable module is needed at all, and the
other relating to its availability for various kernels and Linux distros.

1. Why is the ZST (which includes a loadable module) needed for Zing to operate?

One of Zing JVM's main distinctions is that its C4 garbage collector (aka GPGC internally)
eliminates garbage collection as a response time concern for enterprise applications. Among
other things, C4 relies on rapid manipulation of virtual memory and physical memory mappings
to maintain continuous operation. While the semantics of the manipulations we do are possible
using the vanilla mmap/mremap/munmap/madvise APIs, the rate at which Linux (and most other OSs)
can perform them is extremely low, due mostly to the historic, extremely conservative
approach to in-process TLB invalidation, and due partly to issues with multiple-page size
manipulations. We're not talking small change here. More like 4-6 orders of magnitude for
our common operation, which is, right now, the difference between a practical and an impractical
implementation of C4.
You can find a detailed discussion of the difference in metrics for these operations at http://tinyurl.com/34ytcvc,
and a detailed discussion of C4 in our ISMM paper (http://tinyurl.com/94c9btb at the ACM site,
or at the Azul site http://tinyurl.com/7rydpvo).
   
2. Loadable Module availability and compatibility

To be clear, our loadable module is open source, under GPLv2, and you can have the sources
for it if you wish. The reason for the current choice of packaging is that a wide range of
current end customers do not have (or do not wish to install) the tooling needed to build or
rebuild the module on their Linux systems, and what they need operationally is an RPM that
opens and installs without requiring kernel headers and the like. In addition, we tend to
intensively test and examine the kernel module against specific distros and kernels to verify
compatibility and stability, and declare official support for these well-tested combinations.

On other Linux distros (RHEL, CentOS, SLES), the kernel revision velocity is fairly slow,
and the kernel API signatures tend to remain the same unless semantics are actually modified.
As a result, we use a single module RPM for RHEL 5 and CentOS 5 versions, and have only needed
a single rev of the module packaging during the evolution of RHEL 6/CentOS 6 and SLES 11 thus
far.

As we added Zing support for Ubuntu, primarily due to its popularity with developers, we
found that kernel API signatures there change with practically every patch, even with no semantic
change. This creates some serious friction with our current loadable module packaging and
distribution choice for Ubuntu. We are working to resolve this, either by using DKMS or some
other alternative, such that modules can continue to work or be properly updated as kernels
rev up in Ubuntu-style distros.

So we're working on it, and it will get better...

-- Gil. [CTO, Azul Systems]

                
                  
> Likely Zing JVM bug causes failures in TestPayloadNearQuery
> -----------------------------------------------------------
>
>                 Key: LUCENE-4482
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4482
>             Project: Lucene - Core
>          Issue Type: Bug
>         Environment: Lucene trunk, rev 1397735
> Zing:
> {noformat}
>   java version "1.6.0_31"
>   Java(TM) SE Runtime Environment (build 1.6.0_31-6)
>   Java HotSpot(TM) 64-Bit Tiered VM (build 1.6.0_31-ZVM_5.2.3.0-b6-product-azlinuxM-X86_64, mixed mode)
> {noformat}
> Ubuntu 12.04 LTS 3.2.0-23-generic kernel
>            Reporter: Michael McCandless
>         Attachments: LUCENE-4482.patch
>
>
> I dug into one of the Lucene test failures when running with Zing JVM
> (available free for open source devs...).  At least one other test
> sometimes fails but I haven't dug into that yet.
> I managed to get the failure easily reproduced: with the attached
> patch, on rev 1397735 checkout, if you cd to lucene/core and run:
> {noformat}
>   ant test -Dtests.jvms=1 -Dtests.seed=C3802435F5FB39D0 -Dtests.showSuccess=true
> {noformat}
> Then you'll hit several failures in TestPayloadNearQuery, eg:
> {noformat}
> Suite: org.apache.lucene.search.payloads.TestPayloadNearQuery
>   1> FAILED
>   2> NOTE: reproduce with: ant test  -Dtestcase=TestPayloadNearQuery -Dtests.method=test -Dtests.seed=C3802435F5FB39D0 -Dtests.slow=true -Dtests.locale=ga -Dtests.timezone=America/Adak -Dtests.file.encoding=US-ASCII
> ERROR   0.01s | TestPayloadNearQuery.test <<<
>    > Throwable #1: java.lang.RuntimeException: overridden idfExplain method in TestPayloadNearQuery.BoostingSimilarity was not called
>    > 	at __randomizedtesting.SeedInfo.seed([C3802435F5FB39D0:4BD41BEF5B075428]:0)
>    > 	at org.apache.lucene.search.similarities.TFIDFSimilarity.computeWeight(TFIDFSimilarity.java:740)
>    > 	at org.apache.lucene.search.spans.SpanWeight.<init>(SpanWeight.java:62)
>    > 	at org.apache.lucene.search.payloads.PayloadNearQuery$PayloadNearSpanWeight.<init>(PayloadNearQuery.java:147)
>    > 	at org.apache.lucene.search.payloads.PayloadNearQuery.createWeight(PayloadNearQuery.java:75)
>    > 	at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:648)
>    > 	at org.apache.lucene.search.AssertingIndexSearcher.createNormalizedWeight(AssertingIndexSearcher.java:60)
>    > 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:265)
>    > 	at org.apache.lucene.search.payloads.TestPayloadNearQuery.test(TestPayloadNearQuery.java:146)
>    > 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>    > 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>    > 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>    > 	at java.lang.reflect.Method.invoke(Method.java:597)
>    > 	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
>    > 	at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
>    > 	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
>    > 	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
>    > 	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
>    > 	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>    > 	at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
>    > 	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>    > 	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>    > 	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>    > 	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
>    > 	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>    > 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>    > 	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
>    > 	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
>    > 	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
>    > 	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
>    > 	at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
>    > 	at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
>    > 	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
>    > 	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>    > 	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
>    > 	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>    > 	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
>    > 	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
>    > 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>    > 	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
>    > 	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>    > 	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
>    > 	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
>    > 	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>    > 	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
>    > 	at java.lang.Thread.run(Thread.java:661)
> {noformat}
> The patch at least isolates the JVM bug even if it's not exactly a
> minimal test :)  Somehow the idfExplain method, which
> is overridden in this test's BoostingSimilarity, fails to be called
> (the super.idfExplain is called instead), which leads to the test
> failures.
> The failure does not happen if you run this test in isolation.
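
For readers who want to see the pattern the report describes: a base-class method
(computeWeight in the stack trace) calls an overridable method (idfExplain), and the
test's subclass is expected to receive that call. Below is a minimal Java sketch of that
dispatch pattern. Every class and method in it is a hypothetical stand-in; it is neither the
Lucene code nor the attached patch. Since the report says the failure does not happen when
the test runs in isolation, this sketch is not expected to reproduce the bug on any JVM.
It only shows the behavior a correct JVM has to exhibit.

```java
// Hypothetical stand-ins only; this is not Lucene code and not the attached patch.
public class OverrideDispatchSketch {

    /** Stand-in for a base similarity whose computeWeight calls an overridable hook. */
    static class BaseSimilarity {
        String idfExplain(String term) {
            return "base:" + term;
        }

        // Virtual call: the JVM must dispatch to the most specific override
        // defined for the receiver's runtime class.
        String computeWeight(String term) {
            return idfExplain(term);
        }
    }

    /** Stand-in for the test's BoostingSimilarity, which overrides the hook. */
    static class BoostingSimilarity extends BaseSimilarity {
        @Override
        String idfExplain(String term) {
            return "boosting:" + term;
        }
    }

    /** Returns true if the subclass override ran when called through the base-class method. */
    static boolean overrideWasCalled(BaseSimilarity sim, String term) {
        return sim.computeWeight(term).startsWith("boosting:");
    }

    public static void main(String[] args) {
        BaseSimilarity sim = new BoostingSimilarity();
        // Repeat the call so a JIT compiler has a chance to compile the call site.
        // The iteration count is arbitrary (an assumption, not taken from the report).
        for (int i = 0; i < 100000; i++) {
            if (!overrideWasCalled(sim, "t" + i)) {
                throw new RuntimeException(
                    "overridden idfExplain was not called at iteration " + i);
            }
        }
        System.out.println("override called on every iteration");
    }
}
```

The check mirrors the spirit of the exception message in the trace above: if the base-class
implementation answers instead of the override, the run fails loudly.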

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

