lucene-dev mailing list archives

From "Walter Underwood (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4930) Lucene's use of WeakHashMap at index time prevents full use of cores on some multi-core machines, due to contention
Date Fri, 12 Apr 2013 15:38:16 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630188#comment-13630188 ]

Walter Underwood commented on LUCENE-4930:
------------------------------------------

Java 6 update 14 was an especially buggy release. It even has the double-parsing bug, which
allows people to put the JVM into an infinite loop with specially crafted doubles. And it has
a very bad bug in the parallel GC that loses memory (with no workaround).
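For the curious, the double-parsing bug is CVE-2010-4476: on unpatched JVMs (including 6u14), parsing a value right at the edge of the smallest normal double sends the FP conversion loop into non-termination. A minimal demonstration, assuming the well-known trigger value from the advisory:

```java
public class DoubleParseBugDemo {
    // The trigger value from CVE-2010-4476. On a vulnerable JVM,
    // Double.parseDouble() on this string never returns; a patched
    // JVM returns a value just below Double.MIN_NORMAL.
    static final String BAD = "2.2250738585072012e-308";

    public static double parse() {
        // Hangs forever on Java 6u14 and other unpatched releases.
        return Double.parseDouble(BAD);
    }

    public static void main(String[] args) {
        System.out.println(parse());
    }
}
```

Because the string could arrive in any request parameter that gets parsed as a double, this was remotely exploitable as a denial of service, which is one more reason not to debug anything on that JVM.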

Update 14 is from May 2009, so that is missing almost four years of bug fixes. I really don't
think we should accept bug reports based on that JVM.

So at the very least, update to the latest Java 6 immediately. And since Java 6 is no longer
supported, you should move to Java 7 as well.

                
> Lucene's use of WeakHashMap at index time prevents full use of cores on some multi-core machines, due to contention
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-4930
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4930
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>    Affects Versions: 4.2
>         Environment: Dell blade system with 16 cores
>            Reporter: Karl Wright
>         Attachments: thread_dump.txt
>
>
> Our project is not making full use of available processing power under indexing load on Lucene 4.2.0.  The reason is the AttributeSource.addAttribute() method, which goes through a synchronized WeakHashMap lookup that is apparently single-threaded for a significant amount of time.  Have a look at the following trace:
> "pool-1-thread-28" prio=10 tid=0x00007f47fc104800 nid=0x672b waiting for monitor entry [0x00007f47d19ed000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at java.lang.ref.ReferenceQueue.poll(ReferenceQueue.java:98)
>         - waiting to lock <0x00000005c5cd9988> (a java.lang.ref.ReferenceQueue$Lock)
>         at org.apache.lucene.util.WeakIdentityMap.reap(WeakIdentityMap.java:189)
>         at org.apache.lucene.util.WeakIdentityMap.get(WeakIdentityMap.java:82)
>         at org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory.getClassForInterface(AttributeSource.java:74)
>         at org.apache.lucene.util.AttributeSource$AttributeFactory$DefaultAttributeFactory.createAttributeInstance(AttributeSource.java:65)
>         at org.apache.lucene.util.AttributeSource.addAttribute(AttributeSource.java:271)
>         at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:107)
>         at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:254)
>         at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:256)
>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:376)
>         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473)
>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1148)
>         at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1129)
> …
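(The contention pattern in this trace can be reproduced in miniature. The sketch below is not Lucene's actual source; it only illustrates the shape of WeakIdentityMap: the backing map itself is a lock-free ConcurrentHashMap, but every lookup first drains a shared ReferenceQueue, and ReferenceQueue.poll() synchronizes on a single internal lock. Under heavy concurrent lookups, all indexing threads serialize on that one monitor, which is exactly the BLOCKED state shown above.)

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of the reap-on-read pattern, NOT Lucene's code.
public class ReapingCache<V> {
    private final ConcurrentMap<WeakReference<Object>, V> backing =
            new ConcurrentHashMap<WeakReference<Object>, V>();
    private final ReferenceQueue<Object> queue = new ReferenceQueue<Object>();

    public V get(Object key) {
        reap(); // every lookup touches the queue's single shared lock
        for (WeakReference<Object> ref : backing.keySet()) {
            if (ref.get() == key) {   // identity comparison, as in WeakIdentityMap
                return backing.get(ref);
            }
        }
        return null;
    }

    public void put(Object key, V value) {
        reap();
        backing.put(new WeakReference<Object>(key, queue), value);
    }

    // Drains cleared references; ReferenceQueue.poll() acquires the
    // queue's internal lock on every call, even when the queue is empty.
    private void reap() {
        Reference<?> cleared;
        while ((cleared = queue.poll()) != null) {
            backing.remove(cleared);
        }
    }
}
```

Note that the map operations themselves are concurrent; the bottleneck is purely the per-call trip through the queue's monitor.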
> We’ve had to make significant changes to the way we index in order to avoid hitting this issue as much, such as indexing with TokenStreams that we reuse, when it would have been more convenient to index with plain tokens.  (The reason is that Lucene internally creates TokenStream objects when you pass a token array to IndexableField, does not reuse them, and the resulting addAttribute() calls cause massive contention.)  However, as you can see from the trace above, we’re still running into contention from other addAttribute() calls buried deep inside Lucene.
> I can see two ways forward.  Either stop using WeakHashMap (or use it in a more efficient way), or make darned sure no addAttribute() calls are made on the main indexing execution path.  (I think it would be easy to fix DocInverterPerField that way, FWIW.  I just don’t know what we’ll run into next.)
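(One generic way to attack the first option is to keep the hot path off the shared lock entirely by giving each thread its own unsynchronized cache, mirroring the reporter's workaround of reusing per-thread TokenStreams. The class below is a hypothetical illustration, not something in Lucene; it trades a little per-thread memory for zero cross-thread contention on lookups.)

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-thread cache: each thread gets its own plain
// HashMap, so get()/put() never contend on a shared monitor.
// Written in Java 6/7 style to match the era of this thread.
public class PerThreadCache<K, V> {
    private final ThreadLocal<Map<K, V>> local = new ThreadLocal<Map<K, V>>() {
        @Override
        protected Map<K, V> initialValue() {
            return new HashMap<K, V>();
        }
    };

    public V get(K key) {
        return local.get().get(key);
    }

    public void put(K key, V value) {
        local.get().put(key, value);
    }
}
```

The trade-off is that entries are never shared or reaped across threads, so this only fits data that is cheap to recompute per thread, which is true of the attribute-class lookups in the trace above.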

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

