lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-5246) SegmentInfoPerCommit continues to list unneeded updatesFiles
Date Sun, 29 Sep 2013 13:41:24 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781374#comment-13781374 ]

Michael McCandless commented on LUCENE-5246:
--------------------------------------------

Wow, this is a good catch; I'm glad the test uncovered this!

This was a "file" leak, right?  Files that are never actually opened
would remain in the index, so if you keep applying DV updates and
reopening, the number of files in the index would grow without
bound.

Can we only partially neuter the test (it's obviously good!)?
E.g. maybe we can change numThreadUpdates to be an atLeast so that
the test multiplier makes the test run for more iterations?  Also, I
prefer that each thread does its own committing; it's more evil.
With a separate thread that sleeps for 50 msec each time, it's not
clear how many commits will actually happen.

Can we fix SIPC (SegmentInfoPerCommit) getUpdatesFiles to return an
unmodifiableMap?  And then fix RALD (ReadersAndLiveDocs) to not add
into the map like it does now?
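
A minimal sketch of the read-only accessor idea, not the actual
Lucene code. SegmentCommitSketch and setUpdatesFiles are hypothetical
names, and the map shape (generation -> file names) is an assumption
made for illustration. The point is only that callers get a view they
cannot add to and must go through an explicit setter instead:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for SegmentInfoPerCommit; not the real Lucene class.
class SegmentCommitSketch {
    // Assumed shape: updates generation -> names of the files written for it.
    private final Map<Long, Set<String>> updatesFiles = new HashMap<>();

    // The only way to change the map: the caller hands over a full replacement,
    // which is copied so later changes to the argument do not leak in.
    void setUpdatesFiles(Map<Long, Set<String>> newFiles) {
        updatesFiles.clear();
        for (Map.Entry<Long, Set<String>> e : newFiles.entrySet()) {
            updatesFiles.put(e.getKey(), new HashSet<>(e.getValue()));
        }
    }

    // Read-only view: put/remove on the returned map throw
    // UnsupportedOperationException. The inner sets are not wrapped here.
    Map<Long, Set<String>> getUpdatesFiles() {
        return Collections.unmodifiableMap(updatesFiles);
    }
}
```

With this shape, code that used to add entries through the getter
fails fast instead of silently growing the list of files.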

Also don't forget to email people to re-index trunk indices after
committing this...


> SegmentInfoPerCommit continues to list unneeded updatesFiles
> ------------------------------------------------------------
>
>                 Key: LUCENE-5246
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5246
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-5246.patch
>
>
> SegmentInfoPerCommit continues to list updates files even when they are no longer needed.
> For example, if you update the values of documents of field 'f', it creates a gen'd .fnm (FieldInfos)
> file. If you commit/reopen and update the field again (maybe now a different set of documents),
> it creates another gen'd .fnm, but continues to list both gens, even though only the latest
> one is needed.
> To solve this, SIPC would need to know the dvGen of each FieldInfo, so that it can correctly
> list only the updates files that are truly needed. I'll work on a testcase and fix.
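
The bookkeeping described above can be sketched roughly as follows.
This is an illustration under assumptions, not the fix in the
attached patch: NeededUpdatesSketch, neededGens and prune are
hypothetical names, -1 is assumed to mean "field never updated", and
all files of a generation are treated as one unit, ignoring any
distinction between file types within a generation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; names and data shapes are assumptions.
class NeededUpdatesSketch {
    // Collect the generations that some field's dvGen still points at.
    static Set<Long> neededGens(Map<String, Long> dvGenByField) {
        Set<Long> needed = new TreeSet<>();
        for (long gen : dvGenByField.values()) {
            if (gen != -1) {          // -1: field has no updates (assumed)
                needed.add(gen);
            }
        }
        return needed;
    }

    // Return a copy of filesByGen without the generations nobody references.
    static Map<Long, Set<String>> prune(Map<Long, Set<String>> filesByGen,
                                        Set<Long> needed) {
        Map<Long, Set<String>> kept = new HashMap<>(filesByGen);
        kept.keySet().retainAll(needed);
        return kept;
    }
}
```

In the example from the description, field 'f' updated twice would
have its dvGen pointing at the second generation only, so pruning
would drop the files listed for the first one.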


