Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@lucene.apache.org
Date: Mon, 19 Nov 2012 14:30:58 +0000 (UTC)
From: "Shai Erera (JIRA)" <jira@apache.org>
To: dev@lucene.apache.org
Message-ID: <1314798260.2902.1353335458346.JavaMail.jiratomcat@arcas>
In-Reply-To: 
 <1290560961.45733.1316518628775.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (LUCENE-3441) Add NRT support to
 LuceneTaxonomyReader
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/LUCENE-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500263#comment-13500263 ] 

Shai Erera commented on LUCENE-3441:
------------------------------------

bq. why not have a single instance of the LRUCache for all time, and never call .clear() on it?

That will help as long as previous TR instances are indeed on their way to die. Otherwise, if e.g. an app, for some reason, reopens a TR but doesn't close the old one and uses both (again, for some really unknown reason), then two TR instances might affect each other.

Now, since that's a very stupid thing to do, I'm not sure that I care about this much, as long as we preserve correctness. Meaning, if one instance reduces the size of the cache while another increases it, that's the app's problem. If the two instances evict each other's entries from the LRU cache left, right and center, that's the app's problem too.

The correctness issue that I'm worried about is this (suppose that TR-1 and TR-2 share the same cache instance):
* TR-1 looks for a category "foo", doesn't find it and adds to the cache the fact that the category is unknown
* TR-2 looks for the category "foo", which exists in its newer version of the taxonomy, and receives the ordinal -1, which denotes that the category doesn't exist --- WRONG !!
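
The two steps above can be reproduced with a toy model. This is only a sketch: ToyReader, newLruCache and the map-based "snapshot" are hypothetical stand-ins invented for illustration, not the facet module's classes, and the only assumption taken from the discussion is that -1 denotes a missing category and that both readers share one LRU cache.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: these classes are hypothetical and are not Lucene's taxonomy API.
class SharedNegativeCacheDemo {
    static final int INVALID_ORDINAL = -1; // "category does not exist"

    // Minimal LRU map built on LinkedHashMap's access order.
    static Map<String, Integer> newLruCache(final int maxSize) {
        return new LinkedHashMap<String, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                return size() > maxSize;
            }
        };
    }

    static class ToyReader {
        private final Map<String, Integer> snapshot; // what this reader sees "on disk"
        private final Map<String, Integer> cache;    // shared between readers

        ToyReader(Map<String, Integer> snapshot, Map<String, Integer> cache) {
            this.snapshot = snapshot;
            this.cache = cache;
        }

        int getOrdinal(String category) {
            Integer cached = cache.get(category);
            if (cached != null) {
                return cached;
            }
            Integer onDisk = snapshot.get(category);
            int ordinal = (onDisk == null) ? INVALID_ORDINAL : onDisk;
            cache.put(category, ordinal); // the miss is cached too: this is the problem
            return ordinal;
        }
    }

    // Returns {TR-1's answer, TR-2's answer} for the category "foo".
    static int[] run() {
        Map<String, Integer> shared = newLruCache(1024);
        Map<String, Integer> oldTaxo = new HashMap<>();
        oldTaxo.put("bar", 1);
        Map<String, Integer> newTaxo = new HashMap<>(oldTaxo);
        newTaxo.put("foo", 2); // only the newer taxonomy knows "foo"

        ToyReader tr1 = new ToyReader(oldTaxo, shared);
        ToyReader tr2 = new ToyReader(newTaxo, shared);
        return new int[] { tr1.getOrdinal("foo"), tr2.getOrdinal("foo") };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println("TR-1: " + r[0] + ", TR-2: " + r[1]);
    }
}
```

In this sketch TR-2 gets -1 for "foo" even though its own snapshot maps it to ordinal 2, because TR-1's cached miss is served first.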

To solve that, we could avoid storing in the cache the fact that a category does not exist. Really, lookups of nonexistent categories shouldn't happen - apps do not ask the taxonomy for random categories. So then:

* TR-1 looks for a category "foo", doesn't find it in the cache and DOES NOT update the cache w/ that info. It goes to disk, doesn't find it there, returns -1.
* TR-2 looks for the category "foo", which exists in its newer version of the taxonomy, fetches it from disk and stores the ordinal in the cache.
* TR-1 looks for the category "foo" again, now receives an ordinal which is larger than its taxonomy size --- might be a problem !!
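
The three steps above can be sketched the same way. Again, this is a toy model under stated assumptions: ToyReader, its size() method and the map-based snapshot are hypothetical names made up for illustration, and "taxonomy size" is modelled simply as the number of categories in the reader's snapshot.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: these classes are hypothetical and are not Lucene's taxonomy API.
class PositiveOnlyCacheDemo {
    static final int INVALID_ORDINAL = -1; // "category does not exist"

    // Minimal LRU map built on LinkedHashMap's access order.
    static Map<String, Integer> newLruCache(final int maxSize) {
        return new LinkedHashMap<String, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                return size() > maxSize;
            }
        };
    }

    static class ToyReader {
        private final Map<String, Integer> snapshot; // what this reader sees "on disk"
        private final Map<String, Integer> cache;    // shared between readers

        ToyReader(Map<String, Integer> snapshot, Map<String, Integer> cache) {
            this.snapshot = snapshot;
            this.cache = cache;
        }

        // Number of categories visible to this reader.
        int size() {
            return snapshot.size();
        }

        int getOrdinal(String category) {
            Integer cached = cache.get(category);
            if (cached != null) {
                return cached;
            }
            Integer onDisk = snapshot.get(category);
            if (onDisk == null) {
                return INVALID_ORDINAL; // misses are NOT written to the cache
            }
            cache.put(category, onDisk); // only existing categories are cached
            return onDisk;
        }
    }

    // Returns {TR-1 first lookup, TR-2 lookup, TR-1 second lookup, TR-1 size}.
    static int[] run() {
        Map<String, Integer> shared = newLruCache(1024);
        Map<String, Integer> oldTaxo = new HashMap<>();
        oldTaxo.put("bar", 0);
        Map<String, Integer> newTaxo = new HashMap<>(oldTaxo);
        newTaxo.put("foo", 1); // only the newer taxonomy knows "foo"

        ToyReader tr1 = new ToyReader(oldTaxo, shared);
        ToyReader tr2 = new ToyReader(newTaxo, shared);
        int first = tr1.getOrdinal("foo");
        int fromTr2 = tr2.getOrdinal("foo");
        int second = tr1.getOrdinal("foo");
        return new int[] { first, fromTr2, second, tr1.size() };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println("TR-1 first: " + r[0] + ", TR-2: " + r[1]
                + ", TR-1 second: " + r[2] + ", TR-1 size: " + r[3]);
    }
}
```

In this sketch TR-2 now gets the right answer, but TR-1's second lookup returns ordinal 1 while its own taxonomy holds only 1 category, i.e. an ordinal outside the range it knows about.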

In general, since I don't think that apps access the taxonomy for random ordinals or categories, the second solution might be good. Never store in the cache the fact that an ordinal/category is not found + don't clear() the cache, only nullify its reference + hope for the best :)?
                
> Add NRT support to LuceneTaxonomyReader
> ---------------------------------------
>
>                 Key: LUCENE-3441
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3441
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>         Attachments: LUCENE-3441.patch
>
>
> Currently LuceneTaxonomyReader does not support NRT - i.e., on changes to LuceneTaxonomyWriter, you cannot have the reader updated, like IndexReader/Writer. In order to do that we need to do the following:
> # Add ctor to LuceneTaxonomyReader to allow you to instantiate it with LuceneTaxonomyWriter.
> # Add API to LuceneTaxonomyWriter to expose its internal IndexReader
> # Change LTR.refresh() to return an LTR, rather than void. This is actually not strictly related to that issue, but since we'll need to modify refresh() impl, I think it'll be good to change its API as well. Since all of facet API is @lucene.experimental, no backwards issues here (and the sooner we do it, the better).
