lucene-dev mailing list archives

From "Mark Miller (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation
Date Sat, 18 Apr 2009 00:26:15 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700391#action_12700391 ]

Mark Miller commented on LUCENE-831:
------------------------------------

I've got a bit of the same feeling. My list was more or less cherry-picked from all of the
above comments, and my initial feeling was also that there was not enough motivation. But the
more I thought about it, the uglier FieldCache seemed. And we would want to stop exposing
Parser so that CSF can be a seamless backing, which makes FieldCache even uglier for a while.
Clickless thus far here too, but I think we still have a good base to work with.

{quote}Honestly these reasons are not net/net compelling enough to warrant a
whole new API? They are fairly minor. And I agree: LUCENE-1483 has
already achieved the biggest step forward here.{quote}

Not only that, but almost all of those reasons can be handled by allowing a custom FieldCache
to be used, rather than hard-coding the default singleton.

A couple of responses:

{quote}We need source pluggability for when CSF arrives (but, admittedly,
we could wait until CSF actually does arrive){quote}
We have it? Just pass the CSFValueSource at IndexReader creation?
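A minimal sketch of what "pass the CSFValueSource at IndexReader creation" could look like. It assumes a hypothetical `ValueSource` interface, stub `UninvertingValueSource` and `CSFValueSource` classes, and a `SketchReader` standing in for IndexReader. None of these are existing Lucene classes; the point is only that the backing is chosen once, at construction time.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction (not an existing Lucene API): where the
// per-document int values for a field come from.
interface ValueSource {
    int[] getInts(String field);
}

// Hypothetical stand-in for today's behaviour: values uninverted from indexed terms.
class UninvertingValueSource implements ValueSource {
    public int[] getInts(String field) {
        // A real implementation would walk the terms of the field; this stub returns fixed data.
        return new int[] {3, 1, 2};
    }
}

// Hypothetical stand-in for values read from column-stride fields stored in the index.
class CSFValueSource implements ValueSource {
    private final Map<String, int[]> columns;

    CSFValueSource(Map<String, int[]> columns) {
        this.columns = new HashMap<>(columns);
    }

    public int[] getInts(String field) {
        int[] values = columns.get(field);
        if (values == null) {
            throw new IllegalArgumentException("no stored column for field " + field);
        }
        return values;
    }
}

// Hypothetical reader: the ValueSource is fixed when the reader is created,
// so code asking for values never needs to know which backing is in use.
class SketchReader {
    private final ValueSource valueSource;

    SketchReader() {
        this(new UninvertingValueSource()); // default: behave like FieldCache does now
    }

    SketchReader(ValueSource valueSource) {
        this.valueSource = valueSource;
    }

    int[] getInts(String field) {
        return valueSource.getInts(field);
    }
}
```

With this shape, moving a field to CSF would mean changing only the constructor argument, not the code that consumes the values.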

{quote}
Allowing values to change, just like we can call
IndexReader.setNorm/deleteDoc to change norms/deletes. We'd need a
copy-on-write approach, like norms & deleted docs.{quote}
Good point. We need a way to update values, one that can throw UnsupportedOperationException?
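A rough sketch of an updatable value source with copy-on-write semantics and an `UnsupportedOperationException` for read-only sources. `IntValues`, `set`, and `cloneValues` are hypothetical names. The sharing rule is an assumption, modeled loosely on the copy-on-write behaviour the quote describes for norms and deleted docs, not on actual IndexReader internals.

```java
import java.util.Arrays;

// Hypothetical per-field int values that may optionally be updated.
class IntValues {
    private int[] values;          // possibly shared with other IntValues instances
    private boolean shared;        // true until this instance owns a private copy
    private final boolean writable;

    IntValues(int[] values, boolean writable) {
        this.values = values;
        this.shared = true;        // never write into an array we were handed
        this.writable = writable;
    }

    int get(int doc) {
        return values[doc];
    }

    // Copy-on-write update: the shared array is copied on the first write only.
    void set(int doc, int value) {
        if (!writable) {
            throw new UnsupportedOperationException("this value source is read-only");
        }
        if (shared) {
            values = Arrays.copyOf(values, values.length);
            shared = false;
        }
        values[doc] = value;
    }

    // A clone shares the current array; afterwards both sides copy before writing.
    IntValues cloneValues() {
        shared = true;
        return new IntValues(values, writable);
    }
}
```

Cloning stays cheap (no copy) until one side actually changes a value, which is the property that makes this approach attractive for reopen/clone.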

{quote}
How would norms be folded into this? Ideally, each field could
choose to pull its norms from any source. Document level norms
was discussed somewhere, and should easily "fit" as another norms
source. We'd need to relax how per-doc-field boosting is computed
at runtime to pull from such "arbitrary" sources.{quote}
Good point again. Getting norms under this API will add a bit more meat to this issue.

{quote}
Deleted docs could also be represented as a ValueSource? Just one
bit per doc. This way one could swap in whatever source for
"deleted docs" one wanted.{quote}
You've got me here at the moment. I don't know the delete code very well, but I will in time :)
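To make the quoted idea concrete, here is a minimal sketch of deleted docs as a one-bit-per-doc source. It assumes a hypothetical `BitValueSource` interface with two interchangeable implementations: one backed by `java.util.BitSet` and one that reports no deletions. Code that only asks "is doc N deleted?" works with either.

```java
import java.util.BitSet;

// Hypothetical (not a Lucene API): deleted docs exposed like any other per-doc value.
interface BitValueSource {
    boolean get(int doc);
    int maxDoc();
}

// One bit per document, backed by java.util.BitSet.
class BitSetDeletedDocs implements BitValueSource {
    private final BitSet bits;
    private final int maxDoc;

    BitSetDeletedDocs(int maxDoc) {
        this.bits = new BitSet(maxDoc);
        this.maxDoc = maxDoc;
    }

    void delete(int doc) {
        bits.set(doc);
    }

    public boolean get(int doc) {
        return bits.get(doc);
    }

    public int maxDoc() {
        return maxDoc;
    }
}

// An alternative source that reports no deletions at all.
class NoDeletions implements BitValueSource {
    private final int maxDoc;

    NoDeletions(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    public boolean get(int doc) {
        return false;
    }

    public int maxDoc() {
        return maxDoc;
    }
}

class LiveDocs {
    // Counts documents not marked deleted, whatever the source is.
    static int countLive(BitValueSource deleted) {
        int live = 0;
        for (int doc = 0; doc < deleted.maxDoc(); doc++) {
            if (!deleted.get(doc)) {
                live++;
            }
        }
        return live;
    }
}
```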

{quote}
      Allowing for docs that have more than one value. (We'd also need
      to extend sorting to be able to compare multiple values).
{quote}
This is an interesting one, because I wonder whether we can do it and still stick with arrays. A
multidimensional array seems a bit much...
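One way to keep plain arrays without going multidimensional is to use a flat `values` array plus an `offsets` array with one entry per doc and a trailing sentinel. This is a standard compressed-row layout, not something proposed in this thread. `MultiIntValues` is a hypothetical name, and taking the minimum as a sort key is just one assumed policy for comparing multiple values.

```java
// Hypothetical sketch: multi-valued per-doc ints stored in two flat arrays.
// Values for doc d live in values[offsets[d] .. offsets[d + 1] - 1].
class MultiIntValues {
    private final int[] offsets; // length = numDocs + 1
    private final int[] values;

    // The int[][] argument is only a convenient way to build the example;
    // the stored form is the two flat arrays.
    MultiIntValues(int[][] perDoc) {
        offsets = new int[perDoc.length + 1];
        int total = 0;
        for (int doc = 0; doc < perDoc.length; doc++) {
            offsets[doc] = total;
            total += perDoc[doc].length;
        }
        offsets[perDoc.length] = total;
        values = new int[total];
        for (int doc = 0; doc < perDoc.length; doc++) {
            System.arraycopy(perDoc[doc], 0, values, offsets[doc], perDoc[doc].length);
        }
    }

    int count(int doc) {
        return offsets[doc + 1] - offsets[doc];
    }

    int value(int doc, int i) {
        return values[offsets[doc] + i];
    }

    // One possible sort key: the smallest value (docs with no values sort last).
    int minValue(int doc) {
        int min = Integer.MAX_VALUE;
        for (int i = offsets[doc]; i < offsets[doc + 1]; i++) {
            min = Math.min(min, values[i]);
        }
        return min;
    }
}
```

The overhead is a single extra int per document, and single-valued access stays an array lookup.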

{quote}
An mmap implementation (like Lucy/KS) - should feel just like CSF
or uninversion (ie, "just another impl").{quote}
This is already fairly independent I think...
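As an illustration of "just another impl", here is a sketch of a hypothetical `IntSource` interface with an in-memory implementation and a memory-mapped one built on `FileChannel.map`. The file format (big-endian 4-byte ints, one per doc) is an assumption made for the example. It is not the Lucy/KS format or any Lucene file.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical per-doc int lookup; callers cannot tell which impl they have.
interface IntSource {
    int get(int doc);
}

// In-memory implementation (what uninversion into an int[] would give you).
class ArrayIntSource implements IntSource {
    private final int[] values;

    ArrayIntSource(int[] values) {
        this.values = values;
    }

    public int get(int doc) {
        return values[doc];
    }
}

// Reads fixed-width 4-byte ints straight from a memory-mapped file.
class MmapIntSource implements IntSource {
    private final MappedByteBuffer buffer;

    MmapIntSource(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // A mapping stays valid after the channel that created it is closed.
            buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    public int get(int doc) {
        return buffer.getInt(doc * Integer.BYTES); // buffers default to big-endian
    }

    // Helper for the example: writes values as big-endian 4-byte ints.
    static void write(Path file, int[] values) throws IOException {
        ByteBuffer bytes = ByteBuffer.allocate(values.length * Integer.BYTES);
        for (int v : values) {
            bytes.putInt(v);
        }
        Files.write(file, bytes.array());
    }
}
```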

{quote}
Good impls for the enum case (all strings could be considered
enums), eg if there are only 100 unique strings in that field, you
only need 7 bits per ord derefing into the char[] values.
{quote}
+1. Yes.
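The 7-bit figure follows from ceil(log2(100)) = 7, since 2^6 = 64 < 100 <= 128 = 2^7. The sketch below packs per-doc ordinals into a `long[]` using exactly that many bits, and dereferences each ordinal into a table of unique strings. `PackedOrds` is a hypothetical class. A `String[]` stands in for the char[] data mentioned in the quote.

```java
// Hypothetical sketch of the "enum" case: each doc stores an ordinal into a
// small table of unique strings, packed with just enough bits per ordinal.
class PackedOrds {
    private final long[] blocks;
    private final int bitsPerValue;
    private final String[] uniqueValues; // ordinal -> string

    // Bits needed to represent ordinals 0..numUnique-1 (at least 1).
    static int bitsRequired(int numUnique) {
        if (numUnique <= 1) {
            return 1;
        }
        return 32 - Integer.numberOfLeadingZeros(numUnique - 1);
    }

    PackedOrds(int[] ords, String[] uniqueValues) {
        this.uniqueValues = uniqueValues;
        this.bitsPerValue = bitsRequired(uniqueValues.length);
        long totalBits = (long) ords.length * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) / 64)];
        for (int doc = 0; doc < ords.length; doc++) {
            set(doc, ords[doc]);
        }
    }

    private void set(int index, int ord) {
        if (ord < 0 || ord >= uniqueValues.length) {
            throw new IllegalArgumentException("ordinal out of range: " + ord);
        }
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = ord;
        blocks[block] |= value << shift;              // low bits (overflow is dropped)
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {                              // value straddles two longs
            blocks[block + 1] |= value >>> (bitsPerValue - spill);
        }
    }

    int ord(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return (int) (value & ((1L << bitsPerValue) - 1));
    }

    String get(int doc) {
        return uniqueValues[ord(doc)];
    }
}
```

Compared with a plain `int[]` of ordinals, this uses 7 bits instead of 32 per doc for the 100-value case, roughly a 4.5x saving. Lookups cost a couple of shifts and masks.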

{quote}
Possible future when Lucene computes sort cache (for text fields)
and stores in the index{quote}
I'm not familiar with that idea, so I'm not sure what effect this has...

{quote}
Allowing field sort to use an entirely external source of values
{quote}
I think both options allow that now: if you pass the ValueSource from the reader, it can
get its values from anywhere. If you override the reader ValueSource with the SortField
ValueSource, it too can load from anywhere. I am just not sure both options are really needed.
I am kind of liking Uwe's idea of assigning ValueSources per field, though that could probably
get messy. Perhaps a default, and then per-field overrides?
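A small sketch of "a default, and then per-field overrides". It assumes a hypothetical `ValueSource` functional interface and a `FieldValueSources` registry. Neither is an existing Lucene class, and the method names are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical: anything that can produce per-document int values for a field.
interface ValueSource {
    int[] getInts(String field);
}

// Hypothetical registry: one default ValueSource plus optional per-field overrides.
class FieldValueSources {
    private final ValueSource defaultSource;
    private final Map<String, ValueSource> overrides = new HashMap<>();

    FieldValueSources(ValueSource defaultSource) {
        this.defaultSource = defaultSource;
    }

    // Returns this so overrides can be chained when configuring a reader.
    FieldValueSources override(String field, ValueSource source) {
        overrides.put(field, source);
        return this;
    }

    ValueSource sourceFor(String field) {
        return overrides.getOrDefault(field, defaultSource);
    }

    int[] getInts(String field) {
        return sourceFor(field).getInts(field);
    }
}
```

Keeping the lookup in one place may limit the messiness: most fields never mention a source at all, and only the exceptions are listed. The same lookup could in principle pick a norms source per field, as the earlier quote about norms suggests.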

> Complete overhaul of FieldCache API/Implementation
> --------------------------------------------------
>
>                 Key: LUCENE-831
>                 URL: https://issues.apache.org/jira/browse/LUCENE-831
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Hoss Man
>            Assignee: Mark Miller
>             Fix For: 3.0
>
>         Attachments: ExtendedDocument.java, fieldcache-overhaul.032208.diff, fieldcache-overhaul.diff,
>                      fieldcache-overhaul.diff, LUCENE-831-trieimpl.patch, LUCENE-831.03.28.2008.diff, LUCENE-831.03.30.2008.diff,
>                      LUCENE-831.03.31.2008.diff, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch,
>                      LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch,
>                      LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch, LUCENE-831.patch
>
>
> Motivation:
> 1) Completely overhaul the API/implementation of "FieldCache" type things...
>     a) eliminate global static map keyed on IndexReader (thus
>         eliminating synch block between completely independent IndexReaders)
>     b) allow more customization of cache management (ie: use 
>         expiration/replacement strategies, disk backed caches, etc)
>     c) allow people to define custom cache data logic (ie: custom
>         parsers, complex datatypes, etc... anything tied to a reader)
>     d) allow people to inspect what's in a cache (list of CacheKeys) for
>         an IndexReader so a new IndexReader can be likewise warmed. 
>     e) Lend support for smarter cache management if/when
>         IndexReader.reopen is added (merging of cached data from subReaders).
> 2) Provide backwards compatibility to support existing FieldCache API with
>     the new implementation, so there is no redundant caching as client code
>     migrates to new API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

