lucene-dev mailing list archives

From Chris Hostetter
Subject Re: Fieldable, AbstractField, Field
Date Thu, 20 Mar 2008 00:57:56 GMT

: Wouldn't subclassing ReadOnlyDocument also work in this case, if you override
: the getField* to do your own new logic if it applies else fallback to super?

Sure, but how will IndexReader (or really FieldsReader) know which 
subclass to instantiate?  I think in LUCENE-778 the notion of passing a 
DocumentFactory to IndexReader was brought up; in another reply to this 
thread Robert suggests having a setter that takes in a Class which has a 
no-arg constructor and using reflection -- both of these could work, but 
frankly having a really simple API that allows for object decoration seems 
cleaner to me.
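
To make "object decoration" concrete, here is a minimal sketch of what i 
have in mind.  Every name in it is hypothetical (SearchResultDocument is 
borrowed from later in this mail; StoredDocument and DefaultingDocument 
are made up for illustration) and none of it is existing Lucene API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical read-only view of a document returned from a search.
interface SearchResultDocument {
    String get(String fieldName);
}

// Stand-in for whatever the reader builds internally (an assumption,
// not the real FieldsReader behaviour).
final class StoredDocument implements SearchResultDocument {
    private final Map<String, String> stored;

    StoredDocument(Map<String, String> stored) {
        this.stored = new HashMap<>(stored); // defensive copy
    }

    @Override
    public String get(String fieldName) {
        return stored.get(fieldName);
    }
}

// Decorator: applies client-specific logic and otherwise falls back to
// the wrapped document.  The reader never needs to know this class exists.
final class DefaultingDocument implements SearchResultDocument {
    private final SearchResultDocument delegate;
    private final Map<String, String> defaults;

    DefaultingDocument(SearchResultDocument delegate, Map<String, String> defaults) {
        this.delegate = delegate;
        this.defaults = new HashMap<>(defaults);
    }

    @Override
    public String get(String fieldName) {
        String value = delegate.get(fieldName);
        return value != null ? value : defaults.get(fieldName);
    }
}
```

The client wraps whatever the reader hands back, so no factory, Class 
setter, or reflection is needed on the reader side.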

: approach.  People who are careful (store enough fields, don't use boosting or
: have separate store for their boosting) could pull Documents from a reader,
: tweak them, and build a new index.

this to me is the strongest reason to have separate classes, where neither 
is the base class for the other -- we want to discourage the casual 
observer from assuming this is simple.  we want to make sure that users 
who want to do this have to go out of their way to write the 
translation from SearchResultDocument to IndexableDocument so it's clear 
to them what they might be losing in the translation.  (incidentally: this 
is another topic that came up in LUCENE-778...


fundamentally, i believe
  * the API for objects that are passed to "indexing" should only require 
"getters" that are essential for understanding how to index 
that document. ie: if indexing happens by calling 
IndexWriter.addDocument(IndexableDoc) then the IndexableDoc API should 
only require that there is a way to "get" IndexableFields and other 
options, and IndexableField should only describe how to "get" the field 
name and field value; the client providing the IndexableDoc should be free 
to provide an object that implements those methods any way it sees fit -- 
we will most likely want to provide "simple" concrete subclasses of these 
APIs that have simple setters to meet the common cases -- but the 
IndexWriter methods should know only about the "API" classes.
  * the API for objects that are returned from "reading/searching" should 
only specify "getters" that are essential for the client to understand 
the data the index knows about those documents.  "setters" used by 
IndexReader (or its internals) shouldn't be exposed by this API unless we 
*truly* want to let clients treat those objects as mutable "Beans" for 
their own purposes ... personally, i think it's safer to make it easy 
to decorate those objects than to allow arbitrary modifications.
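
A rough sketch of the indexing half of that split, assuming nothing about 
the eventual API: the names IndexableDoc and IndexableField are the ones 
used above, while the method names, SimpleField, SimpleDoc and ToyWriter 
are invented here for illustration (ToyWriter is only a stand-in for 
IndexWriter, and the "other options" are left out):

```java
import java.util.ArrayList;
import java.util.List;

// Getter-only API: all the writer needs to know about a field.
interface IndexableField {
    String name();
    String value();
}

// Getter-only API: all the writer needs to know about a document.
interface IndexableDoc {
    Iterable<IndexableField> fields();
}

// "Simple" concrete class for the common case.
final class SimpleField implements IndexableField {
    private final String name;
    private final String value;

    SimpleField(String name, String value) {
        this.name = name;
        this.value = value;
    }

    @Override public String name() { return name; }
    @Override public String value() { return value; }
}

// "Simple" concrete class with a setter-style method; the writer never
// sees add(), only fields().
final class SimpleDoc implements IndexableDoc {
    private final List<IndexableField> fields = new ArrayList<>();

    SimpleDoc add(String name, String value) {
        fields.add(new SimpleField(name, value));
        return this;
    }

    @Override public Iterable<IndexableField> fields() { return fields; }
}

// Toy stand-in for IndexWriter: it depends only on the two interfaces.
final class ToyWriter {
    private final List<String> indexed = new ArrayList<>();

    void addDocument(IndexableDoc doc) {
        for (IndexableField f : doc.fields()) {
            indexed.add(f.name() + ":" + f.value());
        }
    }

    List<String> indexed() { return indexed; }
}
```

Because ToyWriter.addDocument only calls the getters, a client could hand 
it any other IndexableDoc implementation (lazily computed, backed by a 
database row, etc.) without the writer changing.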

: Actually I think they do share alot more than just name of the field?
: Accessing the "stored" value of a document is exactly what indexing needs to
: do when it indexes the document in the first place?  Ie, a "stored" document
: "looks alot like" the document at indexing time that had been stored.  And

i think the stored value looks a lot like the value stored during indexing 
because of the perspective we look at them from ... but a Document 
returned from a search just has Fields with "Values" while a document that 
needs to be indexed will have Fields with "Values To Store" and  
Fields with "Values To Index" and Fields with both ... there's really no 
reason for a single method in a common base class to describe those 

: things like isTokenized, isTermVectorStored, isStoreOffsetWithTermVector,
: isBinary are actually preserved in the index and known to the reader, so it's
: worth having these methods available at search time?

I admit, some of those is*() methods might make sense in a common base 
class -- but not all of them are specific to each instance of Field. Some 
are the least common denominator of all Fields added with the same 
name -- it's convenient to return that info as part of each Field 
instance so that the client doesn't have to go call some method in 
IndexReader (ie: reader.isTermVectorStored(fieldName)) but providing it to 
the client using the same method (not just a method with a similar, or 
identical name, but the *exact* same method as defined by a common 
baseclass) is very confusing when that method returns something different 
than what it was when the client originally indexed that document.
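
A toy illustration of that mismatch.  ToyFieldInfos is hypothetical, and 
the rule it uses to combine flags (a logical OR across every instance 
added under the same name) is an assumption picked purely for the 
example, not a statement about how the index actually merges them:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-field-name bookkeeping, as a reader might expose it.
final class ToyFieldInfos {
    private final Map<String, Boolean> termVectorByName = new HashMap<>();

    // Called once per field instance at indexing time.
    void record(String fieldName, boolean storeTermVector) {
        // Assumed combination rule for this sketch: OR across instances.
        termVectorByName.merge(fieldName, storeTermVector, Boolean::logicalOr);
    }

    // Answers per field *name*, not per field instance.
    boolean isTermVectorStored(String fieldName) {
        return termVectorByName.getOrDefault(fieldName, false);
    }
}
```

If one document adds "body" without term vectors and another adds it with 
them, the per-name answer is true for both, which is not what the first 
client asked for when it indexed its document.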

