lucene-dev mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Fieldable, AbstractField, Field
Date Wed, 19 Mar 2008 18:15:59 GMT

: I do like moving towards a separation of Document for indexing vs
: searching for 3.0.
: 
: Disregarding for starters how we get there from here...
: 
: Wouldn't we just want a base class (not an interface), say
: ReadOnlyField, that is used in documents retrieved by a reader?  This
: class would also have Index.*, Store.*, TermVector.*, and
: isStored/Indexed/Tokenized/Compressed, etc, as these are recoverable
: from an index.  Couldn't this be a concrete class, i.e., the actual
: class instantiated when a Document is loaded from a reader?

Yes, but one of the peeves I've heard lots of people express over 
the years is that they want to "decorate" the Documents returned by a 
search, so that those documents can access alternate field stores and 
metadata not in the index.  (LUCENE-778 started out as a discussion of 
wanting to pass custom subclasses of Document to writer.addDocument(), 
but it also mentions wanting to get custom documents back from 
IndexReader.)

Imagine you're writing an app that does a search with Lucene, and then 
returns a List<Document> ... 

  // Options and doSomeSearchStuff stand in for app-specific code
  public List<Document> myMethod(Options options) {
    List<Document> docs = doSomeSearchStuff(indexReader, query, options);
    return docs;
  }

you've got a lot of downstream code that calls myMethod and uses/propagates 
this List<Document> ... and then one day you decide that for each document 
you also want to include some metadata that Lucene doesn't know anything 
about; your downstream client code would be happy to treat this new 
metadata just like any other field.  You could change the API of myMethod 
and jump through a lot of hoops changing all of your other code; or, if 
"Document" were a simple interface, you could do something like...

  public class MyDocumentWrapper implements Document {
    public MyDocumentWrapper(Document doc, OtherData otherData) {...}
    public static List<Document> wrapList(List<Document> docs, OtherData otherData) {...}
  }
  public List<Document> myMethod(Options options) {
    List<Document> docs = doSomeSearchStuff(indexReader, query, options);
    return MyDocumentWrapper.wrapList(docs, getOtherData(options));
  }
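To make the wrapper idea concrete, here is a minimal, self-contained sketch of that decorator.  It assumes a hypothetical ResultDocument interface with a single getValues(String) method (Lucene's real Document is a concrete class, so none of this is Lucene API), and for simplicity it attaches the same metadata map to every document in the list:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical read-only document interface; this is NOT Lucene's
// org.apache.lucene.document.Document, which is a concrete class.
interface ResultDocument {
    /** Returns the values stored under the given field name (empty if none). */
    List<String> getValues(String fieldName);
}

// Hypothetical stand-in for a document loaded from an index.
class StoredDocument implements ResultDocument {
    private final Map<String, List<String>> fields = new HashMap<>();

    StoredDocument add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        return this;
    }

    @Override
    public List<String> getValues(String fieldName) {
        return fields.getOrDefault(fieldName, Collections.emptyList());
    }
}

// Decorator: answers from the external metadata first, then falls back
// to the wrapped document, so downstream code sees one set of fields.
class MyDocumentWrapper implements ResultDocument {
    private final ResultDocument inner;
    private final Map<String, List<String>> otherData;

    MyDocumentWrapper(ResultDocument inner, Map<String, List<String>> otherData) {
        this.inner = inner;
        this.otherData = otherData;
    }

    @Override
    public List<String> getValues(String fieldName) {
        List<String> extra = otherData.get(fieldName);
        return extra != null ? extra : inner.getValues(fieldName);
    }

    // Wraps every document in the list with the same external metadata.
    static List<ResultDocument> wrapList(List<ResultDocument> docs,
                                         Map<String, List<String>> otherData) {
        List<ResultDocument> wrapped = new ArrayList<>(docs.size());
        for (ResultDocument d : docs) {
            wrapped.add(new MyDocumentWrapper(d, otherData));
        }
        return wrapped;
    }
}
```

Downstream code keeps calling getValues("rating") exactly as it calls getValues("title"); it never learns that one value came from the index and the other did not.  Per-document metadata would just mean looking up a different map for each wrapped document.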

(If I remember correctly, there are some comments to this effect in 
LUCENE-778 as well.)

: And then a subclass, IndexableField, that adds reader & tokenStream
: values, get/set boost, setters to change a field's value, etc.
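For reference, the hierarchy proposed in the quoted text might look roughly like this.  It is a hypothetical sketch, not Lucene code: ReadOnlyField keeps only three of the recoverable flags (stored/indexed/tokenized), and IndexableField adds a boost, a java.io.Reader value, and setters, leaving out the TokenStream value so the sketch has no Lucene dependency:

```java
import java.io.Reader;

// Hypothetical sketch of the quoted proposal; these are not Lucene classes.
// The concrete class a reader would instantiate for each loaded field,
// exposing only what can be recovered from the index.
class ReadOnlyField {
    protected final String name;
    protected String value;
    protected final boolean stored;
    protected final boolean indexed;
    protected final boolean tokenized;

    ReadOnlyField(String name, String value,
                  boolean stored, boolean indexed, boolean tokenized) {
        this.name = name;
        this.value = value;
        this.stored = stored;
        this.indexed = indexed;
        this.tokenized = tokenized;
    }

    String name() { return name; }
    String stringValue() { return value; }
    boolean isStored() { return stored; }
    boolean isIndexed() { return indexed; }
    boolean isTokenized() { return tokenized; }
}

// Subclass that adds the indexing-only state: a boost, an optional Reader
// value, and setters. A TokenStream value is omitted on purpose.
class IndexableField extends ReadOnlyField {
    private float boost = 1.0f;
    private Reader readerValue;

    IndexableField(String name, String value,
                   boolean stored, boolean indexed, boolean tokenized) {
        super(name, value, stored, indexed, tokenized);
    }

    float getBoost() { return boost; }
    void setBoost(float boost) { this.boost = boost; }
    void setValue(String value) { this.value = value; }
    Reader readerValue() { return readerValue; }
    void setReaderValue(Reader reader) { this.readerValue = reader; }
}
```

In this shape every IndexableField is-a ReadOnlyField, so the stored-value accessors (stringValue() and friends) come along with the indexing-side type.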

IndexableField really shouldn't be a subclass of whatever class is 
returned after a search is done ... the methods used for accessing the 
"stored" value of a returned document make as little sense in the 
context of IndexableField as the setBoost/Reader/TokenStream functions of 
Document currently make when a search is executed.

When all is said and done, an IndexableField and a SearchResultField 
shouldn't have anything in common except *maybe* that they both have a 
fieldName.
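A rough sketch of that separation, assuming a hypothetical NamedField interface as the only shared type (none of these names are Lucene API):  the indexing side carries text and a boost, the search side carries the values read back, and neither exposes the other's methods.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical: the one thing both field types have in common.
interface NamedField {
    String name();
}

// Indexing side (hypothetical): text to analyze plus a boost.
// No stored-value accessors, since nothing has been stored yet.
final class IndexableField implements NamedField {
    private final String name;
    private final String text;
    private float boost = 1.0f;

    IndexableField(String name, String text) {
        this.name = name;
        this.text = text;
    }

    @Override
    public String name() { return name; }
    String text() { return text; }
    float getBoost() { return boost; }
    void setBoost(float boost) { this.boost = boost; }
}

// Search side (hypothetical): values read back from the index.
// No boost, Reader, or TokenStream, since nothing will be indexed.
final class SearchResultField implements NamedField {
    private final String name;
    private final List<String> storedValues;

    SearchResultField(String name, List<String> storedValues) {
        this.name = name;
        // Defensive, read-only copy: a search result is not meant to be edited.
        this.storedValues = Collections.unmodifiableList(new ArrayList<>(storedValues));
    }

    @Override
    public String name() { return name; }
    List<String> stringValues() { return storedValues; }
}
```

Code that only needs to know which field it is dealing with can take a NamedField; everything else is typed to one side or the other, so calling setBoost() on a search result simply doesn't compile.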


I think Yonik once argued that the ideal API for getting a Document out of 
an IndexReader would be...

   /** @return map of field name to field values */
   public Map<String,String[]> getDocument(int id)
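To show what callers would get back from that signature, here is a sketch built on a hypothetical in-memory MapStoredFields class (it does not reflect how Lucene actually stores fields).  Multi-valued fields are grouped under one key, and both field and value order follow the order the values were added:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for an index's stored fields; it only
// illustrates the proposed return type, not Lucene's storage format.
class MapStoredFields {
    // docs.get(id) holds that document's (name, value) pairs in the order added.
    private final List<List<String[]>> docs = new ArrayList<>();

    /** Adds a document given alternating name, value arguments; returns its id. */
    int addDocument(String... nameValuePairs) {
        if (nameValuePairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected name/value pairs");
        }
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < nameValuePairs.length; i += 2) {
            pairs.add(new String[] { nameValuePairs[i], nameValuePairs[i + 1] });
        }
        docs.add(pairs);
        return docs.size() - 1;
    }

    /** @return map of field name to field values, in the order they were added */
    Map<String, String[]> getDocument(int id) {
        // Group repeated field names so multi-valued fields share one key.
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String[] pair : docs.get(id)) {
            grouped.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
        }
        Map<String, String[]> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            result.put(e.getKey(), e.getValue().toArray(new String[0]));
        }
        return result;
    }
}
```

One appeal of a plain Map is that it already covers the decoration case: an application can put() its own metadata into the returned map without subclassing or wrapping anything.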




-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

