lucene-dev mailing list archives

From <kelvin-li...@relevanz.com>
Subject Re: IFilter
Date Thu, 01 May 2003 03:43:36 GMT
On Wed, 30 Apr 2003 22:23:38 -0400, Erik Hatcher wrote:
>On Wednesday, April 30, 2003, at 06:22 PM, <kelvin-lists@relevanz.com> wrote:
>>>Tokenized?  Stored?  Should the underlying document handler make
>>>these determinations?
>>
>>I think so, yes.
>
>But not field names?  :)
>
>It's mostly a rhetorical question from me, as I'm not sure.  My <index>
>Ant task has the DocumentHandler create the Document instances, but
>the Ant task itself adds some fields (file system last modified date
>and file path, to allow for dependency checking and rapid indexing) -
>so there is a bit of both going on.

Ok. Probably time to get the terminology straight. :-) 

Instead of IFilter, I propose ContentHandler (I'm not 100% happy with 
it, but that's what I'm using now). I'm fine with the use of 
DocumentHandler (since Indyo uses it anyway). So DocumentHandler 
creates Documents and other stuff, and ContentHandler works with file 
contents. OK?

Basically, if one doesn't require specific field names and is OK with 
leaving them to the respective ContentHandlers, then it should be fine 
to use the populate(Document) method in the ContentHandler. In other 
words, if the HTMLContentHandler calls its title field "HTMLTitle", 
for instance, and you don't really care, then all is well. If you're 
peeved about it, go ahead and retrieve the metadata, do your own 
mapping, and add the fields to the Document via the low-level API.
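
A minimal sketch of the two paths, assuming a ContentHandler interface 
shaped like the calls named in this thread (populate, getMetadata, 
getReader). Nothing here is settled API: the stub handler, the 
"HTMLAuthor" and "contents" names, the mapFields helper and the rename 
map are hypothetical, and a plain Map stands in for Lucene's Document 
so the example runs without Lucene on the classpath.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface, shaped after the calls named in this thread.
interface ContentHandler {
    // Low-level: metadata under whatever names the handler chose.
    Map<String, String> getMetadata();

    // Low-level: the text content of the file, kept apart from the metadata.
    Reader getReader();

    // High-level: the handler fills the document using its own field names.
    // The Map is only a stand-in for a Lucene Document (assumption).
    void populate(Map<String, Object> doc);
}

// Stub handler that picks its own field names, e.g. "HTMLTitle".
class StubHtmlContentHandler implements ContentHandler {
    public Map<String, String> getMetadata() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("HTMLTitle", "Release notes");
        m.put("HTMLAuthor", "kelvin");
        return m;
    }

    public Reader getReader() {
        return new StringReader("body text");
    }

    public void populate(Map<String, Object> doc) {
        doc.putAll(getMetadata());
        doc.put("contents", getReader());
    }
}

class FieldMappingSketch {
    // Low-level path: rename the handler's metadata keys to the caller's
    // field names. Keys without an entry in renames keep the handler's name.
    static Map<String, Object> mapFields(ContentHandler h, Map<String, String> renames) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : h.getMetadata().entrySet()) {
            String name = renames.getOrDefault(e.getKey(), e.getKey());
            doc.put(name, e.getValue());
        }
        doc.put("contents", h.getReader());
        return doc;
    }

    public static void main(String[] args) {
        ContentHandler handler = new StubHtmlContentHandler();

        // High-level: accept the handler's field names as they are.
        Map<String, Object> highLevel = new LinkedHashMap<>();
        handler.populate(highLevel);
        System.out.println("high-level fields: " + highLevel.keySet());

        // Low-level: map "HTMLTitle" to the caller's own "title" field.
        Map<String, String> renames = new HashMap<>();
        renames.put("HTMLTitle", "title");
        System.out.println("low-level fields:  " + mapFields(handler, renames).keySet());
    }
}
```

The rename map is the only thing a caller supplies on the low-level 
path, so handlers stay free to name fields however they like.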

>>
>>I feel a way around this, is by providing both a high- as well as
>>low-level API. The high-level api involves passing the IFilter a
>>Document, and it "does its thing". The low-level API provides more
>>flexibility, with performance and convenience at a tradeoff (duh).
>
>Can we agree not to prefix it with "I"?  We all have our pet peeves
>with code styles and naming conventions, and that is one of mine :)
>
>This design seems fine with me.  No objections at all.

+1

>
>
>>From client perspective,
>>High-level:
>>aContentHandler.populate(new Document());
>>
>>Low-level:
>>Map m = aContentHandler.getMetadata();
>>// iterate through map
>>Reader r = aContentHandler.getReader();
>>// add reader
>>
>>Do you think this would satisfy 90% of requirements?
>
>I'm still not seeing the Reader thing - that is to read all the text
>contents of a file, for use in a single field?
>

Conceptually, I'd like to differentiate contents of the file from its 
metadata. I know it may be a little strange sometimes, especially if 
some of the metadata comes from the contents, but I think it's 
advantageous to think in this way.

Practically, I'm _really_ uncomfortable with placing a Reader in the 
metadata Map.

Kelvin

