lucene-dev mailing list archives

From Erik Hatcher <>
Subject Re: IFilter
Date Thu, 01 May 2003 02:23:38 GMT
On Wednesday, April 30, 2003, at 06:22 PM, <> wrote:
>> Tokenized?  Stored?  Should the underlying document handler make
>> these determinations?
> I think so, yes.

But not field names?  :)

It's mostly a rhetorical question from me, as I'm not sure.  My <index> 
Ant task has the DocumentHandler create the Document instances, but the 
Ant task itself adds some fields (file system last modified date 
and file path, to allow for dependency checking and rapid indexing) - 
so there is a bit of both going on.
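
A rough sketch of that split, not the actual Ant task code: SketchDocument 
is a stand-in for Lucene's Document so the example compiles on its own, and 
the handler method and the field names ("title", "path", "modified") are 
assumptions made up for illustration.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for Lucene's Document: just named string fields.
class SketchDocument {
    private final Map<String, String> fields = new LinkedHashMap<>();

    void add(String name, String value) { fields.put(name, value); }

    String get(String name) { return fields.get(name); }
}

// Hypothetical handler contract: the handler decides the content-specific fields.
interface DocumentHandler {
    SketchDocument getDocument(File file);
}

// Trivial handler that only records the file name as a title.
class PlainTextHandler implements DocumentHandler {
    public SketchDocument getDocument(File file) {
        SketchDocument doc = new SketchDocument();
        doc.add("title", file.getName());
        return doc;
    }
}

class IndexTaskSketch {
    // The task layers its own bookkeeping fields on top of the handler's
    // document, so it can later compare timestamps and skip unchanged files.
    static SketchDocument index(DocumentHandler handler, File file) {
        SketchDocument doc = handler.getDocument(file);
        doc.add("path", file.getPath());
        doc.add("modified", Long.toString(file.lastModified()));
        return doc;
    }

    public static void main(String[] args) {
        SketchDocument doc = index(new PlainTextHandler(), new File("notes.txt"));
        System.out.println(doc.get("path") + " modified=" + doc.get("modified"));
    }
}
```

The handler never sees "path" or "modified" here; those names belong to the 
caller, which is the "bit of both" arrangement described above.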

>> These issues shed light on why this hasn't been done before and why
>> it may not be something that can (or should) be done generically
>> with a simple interface.  There are a lot of domain-specific issues
>> that can crop up.
> I feel a way around this is to provide both a high-level and a
> low-level API. The high-level API involves passing the IFilter a
> Document, and it "does its thing". The low-level API provides more
> flexibility, with performance and convenience at a tradeoff (duh).

Can we agree not to prefix it with "I"?  We all have our pet peeves 
with code styles and naming conventions, and that is one of mine :)

This design seems fine with me.  No objections at all.

>> From client perspective,
> High-level:
> aContentHandler.populate(new Document());
> Low-level:
> Map m = aContentHandler.getMetadata();
> // iterate through map
> Reader r = aContentHandler.getReader();
> // add reader
> Do you think this would satisfy 90% of requirements?

I'm still not seeing the Reader thing - is that meant to read all the 
text contents of a file, for use in a single field?
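
If the Reader is meant that way, the two levels might fit together roughly 
as in the sketch below. This is only an illustration under assumptions: 
DocSketch stands in for Lucene's Document, the "contents" field name and 
the StringContentHandler class are hypothetical, and only populate(), 
getMetadata() and getReader() come from the proposal quoted above.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for Lucene's Document: stored values plus reader-backed fields.
class DocSketch {
    final Map<String, String> stored = new LinkedHashMap<>();
    final Map<String, Reader> readers = new LinkedHashMap<>();

    void addStored(String name, String value) { stored.put(name, value); }

    void addReader(String name, Reader reader) { readers.put(name, reader); }
}

interface ContentHandler {
    // Low-level API: the caller decides what to do with each piece.
    Map<String, String> getMetadata();

    Reader getReader();

    // High-level API: built on the low-level calls, with fixed choices.
    default void populate(DocSketch doc) {
        for (Map.Entry<String, String> e : getMetadata().entrySet()) {
            doc.addStored(e.getKey(), e.getValue());
        }
        // All of the text goes into one field (assumed name: "contents").
        doc.addReader("contents", getReader());
    }
}

// Hypothetical handler over an in-memory string, just to exercise the interface.
class StringContentHandler implements ContentHandler {
    private final String title;
    private final String body;

    StringContentHandler(String title, String body) {
        this.title = title;
        this.body = body;
    }

    public Map<String, String> getMetadata() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("title", title);
        return m;
    }

    public Reader getReader() { return new StringReader(body); }
}
```

A client wanting the defaults calls populate(doc); one that needs different 
field names or storage choices iterates getMetadata() and handles 
getReader() itself.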

The basic ideas here seem fine to me.

