lucene-dev mailing list archives

From Erik Hatcher <>
Subject Re: IFilter
Date Tue, 29 Apr 2003 18:14:57 GMT
On Tuesday, April 29, 2003, at 12:13  PM, <> 
>> I like your third option here.  Rather than fabricate another class
>> called Metadata, we could simply return a Map.
> +1
> We'll probably want an idiom where each IFilter declares its metadata
> keys/fields as constants, so there are no magic keys in the map.

I'm not following you here.  What do you mean by magic keys?

Who gets to name the fields that end up in the Document is where I'm 
not clear yet.  You would likely want some consistency in the field 
names among documents, or at least an overlap on "contents" or 
"keywords" or something like that.
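A minimal sketch of the "keys as constants" idiom suggested above, assuming a hypothetical `IFilter` interface whose `getMetadata` returns a `Map`. The names `IFilter`, `getMetadata`, `PlainTextFilter`, and `FIRST_LINE` are illustrative only, not an existing Lucene API. Shared field names such as "contents" and "keywords" are declared once on the interface, so documents from different filters still overlap on them, while a format-specific filter can add its own constants.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical filter interface: each key is declared once as a constant,
// so callers never scatter raw ("magic") strings through their code.
interface IFilter {
    // Field names shared by every filter, so documents of different
    // types still overlap on "contents" and "keywords".
    String CONTENTS = "contents";
    String KEYWORDS = "keywords";

    Map<String, String> getMetadata(InputStream is, String mimeType);
}

// Hypothetical plain-text filter that also declares a key of its own.
class PlainTextFilter implements IFilter {
    static final String FIRST_LINE = "firstline";

    @Override
    public Map<String, String> getMetadata(InputStream is, String mimeType) {
        String text;
        try {
            // Reads the whole stream: fine for a sketch, costly for huge files.
            text = new String(is.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        int newline = text.indexOf('\n');
        Map<String, String> fields = new HashMap<>();
        fields.put(CONTENTS, text);
        fields.put(FIRST_LINE, newline >= 0 ? text.substring(0, newline) : text);
        return fields;
    }
}
```

Callers then write `fields.get(IFilter.CONTENTS)` rather than `fields.get("contents")`, so a misspelled key becomes a compile error instead of a silent `null`.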

>> I think the event-driven document reader use case should be
>> considered.
>> POI does this, I believe.  How would that impact our design?
> So, just to ensure I get you, if there's, say, a
> Reader getReader(InputStream is, String mimeType)
> method in the interface, you're wondering how to obtain the Reader
> without processing the entire inputstream first, or rather processing
> it from an event-driven mechanism?

I'm not thinking that detail-oriented just yet.  If we are ok with the 
Map being returned for all fields by some specific document handler 
implementation, then the details of the event-driven option would be 
hidden under there, except that the 15MB file would then live in 
field(s) of the Map in memory and then be transferred to a Document in 
memory.  Maybe it's a non-issue; I was just curious whether the different ways 
a document could be processed should factor into the equation or not.
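To make the memory trade-off concrete, here is a hedged sketch of the `Reader getReader(InputStream is, String mimeType)` shape quoted above. Unlike a `Map` holding the full text of a 15MB file, a `Reader` lets the consumer pull characters on demand. The class name `StreamingTextFilter`, the helper `countChars`, the 8192-char buffer, and the assumption that the input is UTF-8 plain text are all illustrative choices, not part of any agreed design.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical filter exposing the contents as a Reader, per the quoted
// signature. Wrapping the stream does not decode it up front; the caller
// pulls characters as it needs them.
class StreamingTextFilter {
    Reader getReader(InputStream is, String mimeType) {
        // Assumption: plain text in UTF-8. A real filter would choose a
        // decoder or format-specific parser based on mimeType.
        return new InputStreamReader(is, StandardCharsets.UTF_8);
    }

    // Consumes a Reader in fixed-size chunks, so at most one buffer of text
    // is held at a time (unlike a Map that stores the whole body).
    static long countChars(Reader reader) {
        char[] buffer = new char[8192];
        long total = 0;
        try {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

The indexer would hand such a `Reader` straight to whatever tokenizes the contents, so the full body never has to exist as a single `String` in memory.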

> No, that doesn't make sense. Wouldn't an event-driven document reader
> use case only apply to retrieval of metadata (Map), not file contents
> (Reader)? In which case you'd almost _have_ to finish processing the
> entire stream before returning the Map, no?
> Maybe I'm missing your point.

Nope, that is my point.  Dunno why I even brought it up since it's not 
at all one of my use cases!  :)  Carry on.
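The point agreed on above (an event-driven reader can only hand back a complete metadata `Map` after the whole stream is consumed) can be sketched with a push-style, SAX-like callback design. This is a hypothetical illustration, not the API of POI, SAX, or Lucene: `MetadataHandler`, `MapCollector`, `HeaderParser`, and the simple `name: value` line format are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical push-style callback interface, in the spirit of SAX.
interface MetadataHandler {
    void onField(String name, String value);
    void onEnd();
}

// Collects events into a Map; the Map is only complete once onEnd() fires.
class MapCollector implements MetadataHandler {
    private final Map<String, String> fields = new HashMap<>();
    private boolean finished = false;

    @Override public void onField(String name, String value) { fields.put(name, value); }
    @Override public void onEnd() { finished = true; }

    Map<String, String> result() {
        if (!finished) {
            throw new IllegalStateException("stream not fully processed yet");
        }
        return Collections.unmodifiableMap(fields);
    }
}

// Hypothetical parser for "name: value" lines: it reads to the end of the
// stream, firing one event per field, and only then signals the end.
class HeaderParser {
    static void parse(InputStream is, MetadataHandler handler) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                int colon = line.indexOf(':');
                if (colon > 0) {
                    handler.onField(line.substring(0, colon).trim(),
                                    line.substring(colon + 1).trim());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        handler.onEnd();
    }
}
```

Calling `result()` before `parse` has returned throws, which mirrors the conclusion in the thread: the events can arrive incrementally, but the `Map` itself is only usable once the entire stream has been processed.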

