To: Lucene Developers List
Date: Thu, 1 May 2003 11:43:36 +0800
Subject: Re: IFilter
Message-Id: <200305010329.h413T0A3015696@smtp16.singnet.com.sg>

On Wed, 30 Apr 2003 22:23:38 -0400, Erik Hatcher wrote:
>On Wednesday, April 30, 2003, at 06:22 PM, lists@relevanz.com wrote:
>>>Tokenized? Stored? Should the underlying document handler make
>>>these determinations?
>>
>>I think so, yes.
>
>But not field names?
>:)
>
>It's mostly a rhetorical question from me, as I'm not sure. My
>Ant task has the DocumentHandler create the Document instances, but
>the Ant task itself adds some fields (file system last modified date
>and file path, to allow for dependency checking and rapid indexing) -
>so there is a bit of both going on.

Ok. Probably time to get the terminology straight. :-)

Instead of IFilter, I propose ContentHandler (I'm not 100% happy with
it, but that's what I'm using now). I'm fine with the use of
DocumentHandler (since Indyo uses it anyway). So DocumentHandler
creates Documents and other stuff, and ContentHandler works with file
contents. OK?

Basically, if one doesn't have a requirement for specific field names
and is OK with leaving them to the respective ContentHandlers, then it
should be all right to use the populate(Document) method in the
ContentHandler. In other words, if the HTMLContentHandler calls its
title field "HTMLTitle", for instance, and you don't really care, then
all is well. If you're peeved about it, go ahead and retrieve the
metadata, do your own mapping, and add the fields to the Document via
the low-level API.

>>
>>I feel a way around this is to provide both a high- as well as a
>>low-level API. The high-level API involves passing the IFilter a
>>Document, and it "does its thing". The low-level API provides more
>>flexibility, with performance and convenience as the tradeoff (duh).
>
>Can we agree not to prefix it with "I"? We all have our pet peeves
>with code styles and naming conventions, and that is one of mine :)
>
>This design seems fine with me. No objections at all. +1
>
>
>>From client perspective,
>>High-level:
>>aContentHandler.populate(new Document());
>>
>>Low-level:
>>Map m = aContentHandler.getMetadata();
>>// iterate through map
>>Reader r = aContentHandler.getReader();
>>// add reader
>>
>>Do you think this would satisfy 90% of requirements?
>
>I'm still not seeing the Reader thing - that is to read all the text
>contents of a file, for use in a single field?
>

Conceptually, I'd like to differentiate the contents of the file from
its metadata. I know it may be a little strange sometimes, especially
if some of the metadata comes from the contents, but I think it's
advantageous to think in this way.

Practically, I'm _really_ uncomfortable with placing a Reader in the
metadata Map.

Kelvin
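
For illustration only, here is a minimal Java sketch of the split described above: metadata in a Map, file contents behind a separate Reader, and a populate() convenience on top of both. The method names populate, getMetadata and getReader and the field name "HTMLTitle" come from the thread. Everything else is an assumption made so the sketch runs on its own: SimpleDocument is a hypothetical stand-in for Lucene's Document, and the "contents", "title" and "body" field names, the constructor, and the generic types are invented for the example.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for Lucene's Document, so the sketch runs without Lucene.
class SimpleDocument {
    final Map<String, Object> fields = new LinkedHashMap<>();

    void add(String name, Object value) {
        fields.put(name, value);
    }
}

// Assumed shape of the ContentHandler proposed in the thread.
interface ContentHandler {
    // Low-level: metadata only; the Reader is deliberately not stored in this Map.
    Map<String, String> getMetadata();

    // Low-level: the file contents, kept apart from the metadata.
    Reader getReader();

    // High-level: the handler chooses its own field names.
    void populate(SimpleDocument doc);
}

// Toy handler; it takes an already-extracted title and body instead of parsing HTML.
class HtmlContentHandler implements ContentHandler {
    private final String title;
    private final String body;

    HtmlContentHandler(String title, String body) {
        this.title = title;
        this.body = body;
    }

    @Override
    public Map<String, String> getMetadata() {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("HTMLTitle", title);
        return metadata;
    }

    @Override
    public Reader getReader() {
        return new StringReader(body);
    }

    @Override
    public void populate(SimpleDocument doc) {
        for (Map.Entry<String, String> e : getMetadata().entrySet()) {
            doc.add(e.getKey(), e.getValue());
        }
        doc.add("contents", getReader()); // "contents" is an assumed field name
    }
}

class ContentHandlerSketch {
    public static void main(String[] args) {
        ContentHandler handler = new HtmlContentHandler("Hello", "Some body text");

        // High-level: accept whatever field names the handler picks.
        SimpleDocument highLevel = new SimpleDocument();
        handler.populate(highLevel);

        // Low-level: the caller remaps field names before adding them.
        SimpleDocument lowLevel = new SimpleDocument();
        for (Map.Entry<String, String> e : handler.getMetadata().entrySet()) {
            String name = e.getKey().equals("HTMLTitle") ? "title" : e.getKey();
            lowLevel.add(name, e.getValue());
        }
        lowLevel.add("body", handler.getReader());

        System.out.println(highLevel.fields.keySet());
        System.out.println(lowLevel.fields.keySet());
    }
}
```

With this shape, a caller who doesn't care about field names uses populate(), while one who does iterates getMetadata() and attaches the Reader under a name of their choosing. The Reader never has to sit in the metadata Map.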