lucene-dev mailing list archives

From "Andrew C. Oliver" <acoli...@nc.rr.com>
Subject RE: Format Stripping [ was: XLS parser ]
Date Tue, 22 Jan 2002 19:16:04 GMT
I'll come up with a small demonstration shortly.

-Andy

On Tue, 2002-01-22 at 12:39, Doug Cutting wrote:
> > From: Andrew C. Oliver [mailto:acoliver@nc.rr.com]
> >
> > We've implemented an event-based
> > system for reading documents: you register for what you care about,
> > kick it off, and it throws events to listeners as it runs into
> > them.  Not sure if there is a clean way to graft those ideas onto
> > Lucene for a single-pass read.
> 
> I'm not sure the metaphor is apt.  The listener pattern is used with
> parsers.  Lucene is not a parser, but rather something that you'd like to
> call from a parser.
> 
> For example, one might do something like the following to add text to a
> Lucene index with a SAX parser:
> 
>   parser.setContentHandler(new ContentHandler() {
>     private Document document = new Document();
>     private String fieldName;
>     public void startElement(String ns, String name, ...) {
>       fieldName = name;
>     }
>     public void characters(char[] chars, int start, int len) {
>       String text = new String(chars, start, len);
>       document.add(Field.UnStored(fieldName, text));
>     }
>   });
> 
> (Note that in Lucene a given field name may be added to a Document many
> times, with the effect of appending the contained text chunks in the index.
> The only proviso is that tokens will not span chunk boundaries.)
> 
> This code seems completely natural to me.  I'm not sure how an event-based
> indexer would look in this context.
> 
> Doug
> 
> --
> To unsubscribe, e-mail:   <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
> 
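
As a rough, self-contained sketch of the handler pattern quoted above, the
following uses only the JDK's SAX classes. A plain Map stands in for a Lucene
Document (an assumption, so that the example runs without Lucene on the
classpath), and the names SaxFieldSketch and parse are hypothetical. Each
characters() callback is stored as its own chunk under the most recently
opened element name, mirroring the note that a field name may be added many
times.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxFieldSketch {

    /**
     * Parses the XML and returns field name -> text chunks.
     * The map is a hypothetical stand-in for a Lucene Document.
     */
    static Map<String, List<String>> parse(String xml) throws Exception {
        final Map<String, List<String>> fields = new LinkedHashMap<>();

        DefaultHandler handler = new DefaultHandler() {
            private String fieldName;

            @Override
            public void startElement(String ns, String local, String qName,
                                     Attributes atts) {
                // The default factory is not namespace-aware, so use qName.
                fieldName = qName;
            }

            @Override
            public void characters(char[] chars, int start, int len) {
                String text = new String(chars, start, len);
                // Skip whitespace-only runs between elements.
                if (fieldName == null || text.trim().isEmpty()) {
                    return;
                }
                // Stand-in for document.add(...): append one more chunk
                // under the same field name.
                fields.computeIfAbsent(fieldName, k -> new ArrayList<>())
                      .add(text);
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return fields;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><title>Format Stripping</title>"
                   + "<body>first chunk</body><body>second chunk</body></doc>";
        System.out.println(parse(xml));
    }
}
```

A SAX parser may deliver one run of text in several characters() calls, so
the number of chunks per field is not guaranteed. With Lucene on the
classpath, the computeIfAbsent line would presumably be replaced by the
document.add(...) call from the quoted message.
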
-- 
www.superlinksoftware.com
www.sourceforge.net/projects/poi - port of Excel format to java
http://developer.java.sun.com/developer/bugParade/bugs/4487555.html 
			- fix java generics!


The avalanche has already started. It is too late for the pebbles to
vote.
-Ambassador Kosh



