lucene-java-user mailing list archives

From "Taher H. Haveliwala" <taher...@yahoo.com>
Subject indexing documents that arrive in pieces
Date Sun, 13 Oct 2002 02:18:34 GMT
What is the cleanest way in Lucene to add documents to
an index, if the entire document is not readily
available at one time?

E.g., I want to index the text as well as the
anchor-text of a stream of html pages, where the
anchor-text terms get associated with the page _being
pointed to_.  For a document d_i, I don't know all the
terms that should be added to its "anchor" field,
until I've seen all documents d_j that link to d_i.

Of course I can make a pass over the web pages, and
gather up the relevant terms myself, but if Lucene has
the necessary machinery to add portions of a document
at different times, it would save me work. 
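For reference, a minimal sketch of the manual pass described above, assuming the whole crawl fits in memory. It buffers each page's text and the anchor text pointing at it, and only emits a complete record once every page has been seen. The class and method names (AnchorTextBuffer, addPage, addLink, finish) are hypothetical and are not part of Lucene; turning each finished record into a Lucene document with a "text" field and an "anchor" field is left to the caller.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a Lucene class): collects anchor text per target URL. */
public class AnchorTextBuffer {

    /** One complete record, ready to be converted into an index document. */
    public static final class PageRecord {
        public final String url;
        public final String text;
        public final String anchor;

        PageRecord(String url, String text, String anchor) {
            this.url = url;
            this.text = text;
            this.anchor = anchor;
        }
    }

    // Body text of each page seen so far, keyed by the page's own URL.
    private final Map<String, String> textByUrl = new LinkedHashMap<>();
    // Anchor text accumulated for each link target, keyed by the target URL.
    private final Map<String, StringBuilder> anchorByTarget = new LinkedHashMap<>();

    /** Record the body text of page d_j as it arrives in the stream. */
    public void addPage(String url, String text) {
        textByUrl.put(url, text);
    }

    /** Record that some page links to targetUrl with the given anchor text. */
    public void addLink(String targetUrl, String anchorText) {
        StringBuilder sb = anchorByTarget.computeIfAbsent(targetUrl, k -> new StringBuilder());
        if (sb.length() > 0) {
            sb.append(' ');
        }
        sb.append(anchorText);
    }

    /**
     * Call once the stream is exhausted. Returns one record per page seen,
     * in arrival order. Anchor text for targets that never appeared as
     * pages is dropped in this sketch.
     */
    public List<PageRecord> finish() {
        List<PageRecord> out = new ArrayList<>();
        for (Map.Entry<String, String> e : textByUrl.entrySet()) {
            StringBuilder anchor = anchorByTarget.get(e.getKey());
            out.add(new PageRecord(e.getKey(), e.getValue(),
                    anchor == null ? "" : anchor.toString()));
        }
        return out;
    }
}
```

The cost of this approach is that nothing can be indexed until the last page has been read, which is exactly the extra work that incremental support in Lucene would avoid.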

Thanks
Taher



