lucene-java-user mailing list archives

From "Taher H. Haveliwala"
Subject indexing documents that arrive in pieces
Date Sun, 13 Oct 2002 02:18:34 GMT
What is the cleanest way in Lucene to add documents to
an index, if the entire document is not readily
available at one time?

For example, I want to index the text as well as the
anchor-text of a stream of HTML pages, where the
anchor-text terms get associated with the page _being
pointed to_. For a document d_i, I don't know all the
terms that should be added to its "anchor" field
until I've seen all documents d_j that link to d_i.

Of course I can make a pass over the web pages and
gather up the relevant terms myself, but if Lucene has
the necessary machinery to add portions of a document
at different times, it would save me work.
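
To make the "gather up the relevant terms myself" option concrete, here is a
minimal sketch in plain Java of the accumulation pass. It deliberately uses no
Lucene calls. The class name AnchorTextAccumulator and its methods addPage,
addLink, anchorField and finish are hypothetical names invented for this
illustration, and it assumes everything fits in memory. Each completed entry
would then be turned into a single Lucene document with a body field and an
"anchor" field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Lucene): buffers page text and anchor
// text until the whole stream has been seen, so that each page can be
// indexed once with its complete "anchor" field.
public class AnchorTextAccumulator {
    // Body text keyed by page URL, in arrival order.
    private final Map<String, String> bodies = new LinkedHashMap<>();
    // Anchor texts keyed by the URL being pointed to.
    private final Map<String, List<String>> anchors = new LinkedHashMap<>();

    // Record the text of a page d_i as it arrives.
    public void addPage(String url, String bodyText) {
        bodies.put(url, bodyText);
    }

    // Record the anchor text of a link found in some page d_j that
    // points to targetUrl. The target may not have been seen yet.
    public void addLink(String targetUrl, String anchorText) {
        anchors.computeIfAbsent(targetUrl, k -> new ArrayList<>()).add(anchorText);
    }

    // All anchor text collected so far for a page, joined by spaces.
    public String anchorField(String url) {
        return String.join(" ", anchors.getOrDefault(url, Collections.<String>emptyList()));
    }

    // Call after the stream ends. Returns url -> {body, anchor} for every
    // page whose body was seen; link targets that never arrived are skipped.
    public Map<String, String[]> finish() {
        Map<String, String[]> docs = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : bodies.entrySet()) {
            docs.put(e.getKey(), new String[] { e.getValue(), anchorField(e.getKey()) });
        }
        return docs;
    }
}
```

The trade-off in this sketch is that nothing is indexed until the stream has
ended, in exchange for writing every document exactly once. If the stream is
too large to buffer, the same maps could be spilled to disk and keyed by URL.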

