lucene-solr-dev mailing list archives

From "J.J. Larrea" <>
Subject Re: adding "modes" to the <add> command
Date Thu, 11 Jan 2007 19:14:37 GMT
At 6:43 AM -0500 1/11/07, Erik Hatcher wrote:
>If all fields are stored, the implementation could simply pull them all into memory on
>the Solr side and add the document as if it had been sent entirely by the client.  But, what
>happens for un-stored fields?

I'll observe that Luke has a "Reconstruct and Edit" function which displays the indexed values
for each field for the selected Document when stored values aren't available... it iterates
the entire inverted index and intersects each term position vector with the target Document
ID via TermPositions.skipTo(id).
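
A minimal sketch of that reconstruction idea, using a toy in-memory map in place of a real
index. This is not Luke's code and not the Lucene API: the class name, the method name and the
postings layout (term -> docId -> positions, for a single field) are assumptions made for
illustration, and the map lookup merely stands in for TermPositions.skipTo(id).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy stand-in for one field of an inverted index: term -> (docId -> positions).
// Hypothetical names throughout; this does not use the Lucene API.
class ReconstructSketch {

    /** Rebuilds the indexed token sequence of one document by visiting every term. */
    static List<String> reconstruct(Map<String, Map<Integer, int[]>> postings, int docId) {
        // position -> term, sorted so the tokens come back in their original order
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, Map<Integer, int[]>> term : postings.entrySet()) {
            int[] positions = term.getValue().get(docId); // plays the role of skipTo(docId)
            if (positions == null) {
                continue; // this term does not occur in the target document
            }
            for (int p : positions) {
                byPosition.put(p, term.getKey());
            }
        }
        return new ArrayList<>(byPosition.values());
    }

    public static void main(String[] args) {
        Map<String, Map<Integer, int[]>> postings = new TreeMap<>();
        postings.put("brown", Map.of(7, new int[] {1}));
        postings.put("fox", Map.of(7, new int[] {2}, 9, new int[] {0}));
        postings.put("quick", Map.of(7, new int[] {0}));
        System.out.println(reconstruct(postings, 7)); // prints [quick, brown, fox]
    }
}
```

Note that what comes back is the analyzed terms, not the original text, and that this sketch
keeps only one term per position, so terms indexed at the same position would overwrite each
other.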

While that would be too slow to do on a per-update basis, it might be feasible for an update
function that cached a list of partially defined Documents and deferred the work to the end
(at closing, or whenever the list grew past a defined maximum).  At that point it would do a
bulk intersection to find the indexed values which are not overridden with new values, using
just a single traversal of the index in Term order and, within each Term, in order of the
DocIDs being updated.  Once that is done, the reconstructed Documents could be added and the
prior versions deleted.
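
The deferred bulk pass could look roughly like the following sketch, again over a toy in-memory
structure rather than a real index. Everything here is hypothetical: the class and method names,
the layout field -> term -> docId -> positions, and the representation of the cached partial
Documents as a map from DocID to the set of field names for which the update supplied new values.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of a single-pass bulk reconstruction; not the Lucene API.
class BatchReconstructSketch {

    /**
     * index:   field -> term -> (docId -> positions)
     * pending: docId -> names of fields the update already supplies
     * returns: docId -> field -> (position -> term) for fields that were not overridden
     */
    static Map<Integer, Map<String, TreeMap<Integer, String>>> reconstructAll(
            Map<String, TreeMap<String, TreeMap<Integer, int[]>>> index,
            TreeMap<Integer, Set<String>> pending) {
        Map<Integer, Map<String, TreeMap<Integer, String>>> rebuilt = new TreeMap<>();
        for (var field : index.entrySet()) {
            for (var term : field.getValue().entrySet()) {      // terms in sorted order
                for (var doc : pending.entrySet()) {            // pending DocIDs, ascending
                    if (doc.getValue().contains(field.getKey())) {
                        continue; // the update carries a new value for this field
                    }
                    int[] positions = term.getValue().get(doc.getKey());
                    if (positions == null) {
                        continue; // term not present in this document
                    }
                    TreeMap<Integer, String> tokens = rebuilt
                            .computeIfAbsent(doc.getKey(), k -> new TreeMap<>())
                            .computeIfAbsent(field.getKey(), k -> new TreeMap<>());
                    for (int p : positions) {
                        tokens.put(p, term.getKey());
                    }
                }
            }
        }
        return rebuilt;
    }

    public static void main(String[] args) {
        TreeMap<String, TreeMap<Integer, int[]>> title = new TreeMap<>();
        title.put("solr", new TreeMap<>(Map.of(3, new int[] {0})));
        title.put("update", new TreeMap<>(Map.of(3, new int[] {1}, 5, new int[] {0})));
        TreeMap<String, TreeMap<Integer, int[]>> body = new TreeMap<>();
        body.put("hello", new TreeMap<>(Map.of(3, new int[] {0})));

        Map<String, TreeMap<String, TreeMap<Integer, int[]>>> index = new TreeMap<>();
        index.put("title", title);
        index.put("body", body);

        TreeMap<Integer, Set<String>> pending = new TreeMap<>();
        pending.put(3, Set.of("body")); // doc 3 gets a new body, so only title is rebuilt
        pending.put(5, Set.of());       // doc 5 keeps all of its indexed values
        System.out.println(reconstructAll(index, pending));
    }
}
```

The point of the ordering is that each term is visited once for the whole batch, instead of
once per updated Document.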

The roadblocks come up when re-adding the indexed values to the index.  The updater can create
a new untokenized, unstored Field for each indexed value so that it is literally re-added, but
in that case there is no way to externally specify the position offset to match the original.
DocumentWriter and the classes it relies on are package-private and final, so there is no way
to interpose there.  An effective hack, though, might be to set the reconstructed Fields to
tokenized and specify for those fields a special Analyzer which acts like KeywordAnalyzer but
looks up the position offset in a table created by the update mechanism and returns it with
the token.  A little convoluted, but probably doable if someone had the time and inclination.
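
As a rough illustration of the table lookup such an Analyzer would need, the sketch below turns
recorded absolute positions into the per-token position increments a token stream would have to
report. It is plain Java with hypothetical names, not a Lucene Analyzer, and it assumes that the
indexer derives each position by adding the token's increment to the previous position, with
the first token counted from -1 (so an increment of 1 lands on position 0).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: converts a position table into (term, positionIncrement) pairs.
class PositionTableSketch {

    static List<Map.Entry<String, Integer>> toIncrements(
            TreeMap<Integer, List<String>> termsByPosition) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        int previous = -1; // assumed position "before" the first token
        for (Map.Entry<Integer, List<String>> slot : termsByPosition.entrySet()) {
            boolean first = true;
            for (String term : slot.getValue()) {
                // The first term at a position carries the gap; further terms at the
                // same position get increment 0 so they stack on top of it.
                int increment = first ? slot.getKey() - previous : 0;
                out.add(Map.entry(term, increment));
                first = false;
            }
            previous = slot.getKey();
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<Integer, List<String>> table = new TreeMap<>();
        table.put(0, List.of("quick"));
        table.put(1, List.of("brown"));
        table.put(3, List.of("fox", "vixen")); // gap at position 2, two terms at position 3
        System.out.println(toIncrements(table)); // prints [quick=1, brown=1, fox=2, vixen=0]
    }
}
```

In a real implementation each pair would presumably be returned as a single token from the
special Analyzer, with the increment set on the token before it is handed to the indexer.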

- J.J.
