lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4658) Per-segment tracking of external/side-car data
Date Sun, 17 Mar 2013 17:41:15 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604683#comment-13604683 ]

David Smiley commented on LUCENE-4658:
--------------------------------------

You raise a good point there, Rob: BinaryDocValues is pretty close and might be sufficient
as-is. But do we need segment-based tracking hooks? Perhaps they're useful for parallel/overlay
indexes that maintain docid consistency (LUCENE-4258?), but I don't think that needs to be
centered around any particular special field. Shai's issue description points to a comment
I made, but that comment was in turn a quote of Rob. Rob and I didn't call out a need for
segment-level tracking; it was commit-level tracking. A couple of use cases I had in mind when
I made the comment are:

* Storing per-document data that changes often, like the number of clicks/accesses on a search
result, which is ultimately used to influence scoring. The application's backing store would
probably be an in-memory cache with occasional syncs to disk.
* Storing a large per-document body text in an external data source (e.g. a DB or file system).
Lucene needlessly merges stored fields, which I think is quite wasteful; besides, putting the
text in Lucene is redundant if you already manage it somewhere else. The text is ultimately
still needed through Lucene's API for highlighting.
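For the first use case, the application-side store might look roughly like the minimal sketch below. It assumes each document carries a stable application-level ID, because Lucene docids change as segments merge. The class and method names (ClickCountStore, recordClick, boost, syncTo, loadFrom), the log-damped boost, and the tab-separated snapshot format are all hypothetical, and wiring the boost into Lucene scoring is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical in-memory store for frequently changing per-document values
 * (click counts), keyed by an application-level ID, not a Lucene docid.
 */
class ClickCountStore {
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    /** Cheap, thread-safe increment; nothing touches the index. */
    void recordClick(String docId) {
        counts.computeIfAbsent(docId, k -> new LongAdder()).increment();
    }

    long clicks(String docId) {
        LongAdder a = counts.get(docId);
        return a == null ? 0L : a.sum();
    }

    /** Illustrative boost: 1.0 for unclicked docs, growing with log(1 + clicks). */
    float boost(String docId) {
        return 1.0f + (float) Math.log1p(clicks(docId));
    }

    /** Occasional sync: write "id<TAB>count" lines to a temp file, then replace the old snapshot. */
    void syncTo(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        counts.forEach((id, a) -> lines.add(id + "\t" + a.sum()));
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, lines, StandardCharsets.UTF_8);
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Rebuilds the cache from the last snapshot, e.g. on restart. */
    static ClickCountStore loadFrom(Path file) throws IOException {
        ClickCountStore store = new ClickCountStore();
        if (Files.exists(file)) {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                int tab = line.indexOf('\t');
                LongAdder a = new LongAdder();
                a.add(Long.parseLong(line.substring(tab + 1)));
                store.counts.put(line.substring(0, tab), a);
            }
        }
        return store;
    }
}
```

Keeping the counts outside the index means a click never forces a document update or a segment merge; the trade-off is that the application, not Lucene, has to keep the snapshot consistent with the commit it searches.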

Is per-segment tracking needed for this? Or is this really about hooks to enable a parallel
segment-level index? I dunno.

                
> Per-segment tracking of external/side-car data
> ----------------------------------------------
>
>                 Key: LUCENE-4658
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4658
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-4658.patch, LUCENE-4658.patch
>
>
> Spinoff from David's idea on LUCENE-4258
> (https://issues.apache.org/jira/browse/LUCENE-4258?focusedCommentId=13534352&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13534352)
> I made a prototype patch that allows custom per-segment "side-car
> data".  It adds an abstract ExternalSegmentData class.  The idea is
> the app implements this, and IndexWriter will pass each Document
> through to it, and call on it to do flushing/merging.  I added a
> setter to IndexWriterConfig to enable it, but I think this would
> really belong in Codec ...
> I haven't tackled the read side yet, though this is already usable
> without it (i.e., the app can just open its own files, read them,
> etc.).
> The random test case passes.
> I think for example this might make it easier for Solr/ElasticSearch
> to implement things like ExternalFileField.
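The hook described in the quoted issue text (the app implements a class, IndexWriter passes each document through to it, then calls it to flush and merge) can be sketched roughly as follows. This is a hypothetical shape only, not the patch's actual ExternalSegmentData signatures: SideCarSegmentData, BodySideCar, the use of Map<String, String> as a stand-in for a Lucene Document, and the "body" field are all illustrative. It assumes a merge appends each input segment's live (non-deleted) documents in order, so the side-car entries stay aligned with the merged segment's docids.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical per-segment side-car hook; NOT the API from the attached patch. */
abstract class SideCarSegmentData {
    /** Called once per added document, in docid order within the segment being built. */
    abstract void addDocument(Map<String, String> doc);

    /** Called when the in-memory segment is flushed; returns that segment's side-car data. */
    abstract List<String> flush();

    /** Called on merge; segments and their deletion flags arrive in merge order. */
    abstract List<String> merge(List<List<String>> segments, List<boolean[]> deletedDocs);
}

/** Example implementation: keeps a per-document "body" value outside the index. */
class BodySideCar extends SideCarSegmentData {
    private final List<String> pending = new ArrayList<>();

    @Override
    void addDocument(Map<String, String> doc) {
        // Position in this list == docid within the segment being built.
        pending.add(doc.getOrDefault("body", ""));
    }

    @Override
    List<String> flush() {
        List<String> segment = new ArrayList<>(pending);
        pending.clear();
        return segment;
    }

    @Override
    List<String> merge(List<List<String>> segments, List<boolean[]> deletedDocs) {
        List<String> merged = new ArrayList<>();
        for (int s = 0; s < segments.size(); s++) {
            List<String> seg = segments.get(s);
            boolean[] deleted = deletedDocs.get(s);
            for (int doc = 0; doc < seg.size(); doc++) {
                // Dropping deleted docs mirrors how the merged segment renumbers docids.
                if (!deleted[doc]) {
                    merged.add(seg.get(doc));
                }
            }
        }
        return merged;
    }
}
```

The point of tying the data to segments, rather than to commits, is that the side-car data is rewritten at exactly the moments docids change (flush and merge), so the app never has to remap IDs itself.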

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


