incubator-lucy-dev mailing list archives

From: Marvin Humphrey <mar...@rectangular.com>
Subject: Re: [lucy-dev] Re: which fields contained which terms
Date: Fri, 09 Sep 2011 05:45:56 GMT

On Thu, Sep 08, 2011 at 10:25:59PM -0500, Peter Karman wrote:
> Marvin Humphrey wrote on 8/30/11 4:59 PM:
> 
> > To support highlighting, at index-time we create an inverted representation
> > for each field that has been marked as "highlightable", then serialize all the
> > inverted fields together in one blob (called, for no particularly good reason,
> > a "DocVector").  Effectively this is a miniature inverted-index containing a
> > single document.  The class which does the work is
> > Lucy::Index::HighlightWriter, and the relevant segment files are named
> > seg_NNN/highlight.ix and seg_NNN/highlight.dat.
> 
> I finally found time to try this, but I must be doing something wrong, because
> despite setting all my fields to 'highlightable', no seg_NNN/highlight.* files
> are getting created.

The highlight.* files exist as discrete files for a little while during
indexing, but then they get rolled into the compound file.  Take a look in
seg_NNN/cfmeta.json.

> |-- locks
> |-- schema_1.json
> |-- seg_1
> |   |-- cf.dat
> |   |-- cfmeta.json
> |   `-- segmeta.json
> |-- snapshot_1.json
> |-- swish.xml
> `-- swish_last_start

The "cf.dat" file holds the data for all of the rolled-up files, and the
"cfmeta.json" file holds the metadata for each one, such as its offset and
length within "cf.dat".
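
As a rough illustration of checking whether the highlight data made it into
the compound file, here is a minimal Python sketch. It is not part of Lucy.
The JSON layout it assumes (a top-level "files" map keyed by virtual file
name, each entry carrying "offset" and "length") is an assumption based on the
description above, and the sample values are made up, so compare it against a
real seg_NNN/cfmeta.json before relying on it.

```python
import json

# Hypothetical cfmeta.json contents. The "files" key, the entry names, and
# all offsets/lengths are assumptions for illustration, not real index data.
SAMPLE_CFMETA = """
{
  "files": {
    "seg_1/highlight.dat": {"offset": 0, "length": 120},
    "seg_1/highlight.ix": {"offset": 120, "length": 16},
    "seg_1/lexicon-1.dat": {"offset": 136, "length": 64}
  }
}
"""


def find_virtual_files(cfmeta, prefix="highlight."):
    """Return {name: (offset, length)} for entries whose base name starts with prefix."""
    found = {}
    for name, entry in cfmeta.get("files", {}).items():
        base = name.rsplit("/", 1)[-1]  # drop any leading "seg_NNN/" component
        if base.startswith(prefix):
            # int() tolerates values stored either as numbers or as strings
            found[name] = (int(entry["offset"]), int(entry["length"]))
    return found


def read_virtual_file(cf_bytes, offset, length):
    """Slice one virtual file's bytes out of the compound file's contents."""
    return cf_bytes[offset:offset + length]


cfmeta = json.loads(SAMPLE_CFMETA)
highlight_files = find_virtual_files(cfmeta)
for name, (offset, length) in sorted(highlight_files.items()):
    print(f"{name}: offset={offset} length={length}")
```

For a real index, replace SAMPLE_CFMETA with the text of
seg_NNN/cfmeta.json, and pass the bytes of seg_NNN/cf.dat to
read_virtual_file() to pull out one entry.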

Marvin Humphrey
