lucene-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: Flexible indexing
Date Mon, 12 Mar 2007 20:49:52 GMT

On Mar 10, 2007, at 3:27 PM, Michael Busch wrote:

> - Introduce index-level metadata. Preferable in XML format, so it  
> will be human readable. Later on, we can store information about  
> the index format in this file, like the codecs that are used to  
> store the data.

To provoke thought about what index-level metadata might go in this
file, the contents of a KinoSearch (KS) "segments_2.yaml" file, captured
immediately after indexing an HTML presentation of the US Constitution,
are below.

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


slothbear:~/projects/ks/perl marvin$ cat uscon_invindex/segments_2.yaml
ks_version: 0.20_02
fields:
   title: 'KinoSearch::Schema::FieldSpec'
   url: 'USConSchema::UnIndexedField'
   content: 'KinoSearch::Schema::FieldSpec'
format: 1
generation: 2
seg_counter: 1
segments:
   _1:
     term_list_index:
       skip_interval: 16
       format: 1
       index_interval: 128
       size: 8
       counts:
         title: 1
         content: 8
     posting_list:
       format: 1
     compound_file:
       format: 1
       sub_files:
         _1.tlx2:
           offset: 138575
           length: 93
         _1.p0:
           offset: 138134
           length: 441
         _1.tvx:
           offset: 137718
           length: 416
         _1.tv:
           offset: 73487
           length: 64231
         _1.tl0:
           offset: 73259
           length: 228
         _1.p2:
           offset: 56393
           length: 16866
         _1.ds:
           offset: 7015
           length: 49378
         _1.tl2:
           offset: 421
           length: 6594
         _1.dsx:
           offset: 5
           length: 416
         _1.tlx0:
           offset: 0
           length: 5
     term_vectors:
       format: 1
     term_list:
       skip_interval: 16
       format: 1
       index_interval: 128
       size: 923
       counts:
         title: 41
         content: 923
     doc_storage:
       format: 1
     seg_info:
       seg_name: _1
       doc_count: 52
       field_names:
         - title
         - url
         - content
version: 1173732193033
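
As a reading aid for the compound_file section of the dump, here is a minimal Python sketch that checks whether the listed sub-files tile the compound file with no gaps or overlaps. The offsets and lengths are copied from the dump above; the helper name `check_contiguous` and the idea of validating the layout this way are assumptions made for illustration, not part of KinoSearch.

```python
# Sub-file extents as (offset, length), copied from the compound_file
# section of the segments_2.yaml dump.
SUB_FILES = {
    "_1.tlx2": (138575, 93),
    "_1.p0": (138134, 441),
    "_1.tvx": (137718, 416),
    "_1.tv": (73487, 64231),
    "_1.tl0": (73259, 228),
    "_1.p2": (56393, 16866),
    "_1.ds": (7015, 49378),
    "_1.tl2": (421, 6594),
    "_1.dsx": (5, 416),
    "_1.tlx0": (0, 5),
}


def check_contiguous(sub_files):
    """Return the total byte count if the extents are contiguous from offset 0.

    Hypothetical helper (not a KinoSearch API). Raises ValueError on the
    first gap or overlap it finds.
    """
    expected_offset = 0
    # Walk the sub-files in order of their starting offset.
    for name, (offset, length) in sorted(sub_files.items(), key=lambda kv: kv[1][0]):
        if offset != expected_offset:
            raise ValueError(f"{name}: expected offset {expected_offset}, got {offset}")
        expected_offset = offset + length
    return expected_offset


total = check_contiguous(SUB_FILES)
print(total)  # 138668
```

For this dump the extents do line up end to end, so the metadata alone is enough to locate every sub-file inside the compound file.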


