lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-1020) Basic tool for checking & repairing an index
Date Fri, 05 Oct 2007 11:39:50 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1020:
---------------------------------------

    Attachment: LUCENE-1020.patch

Attached patch.

I created a first cut at this.  It takes the path to the index, opens
it, and steps through every segment, checking terms, term frequencies
(freq), positions (prox), fields, norms, stored fields and term
vectors.  If it detects any inconsistency and you specified "-fix" on
the command line, it writes a new segments file that no longer
references the bad segments.
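To illustrate the repair rule, here is a minimal sketch of the selection
step, assuming each segment's checks have already been reduced to a
pass/fail flag.  SegmentResult and SegmentRepair are hypothetical
stand-ins, not classes from the patch or from Lucene, and the sketch does
not write a real segments file.  It also makes explicit that any
documents in a dropped segment are no longer part of the index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-segment check result (not a Lucene class).
final class SegmentResult {
    final String name;     // segment name, e.g. "_l"
    final int docCount;    // number of documents in the segment
    final boolean ok;      // true if every test on the segment passed

    SegmentResult(String name, int docCount, boolean ok) {
        this.name = name;
        this.docCount = docCount;
        this.ok = ok;
    }
}

// Hypothetical helper: decides what a repaired segments file would keep.
final class SegmentRepair {
    // Names of the segments that passed, in their original order.
    static List<String> segmentsToKeep(List<SegmentResult> results) {
        List<String> keep = new ArrayList<>();
        for (SegmentResult r : results) {
            if (r.ok) {
                keep.add(r.name);
            }
        }
        return keep;
    }

    // Documents that become unreachable once the bad segments are dropped.
    static int docsLost(List<SegmentResult> results) {
        int lost = 0;
        for (SegmentResult r : results) {
            if (!r.ok) {
                lost += r.docCount;
            }
        }
        return lost;
    }
}
```

For example, if segment _16 (9193 docs) failed, the repaired segments
file would keep the other segments and those 9193 documents would be gone.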

WARNING: this is all brand new code.  Be very careful when trying it.
Make a full backup copy of your index first!
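A minimal sketch of that backup step using only the JDK's java.nio.file
API, assuming no IndexWriter is modifying the index while it is copied
(otherwise the copy itself may be inconsistent).  IndexBackup and
copyTree are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical backup helper; assumes the index is not being written to.
final class IndexBackup {
    // Recursively copies everything under source into target.
    // Files.walk visits a directory before its contents, so each
    // destination directory exists before files are copied into it.
    // Throws if a file already exists at a destination path.
    static void copyTree(Path source, Path target) throws IOException {
        try (Stream<Path> paths = Files.walk(source)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path dest = target.resolve(source.relativize(p));
                if (Files.isDirectory(p)) {
                    Files.createDirectories(dest);
                } else {
                    Files.copy(p, dest);
                }
            }
        }
    }
}
```

Usage would be something like
IndexBackup.copyTree(Paths.get("contrib/benchmark/work/index"), Paths.get("index-backup"))
before running the tool with "-fix".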

It also prints useful details about the index (e.g. "roughly" which
version of Lucene produced it), which can help gather diagnostics
when debugging problems with an index.

Below is the output on a healthy index.  On an unhealthy index, the
tool prints 'FAILED' for each bad segment, followed by the full
exception (the reason).  Nothing is done to the index, however,
unless you specify the '-fix' command-line option.
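The read-only-by-default behavior can be sketched as follows.  This is a
hypothetical illustration, not code from the patch: CheckReport, Outcome,
the report line format, and the rule "write only if '-fix' was given and
something failed" are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical report/fix gating; names and formats are illustrative only.
final class CheckReport {

    // Outcome of checking one segment: error == null means it passed.
    static final class Outcome {
        final String segment;
        final Exception error;

        Outcome(String segment, Exception error) {
            this.segment = segment;
            this.error = error;
        }
    }

    // One line per segment: "OK", or "FAILED" plus the exception as the reason.
    static List<String> report(List<Outcome> outcomes) {
        List<String> lines = new ArrayList<>();
        for (Outcome o : outcomes) {
            if (o.error == null) {
                lines.add(o.segment + ": OK");
            } else {
                lines.add(o.segment + ": FAILED (" + o.error + ")");
            }
        }
        return lines;
    }

    // The index is modified only when "-fix" is on the command line
    // and at least one segment failed; otherwise the run is read-only.
    static boolean shouldWriteFix(String[] args, List<Outcome> outcomes) {
        boolean fixRequested = Arrays.asList(args).contains("-fix");
        boolean anyFailed = false;
        for (Outcome o : outcomes) {
            if (o.error != null) {
                anyFailed = true;
                break;
            }
        }
        return fixRequested && anyFailed;
    }
}
```

Keeping the decision in one predicate makes it easy to see that a plain
check run can never change the index.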

Healthy index output:


Opening index @ contrib/benchmark/work/index

Segments file=segments_3 numSegments=6 version=FORMAT_SHARED_DOC_STORE [Lucene 2.3]
  1 of 6: name=_l docCount=9039
    compound=false
    numFiles=11
    size (MB)=44.276
    docStoreOffset=0
    docStoreSegment=_0
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [3 fields]
    test: terms, freq, prox...OK [391050 terms; 6573991 terms/docs pairs; 20476680 tokens]
    test: stored fields.......OK [27117 total field count; avg 3 fields per doc]
    test: term vectors........OK [18078 total vector count; avg 2 term/freq vector fields per doc]

  2 of 6: name=_16 docCount=9193
    compound=false
    numFiles=11
    size (MB)=44.743
    docStoreOffset=9039
    docStoreSegment=_0
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [3 fields]
    test: terms, freq, prox...OK [391013 terms; 6619615 terms/docs pairs; 20746479 tokens]
    test: stored fields.......OK [27579 total field count; avg 3 fields per doc]
    test: term vectors........OK [18386 total vector count; avg 2 term/freq vector fields per doc]

  3 of 6: name=_1a docCount=3686
    compound=false
    numFiles=11
    size (MB)=11.797
    docStoreOffset=18232
    docStoreSegment=_0
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [3 fields]
    test: terms, freq, prox...OK [164885 terms; 1866591 terms/docs pairs; 5047412 tokens]
    test: stored fields.......OK [11058 total field count; avg 3 fields per doc]
    test: term vectors........OK [5953 total vector count; avg 1.615 term/freq vector fields per doc]

  4 of 6: name=_1f docCount=3987
    compound=false
    numFiles=11
    size (MB)=11.851
    docStoreOffset=21918
    docStoreSegment=_0
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [3 fields]
    test: terms, freq, prox...OK [159546 terms; 1804415 terms/docs pairs; 5199299 tokens]
    test: stored fields.......OK [11961 total field count; avg 3 fields per doc]
    test: term vectors........OK [7547 total vector count; avg 1.893 term/freq vector fields per doc]

  5 of 6: name=_1l docCount=838
    compound=false
    numFiles=11
    size (MB)=3.143
    docStoreOffset=28712
    docStoreSegment=_0
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [3 fields]
    test: terms, freq, prox...OK [68824 terms; 436884 terms/docs pairs; 1281678 tokens]
    test: stored fields.......OK [2514 total field count; avg 3 fields per doc]
    test: term vectors........OK [1617 total vector count; avg 1.93 term/freq vector fields per doc]

  6 of 6: name=_1m docCount=450
    compound=false
    numFiles=11
    size (MB)=2.165
    docStoreOffset=29550
    docStoreSegment=_0
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [3 fields]
    test: terms, freq, prox...OK [53147 terms; 278659 terms/docs pairs; 877940 tokens]
    test: stored fields.......OK [1350 total field count; avg 3 fields per doc]
    test: term vectors........OK [895 total vector count; avg 1.989 term/freq vector fields per doc]

No problems were detected with this index.
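The "avg" figures in this output appear to equal the total count divided
by the segment's docCount, rounded to three decimals with trailing zeros
dropped (e.g. 5953 / 3686 is about 1.61503, printed as 1.615, and
27117 / 9039 is exactly 3, printed as 3).  A sketch reproducing that
arithmetic, assuming half-up rounding; the rule is inferred from the
numbers above, not taken from the patch, and AvgFormat is a hypothetical
name.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper reproducing the "avg N ... per doc" figures.
final class AvgFormat {
    // total / docCount, rounded half-up to three decimal places,
    // with trailing zeros removed (3.000 -> "3", 1.930 -> "1.93").
    static String avg(long total, long docCount) {
        BigDecimal value = BigDecimal.valueOf(total)
                .divide(BigDecimal.valueOf(docCount), 3, RoundingMode.HALF_UP);
        return value.stripTrailingZeros().toPlainString();
    }
}
```

Applied to each segment's stored-field and term-vector totals, this
yields the same values the tool printed above.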


> Basic tool for checking & repairing an index
> --------------------------------------------
>
>                 Key: LUCENE-1020
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1020
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.3
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-1020.patch
>
>
> This has been requested a number of times on the mailing lists.  Most
> recently here:
>   http://www.gossamer-threads.com/lists/lucene/java-user/53474
> I think we should provide a basic tool out of the box.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



