lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] [Updated] (LUCENE-4127) negative offsets/deltas corrumption
Date Sun, 10 Jun 2012 16:57:42 GMT


Robert Muir updated LUCENE-4127:

    Attachment: LUCENE-4127.patch

Here's a patch: I think it's committable (e.g. so we can get the alpha release out).

As a followup I think we should enable the docinverter check when termVectorOffsets are enabled,
enable the backwards-offsets check in BaseTokenStreamTestCase, fix the broken analyzers, and
improve the tests some more.

> negative offsets/deltas corrumption
> -----------------------------------
>                 Key: LUCENE-4127
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/index
>    Affects Versions: 4.0
>            Reporter: Robert Muir
>         Attachments: LUCENE-4127.patch, LUCENE-4127.patch, LUCENE-4127_offsetAtt.patch,
> If offsets go negative or backwards, it can corrupt the index with DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS:
> the offsets will have wrong values (different from the term vectors) or even crazy values
> like -2147483645.
> The problem with this is that it's not just theoretical: it's too easy to do this with
> Lucene's own analyzer chains (e.g. NGramTokenizer).
> See issues such as LUCENE-3920 and some discussion on LUCENE-3738.
> The question is how to fix this, e.g. should we:
> # start enforcing that offsets cannot be crazy values in OffsetAttributeImpl/IndexWriter
> and fix the broken analyzers
> # leave offsets as a pair of opaque integers, declaring this a limitation of the current
> codec, and either work around it or throw UOE from the postings writer.
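
To make the first option concrete, here is a minimal sketch of the kind of check it implies: reject offsets that are negative, that end before they start, or whose start goes backwards relative to the previous token. This is not the code in the attached patch and uses no Lucene classes; the class name OffsetCheck, the method firstViolation, and the int[][] representation of (startOffset, endOffset) pairs are all hypothetical, chosen only for illustration.

```java
import java.util.Optional;

// Hypothetical stand-alone illustration; not Lucene's actual check.
final class OffsetCheck {
    private OffsetCheck() {}

    /**
     * Scans (startOffset, endOffset) pairs in token order and returns a
     * description of the first pair that breaks one of three rules:
     * start >= 0, end >= start, and start >= previous token's start.
     * Returns Optional.empty() when every pair passes.
     */
    static Optional<String> firstViolation(int[][] offsets) {
        int lastStart = 0;
        for (int i = 0; i < offsets.length; i++) {
            int start = offsets[i][0];
            int end = offsets[i][1];
            if (start < 0) {
                return Optional.of("token " + i + ": negative startOffset " + start);
            }
            if (end < start) {
                return Optional.of("token " + i + ": endOffset " + end
                        + " < startOffset " + start);
            }
            if (start < lastStart) {
                return Optional.of("token " + i + ": startOffset " + start
                        + " goes backwards from " + lastStart);
            }
            lastStart = start;
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Well-behaved stream: start offsets never decrease.
        int[][] good = {{0, 3}, {4, 7}, {4, 9}};
        // Made-up stream whose third token starts before the second one.
        int[][] backwards = {{0, 3}, {2, 5}, {1, 4}};

        System.out.println(firstViolation(good).orElse("ok"));
        System.out.println(firstViolation(backwards).orElse("ok"));
    }
}
```

Under the second option no such check would run, and it would be up to the postings writer to either cope with arbitrary integer pairs or refuse them.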
