lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Muir (JIRA)" <>
Subject [jira] [Updated] (LUCENE-4196) Turn asserts in I/O related code into hard checks
Date Fri, 31 Aug 2012 03:35:07 GMT


Robert Muir updated LUCENE-4196:

    Attachment: LUCENE-4196.patch

Hi Uwe... tricky issue.

I think in general we were doing pretty good actually: often times various things like assert
value >= 0 after a readVLong (well, thats ok to be an assert since readVLong does a real
check anyways).

But I think we were missing a few key checks that cost nothing and would detect issues:
* when reading things like #fields or #segments, check thats not negative. otherwise the loop
does nothing and it acts like zero.
* things like docCount in the .si really need to be checked they are positive: this file isn't
checksummed (the information used to be in sis which is) and had basically no checks.
* check that we consume entire .si file just like we do with .fnm
* the asserts for checking duplicate field names/numbers in FieldInfos and terms dicts were
a good idea, but a real check for duplicate values is basically free anyway, since Map.put
returns the previous value and its just a null check.
* statistics are a wonderful cheap check in term dicts impls at startup...

Anyway I think this is a good start. I'll wait for review but I'd like to commit this patch
> Turn asserts in I/O related code into hard checks
> -------------------------------------------------
>                 Key: LUCENE-4196
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: core/index
>    Affects Versions: 4.0-ALPHA
>            Reporter: Uwe Schindler
>             Fix For: 4.0
>         Attachments: LUCENE-4196.patch
> In lots of codecs we only assert, that e.g. some things inside files are correctly in
bounds, which leads to security problems (ok, not as bad as C-Style buffer overflows), but
e.g. allocating a large array after reading a VInt from a file header and then OOM, is a security
issue. So we have to check all those contracts for files as hard checks, especially as a simply
check in most cases dont cost anything (and it costs not more than the assert itsself, as
the assert also takes CPU power, because it needs a check one time on a static final class
> Of course we cannot check values we read when reading postings, but the simple checks
that any postings file has correct header and something like a positive number of elements,
or number of elements < file size,..., a bit-fireld only contains valid bits in StoredFieldsReader,
or non-duplicate filenames (CFS) are very important. We had those checks in 3.x, but in 4.0,
Mike changed all of those to asserts during the flex development (in my opinion with no real

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message