lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Hardcoded checksum mechanism in BlockTreeTermsReader
Date Tue, 06 Dec 2016 11:30:18 GMT
I see.  Bits can also be flipped by the network as they are travelling
to/from the DB.  The end to end checksum Lucene does now would catch
that.

Anyway, that BlockTree index file that is being entirely checksummed
is a very small file.  And, using the first pattern is not easy for it
because it needs to seek to the end to load its directory location,
and then seek back to that location to read each field's information.
Do you see a simple way to change it to the first pattern?
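
To make the two patterns under discussion concrete, here is a minimal sketch that uses only java.util.zip, not Lucene. The class ChecksumPatterns, its method names, and the file layout (payload followed by an 8-byte CRC32 footer) are assumptions made for illustration; they are stand-ins for the roles of openChecksumInput plus checkFooter and of checksumEntireFile, not Lucene's actual implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

// Hypothetical stand-in, NOT Lucene code: a "file" is a payload followed by
// an 8-byte big-endian CRC32 footer.
class ChecksumPatterns {

    /** Writes the payload and appends its CRC32 as a footer. */
    public static byte[] writeWithFooter(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + Long.BYTES)
                .put(payload)
                .putLong(crc.getValue())
                .array();
    }

    private static int payloadLength(byte[] file) throws IOException {
        if (file.length < Long.BYTES) {
            throw new IOException("file too short to hold a checksum footer");
        }
        return file.length - Long.BYTES;
    }

    /**
     * First pattern: the checksum accumulates as a by-product of a
     * sequential read and is compared with the footer at the end.
     */
    public static byte[] readSequentially(byte[] file) throws IOException {
        int length = payloadLength(file);
        CheckedInputStream checked =
                new CheckedInputStream(new ByteArrayInputStream(file), new CRC32());
        DataInputStream in = new DataInputStream(checked);
        byte[] payload = new byte[length];
        in.readFully(payload);                          // the normal read updates the CRC
        long actual = checked.getChecksum().getValue(); // capture before reading the footer
        long expected = in.readLong();
        if (actual != expected) {
            throw new IOException("checksum mismatch (sequential read)");
        }
        return payload;
    }

    /**
     * Second pattern: a dedicated pass over all bytes whose only purpose
     * is verification; the caller then reads the file in any order it likes.
     */
    public static void checksumEntireFile(byte[] file) throws IOException {
        int length = payloadLength(file);
        CRC32 crc = new CRC32();
        crc.update(file, 0, length);
        long expected = ByteBuffer.wrap(file, length, Long.BYTES).getLong();
        if (crc.getValue() != expected) {
            throw new IOException("checksum mismatch (whole-file pass)");
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] file = writeWithFooter("terms index".getBytes(StandardCharsets.UTF_8));
        checksumEntireFile(file);
        System.out.println(new String(readSequentially(file), StandardCharsets.UTF_8));
    }
}
```

The first pattern only works when every byte is consumed front to back. A reader that jumps to the end for a directory offset and then seeks back does not feed the bytes through the checksum in order, which is why a separate whole-file pass is the simpler fit for that access pattern.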

Mike McCandless

http://blog.mikemccandless.com


On Tue, Dec 6, 2016 at 6:00 AM, Duke DAI <duke.dai.007@gmail.com> wrote:
> Thanks for your quick response, Mike.
>
> The database has its own raw page management on top of the OS page
> management, and most likely its own page-level checksums. That's why I want
> to avoid checksumming again at the Lucene Directory level.
>
> Certainly checksumming is good. I like this pattern (rewrite
> openChecksumInput to suit the real case):
> inputStream = directory.openChecksumInput(...);
> // at the end check checksum, as by-product
> CodecUtil.checkFooter(...)
>
> But I do not like this pattern:
> CodecUtil.checksumEntireFile(..), whose only purpose is to checksum by
> reading all the data, rather than as a by-product of reading it.
> If the design/API were pluggable, with the current behavior as the default,
> it would be good enough for various scenarios.
>
>
>
>
> Best regards,
> Duke
> If not now, when? If not me, who?
>
> On Tue, Dec 6, 2016 at 6:36 PM, Michael McCandless
> <lucene@mikemccandless.com> wrote:
>>
>> We have learned over time not to trust the underlying store to
>> correctly record the bytes we wrote to it.
>>
>> This is why checksumming is very strongly built into Lucene at this
>> point.  If you disable checksumming, when bits do flip, you get exotic
>> exceptions at search time that might look like Lucene bugs and can
>> cost a lot of time to explain.
>>
>> It's not just the BlockTreeTermsReader; many other codec components
>> check the checksum with CodecUtil.checkFooter at search time.
>>
>> Can you explain why it's necessary to remove it for your database
>> files based Directory?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Tue, Dec 6, 2016 at 5:25 AM, Duke DAI <duke.dai.007@gmail.com> wrote:
>> > Hi all,
>> >
>> > I'm customizing a Lucene Directory, which extends o.a.l.store.Directory
>> > and is based on database files. I do not need to checksum again on
>> > IndexInput and IndexOutput.
>> >
>> > But in the BlockTreeTermsReader constructor, the following code opens a
>> > hard-coded BufferedChecksumIndexInput to checksum the raw IndexInput. I
>> > have to use CRC32 on IndexOutput to get through it. Is there a more
>> > graceful way to do the checksum, such as letting the Directory construct a
>> > checksum instance instead of the API Directory.openChecksumInput?
>> >
>> >
>> >       String indexName = IndexFileNames.segmentFileName(segment,
>> >           state.segmentSuffix, TERMS_INDEX_EXTENSION);
>> >       indexIn = state.directory.openInput(indexName, state.context);
>> >       CodecUtil.checkIndexHeader(indexIn, TERMS_INDEX_CODEC_NAME, version,
>> >           version, state.segmentInfo.getId(), state.segmentSuffix);
>> >       CodecUtil.checksumEntireFile(indexIn);
>> >
>> >
>> >
>> >
>> > Best regards,
>> > Duke
>> > If not now, when? If not me, who?
>
>
