lucene-java-user mailing list archives

From Duke DAI <duke.dai....@gmail.com>
Subject Re: Hardcoded checksum mechanism in BlockTreeTermsReader
Date Tue, 06 Dec 2016 11:00:52 GMT
Thanks for your quick response, Mike.

The database has its own raw page management on top of the OS page
management, and it most likely has its own page-level checksum as well. That
is why I want to avoid checksumming at the Lucene Directory level.

Checksumming is certainly good, and I like this pattern (with
openChecksumInput overridden to suit the real case):
inputStream = directory.openChecksumInput(...);
// at the end check checksum, as by-product
CodecUtil.checkFooter(...)
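To make the "checksum as a by-product" idea concrete, here is a minimal stdlib-only Java sketch. It uses java.util.zip.CheckedInputStream/CheckedOutputStream with CRC32 as a stand-in for Lucene's ChecksumIndexInput and CodecUtil.checkFooter, and it assumes a hypothetical, simplified file layout (payload followed by an 8-byte big-endian CRC32 of the payload) that is not Lucene's actual footer format. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;

public class ChecksumByproduct {
    // Hypothetical layout: payload bytes, then an 8-byte big-endian CRC32 of the payload.
    static final int FOOTER_BYTES = Long.BYTES;

    // Writer side: the CRC32 is updated as a by-product of writing the payload.
    static byte[] writeWithFooter(byte[] payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        CheckedOutputStream checked = new CheckedOutputStream(bytes, new CRC32());
        checked.write(payload);
        long crc = checked.getChecksum().getValue();
        DataOutputStream out = new DataOutputStream(bytes); // bypasses the checksum
        out.writeLong(crc);
        out.flush();
        return bytes.toByteArray();
    }

    // Reader side: the payload is consumed once, the CRC32 accumulates while
    // reading, and the footer comparison at the end needs no extra pass.
    static byte[] readAndVerify(byte[] file) throws IOException {
        if (file.length < FOOTER_BYTES) {
            throw new IOException("file too short for footer");
        }
        ByteArrayInputStream raw = new ByteArrayInputStream(file);
        CheckedInputStream checked = new CheckedInputStream(raw, new CRC32());
        byte[] payload = new byte[file.length - FOOTER_BYTES];
        new DataInputStream(checked).readFully(payload); // the "normal" read
        long actual = checked.getChecksum().getValue();
        long expected = new DataInputStream(raw).readLong(); // footer, not checksummed
        if (actual != expected) {
            throw new IOException("checksum failed: expected=" + Long.toHexString(expected)
                    + " actual=" + Long.toHexString(actual));
        }
        return payload;
    }

    public static void main(String[] args) throws IOException {
        byte[] file = writeWithFooter("terms index bytes".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(readAndVerify(file), StandardCharsets.UTF_8));
        file[3] ^= 0x01; // simulate a single flipped bit in the stored payload
        try {
            readAndVerify(file);
        } catch (IOException e) {
            System.out.println("detected: " + e.getMessage());
        }
    }
}
```

The point of the shape is that verification costs nothing beyond the read the caller was doing anyway; a single flipped bit is still caught because CRC32 detects all single-bit errors.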

But I do not like this pattern:
CodecUtil.checksumEntireFile(..)
Its only purpose is to verify the checksum by reading all of the data, rather
than getting the checksum as a by-product of normal reading.
If the design/API were pluggable, with the current behavior as the default,
it would be flexible enough for various scenarios.
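As a rough illustration of what "pluggable with a default" could mean, here is a hypothetical sketch. EntireFileVerifier, CRC32_DEFAULT and TRUST_STORE are invented names, not part of Lucene's API, and the sketch reuses the same simplified layout assumption (payload plus an 8-byte big-endian CRC32) rather than Lucene's real footer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class PluggableChecksum {
    // Hypothetical extension point (NOT Lucene API): the storage layer decides
    // how, or whether, a whole-file check is performed.
    public interface EntireFileVerifier {
        void verify(byte[] file) throws IOException;
    }

    // Default policy: recompute CRC32 over everything but the 8-byte footer and
    // compare, i.e. the "read all data just to checksum it" approach.
    public static final EntireFileVerifier CRC32_DEFAULT = file -> {
        if (file.length < Long.BYTES) {
            throw new IOException("file too short for footer");
        }
        int n = file.length - Long.BYTES;
        CRC32 crc = new CRC32();
        crc.update(file, 0, n);
        long expected = 0;
        for (int i = n; i < file.length; i++) {
            expected = (expected << 8) | (file[i] & 0xFF); // big-endian decode
        }
        if (crc.getValue() != expected) {
            throw new IOException("checksum mismatch");
        }
    };

    // Opt-out policy for a store that already checksums its own pages; the
    // trade-off is that corruption is no longer caught at open time.
    public static final EntireFileVerifier TRUST_STORE = file -> { };

    static void open(byte[] file, EntireFileVerifier verifier) throws IOException {
        verifier.verify(file);
        // ... parse the file as usual ...
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "abc".getBytes(StandardCharsets.US_ASCII);
        byte[] file = new byte[payload.length + Long.BYTES]; // footer left as zeros: wrong on purpose
        System.arraycopy(payload, 0, file, 0, payload.length);
        open(file, TRUST_STORE); // accepted: verification skipped entirely
        try {
            open(file, CRC32_DEFAULT);
        } catch (IOException e) {
            System.out.println("default policy rejected the file: " + e.getMessage());
        }
    }
}
```

The only design choice here is turning the whole-file pass into a swappable policy; whether skipping it is wise is exactly what Mike's reply questions.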




Best regards,
Duke
If not now, when? If not me, who?

On Tue, Dec 6, 2016 at 6:36 PM, Michael McCandless <lucene@mikemccandless.com> wrote:

> We have learned over time not to trust the underlying store to
> correctly record the bytes we wrote to it.
>
> This is why checksumming is very strongly built into Lucene at this
> point.  If you disable checksumming, when bits do flip, you get exotic
> exceptions at search time that might look like Lucene bugs and can
> cost a lot of time to explain.
>
> It's not just the BlockTreeTermsReader; many other codec components
> check the checksum with CodecUtil.checkFooter at search time.
>
> Can you explain why it's necessary to remove it for your database
> files based Directory?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, Dec 6, 2016 at 5:25 AM, Duke DAI <duke.dai.007@gmail.com> wrote:
> > Hi all,
> >
> > I'm customizing a Lucene Directory, which extends o.a.l.store.Directory,
> > based on database files. I do not need checksums again on IndexInput and
> > IndexOutput.
> >
> > But in the BlockTreeTermsReader constructor, the following code opens a
> > hard-coded BufferedChecksumIndexInput to checksum the raw IndexInput. I
> > have to use CRC32 on IndexOutput to get through it. Is there a more
> > graceful way to do checksumming, such as letting the Directory construct
> > the checksum instance, instead of the Directory.openChecksumInput API?
> >
> >
> >       String indexName = IndexFileNames.segmentFileName(segment, state.segmentSuffix, TERMS_INDEX_EXTENSION);
> >       indexIn = state.directory.openInput(indexName, state.context);
> >       CodecUtil.checkIndexHeader(indexIn, TERMS_INDEX_CODEC_NAME, version, version, state.segmentInfo.getId(), state.segmentSuffix);
> >       CodecUtil.checksumEntireFile(indexIn);
> >
> >
> >
> >
> > Best regards,
> > Duke
> > If not now, when? If not me, who?
>
