lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Hardcoded checksum mechanism in BlockTreeTermsReader
Date Tue, 06 Dec 2016 13:39:00 GMT
Hi,

The checksum is also written for a second reason: Java VMs often have optimization bugs (you
may know the Java 7 GA disaster and the Java 7u40 vector optimization bugs that Lucene discovered).
The checksums will often catch those bugs, too.

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Tuesday, December 6, 2016 12:30 PM
> To: Duke DAI <duke.dai.007@gmail.com>
> Cc: Lucene Users <java-user@lucene.apache.org>
> Subject: Re: Hardcoded checksum mechanism in BlockTreeTermsReader
> 
> I see.  Bits can also be flipped by the network as they are travelling
> to/from the DB.  The end to end checksum Lucene does now would catch
> that.
> 
> Anyway, that BlockTree index file that is being entirely checksummed
> is a very small file.  And, using the first pattern is not easy for it
> because it needs to seek to the end to load its directory location,
> and then seek back to that location to read each field's information.
> Do you see a simple way to change it to the first pattern?
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 
> 
> On Tue, Dec 6, 2016 at 6:00 AM, Duke DAI <duke.dai.007@gmail.com> wrote:
> > Thanks for your quick response, Mike.
> >
> > The database has its own raw page management on top of the OS page
> > management, and most likely its own page-level checksum as well; that's
> > why I want to avoid checksumming at the Lucene Directory level.
> >
> > Checksumming is certainly good. I like this pattern (with openChecksumInput
> > overridden to suit the actual storage):
> > inputStream = directory.openChecksumInput(...);
> > // at the end, verify the checksum as a by-product of reading
> > CodecUtil.checkFooter(...)
> >
> > But I do not like this pattern:
> > CodecUtil.checksumEntireFile(..), whose only purpose is to checksum by
> > reading all the data, not as a by-product of a read that is needed anyway.
> > If the design/API were pluggable, with the current behavior as the default,
> > it would be good enough for various scenarios.
> >
> >
> >
> >
> > Best regards,
> > Duke
> > If not now, when? If not me, who?
> >
> > On Tue, Dec 6, 2016 at 6:36 PM, Michael McCandless
> > <lucene@mikemccandless.com> wrote:
> >>
> >> We have learned over time not to trust the underlying store to
> >> correctly record the bytes we wrote to it.
> >>
> >> This is why checksumming is very strongly built into Lucene at this
> >> point.  If you disable checksumming, when bits do flip, you get exotic
> >> exceptions at search time that might look like Lucene bugs and can
> >> cost a lot of time to explain.
> >>
> >> It's not just the BlockTreeTermsReader; many other codec components
> >> check the checksum with CodecUtil.checkFooter at search time.
> >>
> >> Can you explain why it's necessary to remove it for your database
> >> files based Directory?
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >>
> >> On Tue, Dec 6, 2016 at 5:25 AM, Duke DAI <duke.dai.007@gmail.com> wrote:
> >> > Hi all,
> >> >
> >> > I'm writing a custom Lucene Directory (extending o.a.l.store.Directory)
> >> > backed by database files. I do not need another checksum on IndexInput
> >> > and IndexOutput.
> >> >
> >> > But in the BlockTreeTermsReader constructor, the following code opens a
> >> > hard-coded BufferedChecksumIndexInput to checksum the raw IndexInput, so
> >> > I have to use CRC32 on IndexOutput to get past it. Is there a more
> >> > graceful way to do the checksum, such as letting the Directory construct
> >> > the checksum instance instead of going through the
> >> > Directory.openChecksumInput API?
> >> >
> >> >
> >> >       String indexName = IndexFileNames.segmentFileName(
> >> >           segment, state.segmentSuffix, TERMS_INDEX_EXTENSION);
> >> >       indexIn = state.directory.openInput(indexName, state.context);
> >> >       CodecUtil.checkIndexHeader(indexIn, TERMS_INDEX_CODEC_NAME,
> >> >           version, version, state.segmentInfo.getId(), state.segmentSuffix);
> >> >       CodecUtil.checksumEntireFile(indexIn);
> >> >
> >> >
> >> >
> >> >
> >> > Best regards,
> >> > Duke
> >> > If not now, when? If not me, who?
> >
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
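
The two patterns contrasted in this thread can be illustrated with a minimal, JDK-only sketch. It is not Lucene code: the class ChecksumPatterns, its method names, and the toy layout (payload bytes followed by an 8-byte CRC32 footer) are assumptions made purely for illustration, and Lucene's real footer format and the CodecUtil / Directory APIs named above are not reproduced here. The first reader verifies the checksum as a by-product of a sequential read it has to do anyway; the second makes a dedicated pass over the file just to verify it.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

// Toy layout (an assumption, NOT Lucene's real file format):
// payload bytes followed by an 8-byte big-endian CRC32 footer.
final class ChecksumPatterns {

    // Writer side: append the CRC32 of the payload as the footer.
    static byte[] writeWithFooter(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + Long.BYTES)
                .put(payload)
                .putLong(crc.getValue())
                .array();
    }

    // Pattern 1: the checksum accumulates while the reader consumes the
    // payload sequentially, so verification costs no extra pass.
    static byte[] readAndVerify(byte[] file) {
        requireFooter(file);
        CheckedInputStream checked =
                new CheckedInputStream(new ByteArrayInputStream(file), new CRC32());
        DataInputStream in = new DataInputStream(checked);
        try {
            byte[] payload = new byte[file.length - Long.BYTES];
            in.readFully(payload);                           // the read the caller needs anyway
            long actual = checked.getChecksum().getValue();  // capture before reading the footer
            long expected = in.readLong();
            if (actual != expected) {
                throw new IllegalStateException("checksum mismatch (sequential read)");
            }
            return payload;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Pattern 2: a dedicated pass over all payload bytes whose only purpose
    // is verification; the caller then reads the file again as it likes.
    static void verifyEntireFile(byte[] file) {
        requireFooter(file);
        int payloadLength = file.length - Long.BYTES;
        CRC32 crc = new CRC32();
        crc.update(file, 0, payloadLength);
        long expected = ByteBuffer.wrap(file, payloadLength, Long.BYTES).getLong();
        if (crc.getValue() != expected) {
            throw new IllegalStateException("checksum mismatch (whole-file pass)");
        }
    }

    private static void requireFooter(byte[] file) {
        if (file.length < Long.BYTES) {
            throw new IllegalArgumentException("file too short to contain a footer");
        }
    }

    public static void main(String[] args) {
        byte[] file = writeWithFooter("terms index".getBytes(StandardCharsets.UTF_8));
        readAndVerify(file);     // intact file passes pattern 1
        verifyEntireFile(file);  // intact file passes pattern 2

        file[0] ^= 0x01;         // simulate a single flipped bit in the payload
        try {
            verifyEntireFile(file);
        } catch (IllegalStateException e) {
            System.out.println("corruption detected: " + e.getMessage());
        }
    }
}
```

The by-product pattern only works when the reader consumes the file front to back. A reader that first seeks to the end to find a directory offset and then seeks back, as described above for the BlockTree terms index, does not touch the bytes in order, which is why a separate whole-file pass is the simpler fit there.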



