lucene-java-user mailing list archives

From Joe R <>
Subject Re: How do YOU detect corrupt indexes?
Date Fri, 03 Aug 2007 13:19:40 GMT

We're planning on using encryption at the filesystem level (whole-disk
encryption) and, to be honest, I don't have a mechanism that can produce the
changes I'm talking about.  Neither does my boss, unfortunately ;)  He came
along one day and asked, "how do we know when data changed on disk without us
doing it?" -- and no, I couldn't get a mechanism out of him then.

I've yet to go through LUCENE-737 (and the Nabble thread it refers to.)  I'd
missed it; thanks for the pointer.

Maliciousness is certainly a possibility, but not likely.  Because a lot of the
data we store is sensitive, we've made sure that the system surrounding the
data is secure and that nobody actually has access to the data itself (there's
no root access on these boxes, the one user that can log in is jailed and the
network is "secure".)  What's more, we hold four copies of the index on four
separate boxes, two each in two geographically separated data centers, and
whoever wanted to change the data would have to get into both centers and mod
all four copies.  Any hardware-level fault would also have to operate on all
four copies, so that isn't likely, either.

What's most likely is a software fault.  My thought is to have a separate
service running whose sole purpose is to "check data integrity", whatever that
means, and (hopefully) shares little code with our main service.  Of course, we
still have some third-party code to accommodate (Lucene included, of course) and
while those have been reliable so far, we can't rule out future problems.

I suppose that the main implementation problem here is that comparing the four
copies of the raw index data itself to each other would operate on a LOT of
data.  I was wondering if anyone had had success with an implementation that
operated on individual documents, groups of documents or some other, smaller
group of data.
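
To make the "groups of documents" idea concrete, here is a minimal sketch, not
a tested implementation.  It assumes each box can export its documents in the
same order as sorted field-name-to-value maps with non-null values; it does not
call Lucene at all, and it says nothing about the inverted index itself.  The
names GroupDigests, digestAll and mismatchedGroups are made up for
illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

// Hypothetical helper: one SHA-256 digest per group of documents, so that
// replicas can exchange short digest lists instead of raw index data.
final class GroupDigests {

    private static MessageDigest newSha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    private static byte[] intBytes(int n) {
        return ByteBuffer.allocate(4).putInt(n).array();
    }

    // Length-prefix each string so ("ab", "c") and ("a", "bc") hash differently.
    private static void update(MessageDigest md, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        md.update(intBytes(bytes.length));
        md.update(bytes);
    }

    // Returns one hex digest per consecutive group of groupSize documents.
    static List<String> digestAll(List<? extends SortedMap<String, String>> docs,
                                  int groupSize) {
        if (groupSize <= 0) {
            throw new IllegalArgumentException("groupSize must be positive");
        }
        List<String> digests = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += groupSize) {
            int end = Math.min(start + groupSize, docs.size());
            MessageDigest md = newSha256();
            for (SortedMap<String, String> doc : docs.subList(start, end)) {
                md.update(intBytes(doc.size())); // field count marks the document boundary
                for (Map.Entry<String, String> field : doc.entrySet()) {
                    update(md, field.getKey());
                    update(md, field.getValue());
                }
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            digests.add(hex.toString());
        }
        return digests;
    }

    // Indexes of groups whose digests differ; a group present in only one list counts too.
    static List<Integer> mismatchedGroups(List<String> a, List<String> b) {
        List<Integer> mismatched = new ArrayList<>();
        int n = Math.max(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            if (i >= a.size() || i >= b.size() || !a.get(i).equals(b.get(i))) {
                mismatched.add(i);
            }
        }
        return mismatched;
    }
}
```

Each box would run digestAll over its own copy and ship only the digest list;
mismatchedGroups then narrows the search to the groups that need a
document-by-document comparison.  With four copies, a group whose digest
disagrees with the other three would be the natural suspect.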

Thanks again, sorry for leaving the mechanism and encryption details out.


--- Daniel Noll <> wrote:

> On Friday 03 August 2007 16:03:22 Doron Cohen wrote:
> > What is the anticipated cause of corruption? Malicious?
> > Hardware fault? This somewhat reminds me of discussions in
> > the list about encrypting the index. See LUCENE-737
> > and a discussion pointed by it. One of the opinions
> > there was that encryption should be handled at a lower
> > level (OS/FS). Wouldn't that hold here as well?
> That's actually a good point.  These days we have filesystems like ZFS which 
> check for corruption automatically.  This should remove a lot of the extra 
> digesting work people would otherwise need to do to ensure consistency.
> Daniel
> -- 
> Daniel Noll
> Nuix Pty Ltd
> Suite 79, 89 Jones St, Ultimo NSW 2007, Australia    Ph: +61 2 9280 0699
> Web:                               Fax: +61 2 9212 6902
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
