lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Problems with IndexWriter#commit() on Linux
Date Wed, 10 Feb 2010 12:01:12 GMT
Yes.

Mike

On Wed, Feb 10, 2010 at 6:36 AM, Naama Kraus <naamakraus@gmail.com> wrote:
> Do you mean by calling
>
> IndexWriter#*setInfoStream*(PrintStream
> <http://java.sun.com/j2se/1.5/docs/api/java/io/PrintStream.html>
> infoStream)
>
> ?
>
> Naama
>
>
> On Mon, Feb 8, 2010 at 3:22 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> Hmmm... I think that means you're using the default data mode
>> (ordered), which should properly preserve writes if the OS or machine
>> crashes.
>>
>> And actually I was wrong before -- even if the mount had
>> data=writeback, since you are "only" kill -9ing the process (not
>> crashing the machine), the data mount option doesn't matter.  That
>> option only affects what happens on a crash...
>>
>> Can you work up a small example showing the problem?  And if possible,
>> turn on IndexWriter's infoStream, capture the output as you index up
>> until the kill -9, and post that?
>>
>> Mike
>>
>> On Mon, Feb 8, 2010 at 3:57 AM, Michael McCandless
>> <lucene@mikemccandless.com> wrote:
>> > Thanks for sharing...
>> >
>> > Software RAID should be perfectly fine for Lucene, in general, unless
>> > the mount is configured to ignore fsync (I think the "data=writeback"
>> > mount option for ext3 does so on Linux).
>> >
>> > Can you check the mount options on your RAID filesystem?
>> >
>> > Mike
>> >
>> > On Mon, Feb 8, 2010 at 2:09 AM, Naama Kraus <naamakraus@gmail.com>
>> wrote:
>> >> Hi All,
>> >>
>> >> I am back to this one after some while.
>> >> It appears the file system I was using resides on software RAID disks. I
>> ran
>> >> the same code on the same Linux machine, but on another file system
>> residing
>> >> on SCSI disks. I didn't observe the problem there.
>> >> Both file systems are ext3.
>> >> So I am guessing the problem relates to the RAID disks.
>> >>
>> >> I looked again at commit() API, and the following comment may be
>> explaining:
>> >>
>> >> "Note that this operation calls Directory.sync on the index files. That
>> call
>> >> should not return until the file contents & metadata are on stable
>> storage.
>> >> For FSDirectory, this calls the OS's fsync. But, beware: some hardware
>> >> devices may in fact cache writes even during fsync, and return before
>> the
>> >> bits are actually on stable storage, to give the appearance of faster
>> >> performance. If you have such a device, and it does not have a battery
>> >> backup (for example) then on power loss it may still lose data. Lucene
>> >> cannot guarantee consistency on such devices."
>> >>
>> >> Well, for me, running on the SCSI disks is just fine, I wanted to anyway
>> >> share my experience.
>> >>
>> >> Naama
>> >>
>> >> On Fri, Jan 8, 2010 at 12:09 AM, Naama Kraus <naamakraus@gmail.com>
>> wrote:
>> >>
>> >>> Thanks all for the hints, I'll get back to my code and do some
>> additional
>> >>> checks.
>> >>> Naama
>> >>>
>> >>>
>> >>> On Thu, Jan 7, 2010 at 6:57 PM, Michael McCandless <
>> >>> lucene@mikemccandless.com> wrote:
>> >>>
>> >>>> kill -9 is harsh, but, perfectly fine from Lucene's standpoint.
>> >>>> Likewise if the OS or JVM crashes, power is suddenly lost, the index
>> >>>> will just fallback to the last successful commit.  What will cause
>> >>>> corruption is if you have bit errors happening somewhere in the
>> >>>> machine... or if two writers are accidentally allowed to be open
on
>> >>>> one index... then you're in trouble.
>> >>>>
>> >>>> What IO system (filesystem & hardware) are you using on Linux?
>> >>>> Boiling down to a smallish test case can help to isolate the
>> >>>> problem...
>> >>>>
>> >>>> Mike
>> >>>>
>> >>>> On Thu, Jan 7, 2010 at 11:51 AM, Erick Erickson <
>> erickerickson@gmail.com>
>> >>>> wrote:
>> >>>> > Can you show us the code where you commit?
>> >>>> >
>> >>>> > And how do you kill your process? Kill -9 is...er...harsh....
>> >>>> >
>> >>>> > Yeah, I'm wondering whether the index file size *stays*
>> >>>> > changed after you kill you process. If it keeps its
>> >>>> > growing on every run (after you kill your process
>> >>>> > multiple times), then I'd suspect that you aren't
>> >>>> > adding documents like you think you are. Perhaps
>> >>>> > different fields, different analyzers, etc.
>> >>>> >
>> >>>> > Luke should show you the largest document by ID,
>> >>>> > as well as document counts. Comparing changes
>> >>>> > in the document count and the max doc ID should
>> >>>> > tell you something...
>> >>>> >
>> >>>> > Is it possible that you are updating existing docs
>> >>>> > rather than adding new ones?
>> >>>> >
>> >>>> > Best
>> >>>> > Erick
>> >>>> >
>> >>>> > On Thu, Jan 7, 2010 at 10:41 AM, Naama Kraus <naamakraus@gmail.com>
>> >>>> wrote:
>> >>>> >
>> >>>> >> Thanks dor the input.
>> >>>> >>
>> >>>> >> 1. While the process is running, I do see the index files
growing
>> on
>> >>>> disk
>> >>>> >> and the time stamps changing. Should I see a change in
size right
>> after
>> >>>> >> killing the process, is that what you mean ?
>> >>>> >> 2. Yes, same directory is being used for indexing and search.
>> >>>> >> 3. Didn't try Luke, good idea. Though I wonder, the same
code runs
>> well
>> >>>> on
>> >>>> >> Windows.
>> >>>> >>
>> >>>> >> Naama
>> >>>> >>
>> >>>> >> On Thu, Jan 7, 2010 at 3:37 PM, Erick Erickson <
>> >>>> erickerickson@gmail.com
>> >>>> >> >wrote:
>> >>>> >>
>> >>>> >> > Several questions:
>> >>>> >> > 1> are the index files larger after you kill your
process?
>> >>>> >> >    Or have the timestamps changed?
>> >>>> >> > 2> are you absolutely sure that your indexer, when
you
>> >>>> >> >     add documents, is pointing at the same directory
your
>> >>>> >> >     search is pointing to?
>> >>>> >> > 3> Have you gotten a copy of Luke and examined
your index
>> >>>> >> >     to see if, perhaps, your documents aren't being
added the
>> >>>> >> >     way you think they are?
>> >>>> >> >
>> >>>> >> > Erick
>> >>>> >> >
>> >>>> >> > On Thu, Jan 7, 2010 at 7:13 AM, Naama Kraus <
>> naamakraus@gmail.com>
>> >>>> >> wrote:
>> >>>> >> >
>> >>>> >> > > Hi,
>> >>>> >> > >
>> >>>> >> > > I am using IndexWriter#commit() methods in my
program to commit
>> >>>> >> document
>> >>>> >> > > additions to the index. I do that once in a while,
after a
>> bunch of
>> >>>> >> > > documents were added. Since my indexing process
is long, I want
>> to
>> >>>> make
>> >>>> >> > > sure
>> >>>> >> > > I don't loose too many additions in case of a
crash.
>> >>>> >> > > When running on Windows, things work as expected.
But when
>> running
>> >>>> my
>> >>>> >> > code
>> >>>> >> > > on Linux, seems like commit() has no effect.
If I kill my
>> program
>> >>>> and
>> >>>> >> > then
>> >>>> >> > > restart it, I don't see documents that I added
and then
>> committed
>> >>>> (they
>> >>>> >> > are
>> >>>> >> > > not returned by a search operation).
>> >>>> >> > > I am running Lucene 3.0.0
>> >>>> >> > >
>> >>>> >> > > Can anyone help ?
>> >>>> >> > >
>> >>>> >> > > Thanks, Naama
>> >>>> >> > >
>> >>>> >> > > --
>> >>>> >> > > "If you want your children to be intelligent,
read them fairy
>> >>>> tales. If
>> >>>> >> > you
>> >>>> >> > > want them to be more intelligent, read them more
fairy tales."
>> >>>> >> > > "What really interests me is whether God had
any choice in the
>> >>>> creation
>> >>>> >> > of
>> >>>> >> > > the world."
>> >>>> >> > > (Albert Einstein)
>> >>>> >> > >
>> >>>> >> >
>> >>>> >>
>> >>>> >>
>> >>>> >>
>> >>>> >> --
>> >>>> >> "If you want your children to be intelligent, read them
fairy
>> tales. If
>> >>>> you
>> >>>> >> want them to be more intelligent, read them more fairy
tales."
>> >>>> >> "What really interests me is whether God had any choice
in the
>> creation
>> >>>> of
>> >>>> >> the world."
>> >>>> >> (Albert Einstein)
>> >>>> >>
>> >>>> >
>> >>>>
>> >>>> ---------------------------------------------------------------------
>> >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>> --
>> >>> "If you want your children to be intelligent, read them fairy tales.
If
>> you
>> >>> want them to be more intelligent, read them more fairy tales."
>> >>> "What really interests me is whether God had any choice in the creation
>> of
>> >>> the world."
>> >>> (Albert Einstein)
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> "If you want your children to be intelligent, read them fairy tales. If
>> you
>> >> want them to be more intelligent, read them more fairy tales."
>> >> "What really interests me is whether God had any choice in the creation
>> of
>> >> the world."
>> >> "A table, a chair, a bowl of fruit and a violin; what else does a man
>> need
>> >> to be happy? "
>> >> (Albert Einstein)
>> >>
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
>
> --
> "If you want your children to be intelligent, read them fairy tales. If you
> want them to be more intelligent, read them more fairy tales."
> "What really interests me is whether God had any choice in the creation of
> the world."
> "A table, a chair, a bowl of fruit and a violin; what else does a man need
> to be happy? "
> (Albert Einstein)
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message