lucene-java-user mailing list archives

From Ravikumar Govindarajan <ravikumar.govindara...@gmail.com>
Subject Re: IndexWriter flush/commit exception
Date Thu, 19 Dec 2013 04:34:18 GMT
> You could make a custom Dir wrapper that always caches in RAM, but
> that sounds a bit terrifying :)

This was exactly what I implemented :) A commit thread runs periodically
every 30 seconds, while a RAM-monitor thread runs every 5 seconds and
commits early in case sizeInBytes >= 70% of maxCachedBytes. This is quite
dangerous, as you said, especially since sync() can take an arbitrary
amount of time.
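Roughly, the two threads look like this (a sketch only; the cachedBytes
supplier stands in for however the custom always-cache wrapper reports its
RAM usage, and the thresholds mirror the numbers above):

import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

import org.apache.lucene.index.IndexWriter;

// Sketch of the commit/monitor pair described above. cachedBytes is a
// stand-in for however the custom always-cache Directory reports RAM use.
public class CommitScheduler implements AutoCloseable {

  private final ScheduledExecutorService pool =
      Executors.newScheduledThreadPool(2);

  public CommitScheduler(final IndexWriter writer,
                         final LongSupplier cachedBytes,
                         final long maxCachedBytes) {
    // Commit thread: push cached data to stable storage every 30 seconds.
    pool.scheduleAtFixedRate(() -> safeCommit(writer), 30, 30,
        TimeUnit.SECONDS);

    // RAM monitor: every 5 seconds, commit early once the cache hits 70%.
    pool.scheduleAtFixedRate(() -> {
      if (cachedBytes.getAsLong() >= 0.7 * maxCachedBytes) {
        safeCommit(writer);
      }
    }, 5, 5, TimeUnit.SECONDS);
  }

  private static void safeCommit(IndexWriter writer) {
    try {
      writer.commit(); // sync() on the wrapped directory happens inside here
    } catch (IOException e) {
      // A failed commit leaves the cache growing until the next attempt.
    }
  }

  @Override
  public void close() {
    pool.shutdown();
  }
}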

> Alternatively, maybe on an HDFS error you could block that one thread
> while you retry for some amount of time, until the write/read
> succeeds?  (Like an NFS hard mount).

Well, after your suggestion I started digging into HDFS for this problem. I
believe HDFS handles this internally without a hitch, as per this link:
https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-3/data-flow

I believe that in the case of a node failure while writing, not even an
IOException is thrown to the client; all of it is handled internally. I
think I can rest easy on this. Maybe I will write a test case to verify
this behavior.
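If the test shows that transient IOExceptions do reach the client after
all, I imagine the retry wrapper you suggested would look roughly like this
(a sketch only; it uses Lucene's delegating FilterDirectory, and the
attempt count and backoff values are made up, not recommendations):

import java.io.IOException;
import java.util.Collection;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FilterDirectory;

// Sketch of the "NFS hard mount" idea: retry sync() with backoff instead
// of letting a momentary HDFS blip bubble up into the IndexWriter.
public class RetryingSyncDirectory extends FilterDirectory {

  private static final int MAX_ATTEMPTS = 5;
  private static final long BACKOFF_MS = 1000;

  public RetryingSyncDirectory(Directory in) {
    super(in);
  }

  @Override
  public void sync(Collection<String> names) throws IOException {
    IOException last = null;
    for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
      try {
        in.sync(names); // delegate to the wrapped (e.g. HDFS) directory
        return;
      } catch (IOException e) {
        last = e; // assume transient; back off and try again
        try {
          Thread.sleep(BACKOFF_MS * attempt);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw e; // give up promptly if the thread is interrupted
        }
      }
    }
    throw last; // retries exhausted: surface the last failure
  }
}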

Sorry for the trouble. I should have done some digging beforehand.

--
Ravi

On Wed, Dec 18, 2013 at 11:55 PM, Michael McCandless <lucene@mikemccandless.com> wrote:

> On Wed, Dec 18, 2013 at 3:15 AM, Ravikumar Govindarajan
> <ravikumar.govindarajan@gmail.com> wrote:
> > Thanks, Mike, for the great explanation of the flush IOException.
>
> You're welcome!
>
> > I was thinking from the perspective of an HDFSDirectory. In addition to
> > all the causes of IOException during flush you have listed, an
> > HDFSDirectory also has to deal with network issues, which is not
> > Lucene's problem at all.
> >
> > But I would ideally like to handle momentary network blips, as these are
> > fully recoverable errors.
> >
> >
> > Will NRTCachingDirectory help in the case of HDFSDirectory? If all goes
> > well, I should always flush to RAM, and the sync to HDFS happens only
> > during commits. In that case, I can have retry logic inside the sync()
> > method to handle momentary IOExceptions.
>
> I'm not sure it helps, because on merge, if the expected size of the
> merge segment is large enough, NRTCachingDir won't cache those files:
> it just delegates directly to the wrapped directory.
>
> Likewise, if too much RAM is already in use, flushing a new segment
> would go straight to the wrapped directory.
>
> You could make a custom Dir wrapper that always caches in RAM, but
> that sounds a bit terrifying :)
>
> Alternatively, maybe on an HDFS error you could block that one thread
> while you retry for some amount of time, until the write/read
> succeeds?  (Like an NFS hard mount).
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
