lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject RE: Compound / non-compound index files and SIGKILL
Date Tue, 06 Jun 2006 19:12:56 GMT

1) Have you tried forcing a thread dump of the JVM when it hangs, to see
what it's doing?  (On Unix JVMs the signal for that is SIGQUIT, i.e.
kill -QUIT or kill -3; even if the process isn't responding to SIGTERM,
it might respond to that.)
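
If sending a signal isn't practical (say the daemon's stdout goes nowhere
useful), much the same information can be collected from inside the
process, as long as some other thread is still running. A minimal sketch,
assuming Java 5 or later; ThreadDumper is a name made up for this example
and has nothing to do with Lucene:

```java
import java.util.Map;

// Hypothetical helper (not part of Lucene): formats the stack of every live thread.
class ThreadDumper {
    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            // One header line per thread: its name and current state.
            out.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

A watchdog thread could log ThreadDumper.dumpAllThreads() whenever the
indexing thread has made no progress for some time; the frames of the hung
thread should show whether it is sitting in a content handler or in
addDocument().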

: SIGTERM. I guess I'd feel more confident about using SIGKILL, if I knew that
: the uninterruptible hung thread was creating a Document, which I could
: interrupt without corrupting the index, rather than adding the document to
: the index, which is liable to result in orphaned files and/or a corrupted
: index, if killed.

2) It's possible that the thread is doing both (creating and adding) at
the same time. If you are constructing documents using Fields that
contain Readers you get back from converters which stream data from
complex documents as needed, then DocumentWriter may have started to write
your document, gotten to a Field with a Reader, and then stalled because
your converter is choking on something in the source document while it
tries to stream data to that Reader.


	...just a theory.
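
One way to test the theory is to take the Reader out of the picture: pull
everything out of the converter before the document goes anywhere near the
writer, so that a converter hang happens in your own code and not halfway
through addDocument(). A rough sketch, assuming the converted text fits in
memory; ReaderDrain and drain() are names invented for this example, and
the resulting String would go into an ordinary String-valued Field in place
of the Reader-valued one:

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper: reads a converter's Reader to the end up front.
class ReaderDrain {
    static String drain(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[8192];
        try {
            int n;
            // read() returns -1 at end of stream; a hung converter blocks here,
            // before any index file has been touched.
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } finally {
            reader.close();
        }
        return text.toString();
    }
}
```

The cost is memory: the whole field value is held as a String, which may
not be acceptable for very large source documents.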


: -----Original Message-----
: From: Volodymyr Bychkoviak [mailto:vbychkoviak@i-hypergrid.com]
: Sent: 06 June 2006 10:54
: To: java-user@lucene.apache.org
: Subject: Re: Compound / non-compound index files and SIGKILL
:
: If your content handlers need to respond quickly, then you should move
: the indexing process to a separate thread and keep pending items in a queue.
:
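
A sketch of what that arrangement might look like, assuming Java 5's
java.util.concurrent is available: the content handlers only enqueue, and a
single background thread does the slow work. QueuedIndexer and ItemHandler
are hypothetical names; the handler is where building the Document and the
call to addDocument() would live.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical callback: in real use this would build a Document and add it to the index.
interface ItemHandler<T> {
    void handle(T item) throws Exception;
}

// Hypothetical single-consumer queue: producers call submit(), one worker thread handles items.
class QueuedIndexer<T> {
    private static final Object STOP = new Object(); // sentinel that ends the worker loop
    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<Object>();
    private final Thread worker;

    @SuppressWarnings("unchecked")
    QueuedIndexer(final ItemHandler<T> handler) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    Object next = queue.take(); // blocks until an item is available
                    if (next == STOP) {
                        return;
                    }
                    try {
                        handler.handle((T) next);
                    } catch (Exception e) {
                        e.printStackTrace(); // one bad item should not stop the worker
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "indexer");
        worker.start();
    }

    // Returns immediately; the caller never waits on indexing.
    void submit(T item) {
        queue.add(item);
    }

    // Handles everything queued before this call, then stops the worker.
    void shutdown() throws InterruptedException {
        queue.add(STOP);
        worker.join();
    }
}
```

Note that this only keeps the handlers responsive. If the worker itself
hangs, the queue just grows, so it does not remove the need to find out
where the hang is.
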
: Rob Staveley (Tom) wrote:
: > This is a real eye-opener, Volodymyr. Many thanks. I guess that means
: > that my orphan-producing hangs must be addDocument() calls, and not in
: > the content handlers, as I'd previously assumed. I'll put some debug
: > before and after my addDocument() calls to confirm (and point my
: > writer's infoStream to System.out).
: >
: > -----Original Message-----
: > From: Volodymyr Bychkoviak [mailto:vbychkoviak@i-hypergrid.com]
: > Sent: 05 June 2006 18:33
: > To: java-user@lucene.apache.org
: > Subject: Re: Compound / non-compound index files and SIGKILL
: >
: > Hi.
: > My five cents :)
: >
: > It might be helpful to know how Lucene works with compound files.
: > When a segment is flushed to disk, it is first written as separate
: > (non-compound) files and then merged into a single .cfs file. If you
: > haven't changed the default setting for using compound files (which is
: > on), this is the only place (I guess) where these files appear.
: >
: > If you're working with large indexes, then merging segments can take a
: > while (Maybe here is your problem? :) ) (merging happens on
: > addDocument() call).  If you kill the indexing process during such a
: > merge, you'll get many orphaned files...
: >
: > You can just run optimize on this index. You'll get three files:
: > segments, deletable, <segment>.cfs; you can look up the name of the
: > segment in the 'segments' file. Everything else is 'garbage' - you can
: > delete it.
: >
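
To see what that rule of thumb would sweep up before deleting anything,
something along these lines could list the candidates. It is only a sketch
of the rule as stated above (keep segments, deletable and <segment>.cfs);
it assumes the index has just been optimized and that nothing has the index
open, and OrphanFinder is a made-up name:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: lists files the "three files after optimize" rule would call garbage.
class OrphanFinder {
    // liveSegment is the segment name read from the 'segments' file, without extension.
    static List<String> findCandidates(File indexDir, String liveSegment) {
        List<String> keep = Arrays.asList("segments", "deletable", liveSegment + ".cfs");
        String[] names = indexDir.list();
        if (names == null) {
            throw new IllegalArgumentException("not a readable directory: " + indexDir);
        }
        Arrays.sort(names); // stable, readable output
        List<String> candidates = new ArrayList<String>();
        for (String name : names) {
            if (!keep.contains(name)) {
                candidates.add(name);
            }
        }
        return candidates;
    }
}
```

It deliberately deletes nothing; the list is meant to be eyeballed first.
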
: >
: > Rob Staveley (Tom) wrote:
: >
: >> I've been indexing live data into a compound index from an MTA. I'm
: >> resolving a bunch of problems unrelated to Lucene (disparate hangs in
: >> my content handlers). When I get a hang, I typically need to kill my
: >> daemon, alas more often than not using kill -9 (SIGKILL).
: >>
: >> However, these SIGKILLs are leaving large temporary(?) files, which I
: >> guess are non-compound index files transiently extracted from the
: >> working .cfs files:
: >>
: >> -rw-r--r--    1  373138432 Jun  2 13:42 _18hup.fdt
: >> -rw-r--r--    1      5054464 Jun  2 13:42 _18hup.fdx
: >> -rw-r--r--    1              426 Jun  2 13:42 _18hup.fnm
: >>
: >> -rw-r--r--    1  457253888 Jun  2 09:22 _15djq.fdt
: >> -rw-r--r--    1      6205440 Jun  2 09:22 _15djq.fdx
: >> -rw-r--r--    1              426 Jun  2 09:21 _15djq.fnm
: >>
: >> They are left intact after restarting my daemon. Presumably they are
: >> not treated as being part of the compound index. I see no
: >> corresponding .cfs file for them.
: >>
: >> As a consequence of these - I suspect - I am getting a very large
: >> overall disk requirement for my index, presumably because of
: >> replicated field data. My guess is that the field data in the
: >> orphaned .fdt files needs to be regenerated.
: >>
: >> In another index directory from a previous test run (again with
: >> SIGKILLs), I have 98 GB of index files, with only 12 GB devoted to
: >> compound files for the field index (.cfs). The rest of the disk space
: >> is used by orphaned uncompounded index files; I see 51 GB devoted to
: >> uncompounded field data (.fdt), 13 GB devoted to term positions (.prx)
: >> and 13 GB devoted to term frequencies (.frq).
: >>
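
A breakdown like that is easy to recompute after each cleanup attempt. A
small sketch that totals file sizes per extension in one directory;
IndexSizeByExtension is a hypothetical name and nothing here is
Lucene-specific:

```java
import java.io.File;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: sums the size in bytes of the files in a directory, grouped by extension.
class IndexSizeByExtension {
    static Map<String, Long> tally(File indexDir) {
        File[] files = indexDir.listFiles();
        if (files == null) {
            throw new IllegalArgumentException("not a readable directory: " + indexDir);
        }
        Map<String, Long> totals = new TreeMap<String, Long>(); // sorted by extension
        for (File f : files) {
            if (!f.isFile()) {
                continue; // ignore subdirectories
            }
            String name = f.getName();
            int dot = name.lastIndexOf('.');
            // Files such as 'segments' and 'deletable' have no extension.
            String ext = dot < 0 ? "(none)" : name.substring(dot);
            Long soFar = totals.get(ext);
            totals.put(ext, (soFar == null ? 0L : soFar) + f.length());
        }
        return totals;
    }
}
```

Comparing the .cfs total against the .fdt, .prx and .frq totals shows how
much of the directory is uncompounded files.
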
: >> Here's my question:
: >>
: >> How can I attempt to merge these orphaned files into the compound
: >> index, using IndexWriter.addIndexes(), or would I be foolish
: >> attempting this?
: >>
: >>
: >
: >
:
: --
: regards,
: Volodymyr Bychkoviak
:
:



-Hoss

