lucene-java-user mailing list archives

From Shouvik Bardhan <sbard...@kinetica.com>
Subject Index file deleting....
Date Tue, 21 Nov 2017 15:03:18 GMT
Apologies if this has been discussed and thrashed out before. I found some
discussion but am still not clear about several things. Based on one of Mike's
answers a while back, I have run my test program with a lucene-core jar
that was built with VERBOSE_REF_COUNTS = true. This is all on Lucene 6.6.2
on CentOS.

I have a small test program whose structure is roughly like so:

try {
    FSDirectory dir = FSDirectory.open(Paths.get(INDEX_DIR));
    IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
    TieredMergePolicy tmp = new TieredMergePolicy();
    tmp.setSegmentsPerTier(10);
    iwc.setInfoStream(System.out);
    iwc.setMergePolicy(tmp);
    iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE_OR_APPEND);

    IndexWriter writer = new IndexWriter(dir, iwc);

    for (int madness = 0; madness < 2; madness++) {

        indexDocs(writer, new File(args[0]));

        SearcherManager sm = new SearcherManager(writer, new SearcherFactory());
        IndexSearcher is = sm.acquire();
        System.out.println(" Bad news Ref1 cnt is " +
                is.getIndexReader().getRefCount());
        sm.release(is);

        deleteData(writer, true);
        deleteData(writer, false);

        //is.getIndexReader().close();
    }
    writer.close();
} catch (IOException e) {
}

deleteData() is like so. Its goal is to delete half of the index with each
call. So after 2 calls all the docs are deleted.

MultiFieldQueryParser parser = new MultiFieldQueryParser(fields, analyzer);
Query q = parser.parse(queries_str);
if (q != null) {
    writer.deleteDocuments(q);
    writer.deleteUnusedFiles();
    writer.forceMergeDeletes();
    writer.commit();
    System.out.println(" Attempted delete and committed...");
} else {
    System.out.println(" Delete requested but data incorrect...");
}

My goal is to test disk reclaim (file deletes). I notice that when
is.getIndexReader().close() is commented out (see the main madness loop)
the files remain, but when the reader close is called, the files get deleted. Is
that expected? I thought SearcherManager acquire/release should free the
files up (meaning the IndexReader will not hold on to file handles).
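To make the question concrete, here is a toy model of the reference counting I think may be involved. It is plain Java with no Lucene classes; ToyReader and ToyManager are made-up names, and the key assumption (not verified against the Lucene source here) is that the manager keeps its own reference to the current reader until the manager itself is closed, so acquire/release alone never brings the count to zero.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of reference-counted readers. Hypothetical names; not Lucene code. */
public class RefCountSketch {

    /** Stand-in for a reader that pins a set of segment files while it is open. */
    static final class ToyReader {
        private int refCount = 1; // the creator owns the first reference
        private final List<String> pinnedFiles;

        ToyReader(List<String> files) {
            this.pinnedFiles = new ArrayList<>(files);
        }

        void incRef() {
            if (refCount <= 0) {
                throw new IllegalStateException("reader is already closed");
            }
            refCount++;
        }

        void decRef() {
            if (refCount <= 0) {
                throw new IllegalStateException("reader is already closed");
            }
            refCount--;
            if (refCount == 0) {
                pinnedFiles.clear(); // only now do the files become deletable
            }
        }

        int getRefCount() {
            return refCount;
        }

        boolean pinsFiles() {
            return !pinnedFiles.isEmpty();
        }
    }

    /** Stand-in for a searcher manager (assumption: it holds one reference itself). */
    static final class ToyManager {
        private ToyReader current;

        ToyManager(ToyReader reader) {
            this.current = reader; // takes over the creator's reference
        }

        ToyReader acquire() {
            current.incRef();
            return current;
        }

        void release(ToyReader reader) {
            reader.decRef();
        }

        void close() {
            if (current != null) {
                current.decRef(); // drops the manager's own reference
                current = null;
            }
        }
    }

    public static void main(String[] args) {
        ToyReader reader = new ToyReader(Arrays.asList("_0.cfs", "_1.cfs"));
        ToyManager manager = new ToyManager(reader);

        ToyReader acquired = manager.acquire(); // manager + caller hold it
        System.out.println("after acquire: " + acquired.getRefCount());

        manager.release(acquired); // the manager's reference remains
        System.out.println("after release: " + reader.getRefCount()
                + ", pins files: " + reader.pinsFiles());

        manager.close(); // the last reference is gone
        System.out.println("after close: " + reader.getRefCount()
                + ", pins files: " + reader.pinsFiles());
    }
}
```

If this model matches what Lucene does, the files would be freed by closing (or refreshing) the SearcherManager rather than by closing the reader obtained through acquire(), but that is exactly the part I would like confirmed.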

As for the infoStream output, the lines pasted below appear between
iterations of the madness loop when the IndexReader is closed; I don't see
them when the IndexReader is not closed (when the files remain). Also, at the
end of the madness loop, with the reader.close() call enabled, I am left with
only a couple of files (the behavior I want).

IFD 0 [2017-11-21T03:47:08.983Z; main]:   DecRef "_1.cfs": pre-decr count is 1
IFD 0 [2017-11-21T03:47:08.983Z; main]:   DecRef "_0.cfe": pre-decr count is 1
IFD 0 [2017-11-21T03:47:08.984Z; main]:   DecRef "_0.si": pre-decr count is 1
IFD 0 [2017-11-21T03:47:08.984Z; main]:   DecRef "_1.cfe": pre-decr count is 1
IFD 0 [2017-11-21T03:47:08.984Z; main]:   DecRef "_1.si": pre-decr count is 1
IFD 0 [2017-11-21T03:47:08.984Z; main]:   DecRef "_0.cfs": pre-decr count is 1
IFD 0 [2017-11-21T03:47:08.984Z; main]: delete [_1.cfs, _0.cfe, _0.si, _1.cfe, _1.si, _0.cfs]
IW 0 [2017-11-21T03:47:08.987Z; main]: decRefDeleter for NRT reader version=5 segments=_0(6.6.2):c95218 _1(6.6.2):c4781

Also, does my program structure look OK? Specifically:

a) Do I need to create the SearcherManager every time like I have done or
can I pull it out of the madness loop?
b) Should I be calling the deleteUnusedFiles() and forceMergeDeletes() or
are those redundant calls?

Thanks for any insight.
