lucene-java-user mailing list archives

From Jason Corekin <jason.core...@gmail.com>
Subject Re: deleteDocuments(Term... terms) takes a long time to do nothing.
Date Sat, 14 Dec 2013 16:38:11 GMT
I knew that I had forgotten something.  Below is the line that I use to
create the field that I am using to delete the entries.  I hope this avoids
some confusion.  Thank you very much to anyone who takes the time to read
these messages.

doc.add(new StringField("FileName",filename, Field.Store.YES));
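
For reference, here is a minimal sketch of how that field gets written and then
deleted (not the full program; writer and filename are just placeholders).  Since
StringField is indexed as a single un-analyzed token, the Term passed to
deleteDocuments has to carry exactly the same string that was indexed:

    import java.io.IOException;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    // Minimal sketch: index a document keyed by "FileName" and later delete it by
    // the exact same string.  StringField is not analyzed, so the delete Term must
    // match the indexed value byte-for-byte.
    void addThenDelete(IndexWriter writer, String filename) throws IOException {
        Document doc = new Document();
        doc.add(new StringField("FileName", filename, Field.Store.YES));
        writer.addDocument(doc);

        writer.deleteDocuments(new Term("FileName", filename));
        writer.commit();  // deletes are buffered until they are applied and committed
    }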


On Sat, Dec 14, 2013 at 1:28 AM, Jason Corekin <jason.corekin@gmail.com> wrote:

> Let me start by stating that I am almost certain that I am doing something
> wrong, and I hope that I am, because if not there is a VERY large bug in
> Lucene.  What I am trying to do is use the method
>
>
> deleteDocuments(Term... terms)
>
>
> out of the IndexWriter class to delete several Term object arrays, each fed
> to it via a separate thread.  Each array has around 460k+ Term objects in
> it.  The issue is that after running for around 30 minutes or more the
> method finishes; I then run a commit, and nothing changes in my files.  To
> be fair, I am running a custom Directory implementation that might be
> causing problems, but I do not think that this is the case, as I do not even
> see any of my Directory methods in the stack trace.  In fact, when I set
> breakpoints inside the delete methods of my Directory implementation, they
> never even get hit.  To be clear, replacing the custom Directory
> implementation with a standard one is not an option due to the nature of
> the data, which is made up of terabytes of small (1 KB and less) files.  So,
> if the issue is in the Directory implementation, I have to figure out how to
> fix it.
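>
> (As far as I understand it, a delete only marks documents as deleted in the
> segments' live-docs data, and the underlying segment files are only removed
> later when segments are merged or dropped, so the Directory's delete methods
> would not necessarily be hit at delete time.)  A quick way to check whether
> the deletes actually took, I think, is to reopen a reader after the commit
> and compare maxDoc() to numDocs().  A minimal sketch, assuming the index
> Directory is in a variable called directory:
>
> import org.apache.lucene.index.DirectoryReader;
>
> // After writer.commit(), a freshly opened reader should reflect the deletes:
> // maxDoc() still counts deleted documents, numDocs() does not.
> DirectoryReader reader = DirectoryReader.open(directory);  // directory is assumed
> try {
>     System.out.println("maxDoc=" + reader.maxDoc()
>             + " numDocs=" + reader.numDocs()
>             + " deleted=" + reader.numDeletedDocs());
> } finally {
>     reader.close();
> }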
>
>
> Below are the pieces of code that I think are relevant to this issue, as
> well as a copy of the stack trace of the thread that was doing work when I
> paused the debug session.  As you are likely to notice, the thread is called
> a DBCloner because it is being used to clone the underlying index-based
> database (needed to avoid storing trillions of files directly on disk).  The
> idea is to duplicate the selected group of terms into a new database and
> then delete the original terms from the original database.  The duplication
> works wonderfully, but no matter what I do, including cutting the program
> down to one thread, I cannot shrink the database, and the deletes take
> drastically too long.
>
>
> In an attempt to be as helpful as possible, I will say this: I have been
> tracing this problem for a few days and have seen that
>
> BlockTreeTermsReader$FieldReader$SegmentTermsEnum.seekExact(BytesRef)
>
> is where the majority of the execution time is spent.  I have also noticed
> that this method returns false MUCH more often than it returns true.  I have
> been trying to figure out how the mechanics of this process work, just in
> case the issue is not in my code and I might be able to find the problem
> there.  But I have yet to find the problem in either Lucene 4.5.1 or Lucene
> 4.6.  If anyone has any ideas as to what I might be doing wrong, I would
> really appreciate reading what you have to say.  Thanks in advance.
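>
> From reading the code, my rough picture of the apply step is something like
> the sketch below (heavily simplified, not the real Lucene code; segmentReaders
> and bufferedDeleteTerms are just placeholder names, and there are no null
> checks).  Every buffered delete term gets looked up in every segment, so the
> cost grows roughly with (number of terms) x (number of segments), and
> seekExact() returning false just means the term is not present in that
> particular segment:
>
> import org.apache.lucene.index.SegmentReader;
> import org.apache.lucene.index.Term;
> import org.apache.lucene.index.TermsEnum;
>
> // Heavily simplified illustration of the per-segment term-delete lookup:
> // seek each buffered term and, when it is found, its documents get marked
> // deleted in that segment's live-docs.
> for (SegmentReader segment : segmentReaders) {            // one pass per segment
>     TermsEnum termsEnum = segment.fields().terms("FileName").iterator(null);
>     for (Term term : bufferedDeleteTerms) {               // ~460k terms per thread here
>         if (termsEnum.seekExact(term.bytes())) {          // false: term not in this segment
>             // ... the matching docs get marked deleted in this segment's live-docs bitset
>         }
>     }
> }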
>
>
>
> Jason
>
>
>
> private void cloneDB() throws QueryNodeException {
>
>     Document doc;
>     ArrayList<String> fileNames;
>     int start = docRanges[(threadNumber * 2)];
>     int stop = docRanges[(threadNumber * 2) + 1];
>
>     try {
>         fileNames = new ArrayList<String>(docsPerThread);
>         for (int i = start; i < stop; i++) {
>             doc = searcher.doc(i);
>             try {
>                 adder.addDoc(doc);
>                 fileNames.add(doc.get("FileName"));
>             } catch (TransactionExceptionRE | TransactionException | LockConflictException te) {
>                 adder.txnAbort();
>                 System.err.println(Thread.currentThread().getName()
>                         + ": Adding a message failed, retrying.");
>             }
>         }
>         deleters[threadNumber].deleteTerms("FileName", fileNames);
>         deleters[threadNumber].commit();
>     } catch (IOException | ParseException ex) {
>         Logger.getLogger(DocReader.class.getName()).log(Level.SEVERE, null, ex);
>     }
> }
>
>
>
>
>
> public void deleteTerms(String dbField, ArrayList<String> fieldTexts) throws IOException {
>     Term[] terms = new Term[fieldTexts.size()];
>     for (int i = 0; i < fieldTexts.size(); i++) {
>         terms[i] = new Term(dbField, fieldTexts.get(i));
>     }
>     writer.deleteDocuments(terms);
> }
>
>
>
> public void deleteDocuments(Term... terms) throws IOException
>
>
>
>
>
> Thread [DB Cloner 2] (Suspended)
>     owns: BufferedUpdatesStream (id=54)
>     owns: IndexWriter (id=49)
>     FST<T>.readFirstRealTargetArc(long, Arc<T>, BytesReader) line: 979
>     FST<T>.findTargetArc(int, Arc<T>, Arc<T>, BytesReader) line: 1220
>     BlockTreeTermsReader$FieldReader$SegmentTermsEnum.seekExact(BytesRef) line: 1679
>     BufferedUpdatesStream.applyTermDeletes(Iterable<Term>, ReadersAndUpdates, SegmentReader) line: 414
>     BufferedUpdatesStream.applyDeletesAndUpdates(ReaderPool, List<SegmentCommitInfo>) line: 283
>     IndexWriter.applyAllDeletesAndUpdates() line: 3112
>     IndexWriter.applyDeletesAndPurge(boolean) line: 4641
>     DocumentsWriter$ApplyDeletesEvent.process(IndexWriter, boolean, boolean) line: 673
>     IndexWriter.processEvents(Queue<Event>, boolean, boolean) line: 4665
>     IndexWriter.processEvents(boolean, boolean) line: 4657
>     IndexWriter.deleteDocuments(Term...) line: 1421
>     DocDeleter.deleteTerms(String, ArrayList<String>) line: 95
>     DBCloner.cloneDB() line: 233
>     DBCloner.run() line: 133
>     Thread.run() line: 744
>
>
>
>
>
>
>
