From: "Luke Shannon" <lshannon@hypermedia.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>,
	<yahootintin-lucene@yahoo.com>
References: <20041111235654.83806.qmail@web41006.mail.yahoo.com>
Subject: Re: Lucene : avoiding locking
Date: Thu, 11 Nov 2004 18:58:23 -0500

I will try that now.
Thank you.

----- Original Message ----- 
From: <yahootintin-lucene@yahoo.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Thursday, November 11, 2004 6:56 PM
Subject: Re: Lucene : avoiding locking


> I'm working on a similar project...
> Make sure that only one call to the index method is occurring at
> a time.  Synchronizing that method should do it.
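
A minimal sketch of the suggestion above, assuming the index method is static as in the quoted Indexer class. SynchronizedIndexer, hammer and the counters are hypothetical names used only for illustration, and the real Lucene work is replaced by a short sleep so the example runs without Lucene on the classpath.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the quoted Indexer class (no Lucene calls). */
final class SynchronizedIndexer {
    // Threads currently inside index(); should never exceed 1.
    private static final AtomicInteger active = new AtomicInteger();
    // Highest number of simultaneous callers ever observed.
    private static final AtomicInteger maxActive = new AtomicInteger();
    // Number of index() calls that have finished.
    private static final AtomicInteger runs = new AtomicInteger();

    private SynchronizedIndexer() {}

    // "static synchronized" locks on SynchronizedIndexer.class, so a second
    // caller blocks here until the first caller has returned.
    static synchronized void index() {
        int now = active.incrementAndGet();
        maxActive.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(5); // placeholder for the reader/writer work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            active.decrementAndGet();
            runs.incrementAndGet();
        }
    }

    static int maxConcurrentCalls() { return maxActive.get(); }

    static int completedRuns() { return runs.get(); }

    /** Starts the given number of threads, each calling index() once. */
    static void hammer(int threads) {
        Thread[] pool = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            pool[i] = new Thread(SynchronizedIndexer::index);
            pool[i].start();
        }
        for (Thread t : pool) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

Note that a static synchronized method only serializes callers inside one JVM and one class loader. It would not help if a second process or a second web application writes to the same index directory.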
>
> --- Luke Shannon <lshannon@hypermedia.com> wrote:
>
> > Hi All;
> >
> > I have hit a snag in my Lucene integration and don't know what
> > to do.
> >
> > My company has a content management product. Each time someone
> > changes the directory structure or a file within it, that portion
> > of the site needs to be re-indexed so the changes are reflected in
> > future searches (indexing must happen at run time).
> >
> > I have written an Indexer class with a static Index() method. The
> > idea is to call the method every time something changes and the
> > index needs to be re-examined. I am hoping the logic put in by Doug
> > Cutting surrounding the UID will make indexing efficient enough to
> > be called so frequently.
> >
> > This class worked great when I tested it on my own little site (I
> > have about 2000 files). But when I drop the functionality into the
> > QA environment I get a locking error.
> >
> > I can't access the stack trace; all I can get at is a log file the
> > application writes to. Here is the section my class wrote. It was
> > right in the middle of indexing and bang, a lock issue.
> >
> > I don't know if the problem is in my code or something in the
> > existing application.
> >
> >  Error Message:
> >  ENTER|SearchEventProcessor.visit(ContentNodeDeleteEvent)
> >  |INFO|INDEXING INFO: Start Indexing new content.
> >  |INFO|INDEXING INFO: Index Folder Did Not Exist. Start Creation Of New Index
> >  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> >  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> >  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> >  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> >  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> >  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> >  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> >  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> >  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> >  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> >  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> >  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> >  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> >  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> >  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> >  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> >  |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> >  |INFO|INDEXING ERROR: Unable to index new content Lock obtain timed out: Lock@/usr/tomcat/jakarta-tomcat-5.0.19/temp/lucene-398fbd170a5457d05e2f4d43210f7fe8-write.lock
> >
> >  |ENTER|UpdateCacheEventProcessor.visit(ContentNodeDeleteEvent)
> >
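
One thing that may be worth checking here: as far as I know, an IndexWriter takes the write.lock when it is constructed and releases it in close(), so a writer that is never closed after an exception can leave the lock behind for the next caller. The sketch below only illustrates the release-on-every-path pattern with a stand-in class. LockedWriter and SafeIndexing are hypothetical names and do not use the Lucene API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical writer that holds an exclusive lock from construction to close(). */
final class LockedWriter implements AutoCloseable {
    // Stand-in for the write.lock file: true while some writer is open.
    private static final AtomicBoolean WRITE_LOCK = new AtomicBoolean(false);

    LockedWriter() {
        // Fail immediately instead of waiting; a real lock would retry and time out.
        if (!WRITE_LOCK.compareAndSet(false, true)) {
            throw new IllegalStateException("Lock obtain timed out");
        }
    }

    void addDocument(String doc) {
        // Simulate a document that cannot be parsed.
        if (doc == null) {
            throw new IllegalArgumentException("unreadable document");
        }
    }

    @Override
    public void close() {
        WRITE_LOCK.set(false); // release the lock
    }

    static boolean lockHeld() {
        return WRITE_LOCK.get();
    }
}

final class SafeIndexing {
    private SafeIndexing() {}

    /** Returns true if every document was added; the lock is released either way. */
    static boolean indexAll(String[] docs) {
        // try-with-resources calls close() even when addDocument throws.
        try (LockedWriter writer = new LockedWriter()) {
            for (String doc : docs) {
                writer.addDocument(doc);
            }
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }
}
```

With this shape, a failed run such as SafeIndexing.indexAll(new String[] {"a", null}) still leaves the lock free for the next call. On an older JDK without try-with-resources, the same effect would come from calling close() in a finally block.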
> > Here is my code. You will recognize it pretty much as the IndexHTML
> > class from the Lucene demo written by Doug Cutting. I have put in a
> > ton of comments in an attempt to understand what is going on.
> >
> >  Any help would be appreciated.
> >
> >  Luke
> >
> >  package com.fbhm.bolt.search;
> >
> >  /*
> >   * Created on Nov 11, 2004
> >   *
> >   * This class will create a single index file for the Content
> >   * Management System (CMS). It contains logic to ensure
> >   * indexing is done "intelligently". Based on IndexHTML.java
> >   * from the demo folder that ships with Lucene
> >   */
> >
> >  import java.io.File;
> >  import java.io.IOException;
> >  import java.util.Arrays;
> >  import java.util.Date;
> >
> >  import org.apache.lucene.analysis.standard.StandardAnalyzer;
> >  import org.apache.lucene.document.Document;
> >  import org.apache.lucene.index.IndexReader;
> >  import org.apache.lucene.index.IndexWriter;
> >  import org.apache.lucene.index.Term;
> >  import org.apache.lucene.index.TermEnum;
> >  import org.pdfbox.searchengine.lucene.LucenePDFDocument;
> >  import org.apache.lucene.demo.HTMLDocument;
> >
> >  import com.alaia.common.debug.Trace;
> >  import com.alaia.common.util.AppProperties;
> >
> >  /**
> >   * @author lshannon Description: <br>
> >   *   This class is used to index a content folder. It contains
> >   *   logic to ensure only new documents, or documents that have
> >   *   been modified since the last indexing pass, are indexed. <br>
> >   *   Based on code written by Doug Cutting in the IndexHTML class
> >   *   found in the Lucene demo
> >   */
> >  public class Indexer {
> >   //true during the deletion pass, which runs when the index
> >   //already exists
> >   private static boolean deleting = false;
> >
> >   //object to read existing indexes
> >   private static IndexReader reader;
> >
> >   //object to write to the index folder
> >   private static IndexWriter writer;
> >
> >   //enumeration of the uid terms already stored in the index
> >   private static TermEnum uidIter;
> >
> >   /*
> >    * This static method does all the work, the end result is an
> >    * up-to-date index folder
> >    */
> >   public static void Index() {
> >    //assume to start with that the index has to be created
> >    boolean create = true;
> >    //set the name of the index file
> >    String indexFileLocation =
> >      AppProperties.getPropertyAsString("bolt.search.siteIndex.index.root");
> >    //set the name of the content folder
> >    String contentFolderLocation =
> >      AppProperties.getPropertyAsString("site.root");
> >    //manage whether the index needs to be created or not
> >    File index = new File(indexFileLocation);
> >    File root = new File(contentFolderLocation);
> >    //the index file indicated exists, we need an incremental
> >    // update of the index
> >    if (index.exists()) {
> >     Trace.TRACE("INDEXING INFO: An index folder exists at: "
> >       + indexFileLocation);
> >     deleting = true;
> >     create = false;
> >     try {
> >      //this version of indexDocs is able to execute the
> >      // incremental update
> >      indexDocs(root, indexFileLocation, create);
> >     } catch (Exception e) {
> >      //we were unable to do the incremental update
> >      Trace.TRACE("INDEXING ERROR: Unable to execute incremental update "
> >          + e.getMessage());
> >     }
> >     //after exiting this block the index should be current with
> >     // content
> >     Trace.TRACE("INDEXING INFO: Incremental update completed.");
> >    }
> >    try {
> >     //create the writer
> >     writer = new IndexWriter(index, new StandardAnalyzer(), create);
> >     //configure the writer
> >     writer.mergeFactor = 10000;
> >     writer.maxFieldLength = 100000;
> >     try {
> >      //get the start date
> >      Date start = new Date();
> >      //call the indexDocs method, this time we will add new
> >      // documents
> >      Trace.TRACE("INDEXING INFO: Start Indexing new content.");
> >      indexDocs(root, indexFileLocation, create);
> >      Trace.TRACE("INDEXING INFO: Indexing new content complete.");
> >      //optimize the index
> >      writer.optimize();
> >      //close the writer
> >      writer.close();
> >      //get the end date
> >      Date end = new Date();
> >      long totalTime = end.getTime() - start.getTime();
> >      Trace.TRACE("INDEXING INFO: All Indexing Operations Completed in "
> >          + totalTime + " milliseconds");
> >     } catch (Exception e1) {
> >      //unable to add new documents
> >      Trace.TRACE("INDEXING ERROR: Unable to index new content "
> >          + e1.getMessage());
> >     }
> >    } catch (IOException e) {
> >     Trace.TRACE("INDEXING ERROR: Unable to create IndexWriter "
> >       + e.getMessage());
> >    }
> >   }
> >
> >   /*
> >    * Walk directory hierarchy in uid order, while keeping uid
> >    * iterator from existing index in sync. Mismatches indicate one
> >    * of: (a) old documents to be deleted; (b) unchanged documents,
> >    * to be left alone; or (c) new documents, to be indexed.
> >    */
> >
> >   private static void indexDocs(File file, String index, boolean create)
> >     throws Exception {
> >    //the index already exists, so we do an incremental update
> >    if (!create) {
> >     Trace.TRACE("INDEXING INFO: Incremental Update Request Confirmed");
> >     //open existing index
> >     reader = IndexReader.open(index);
> >     //this gets an enumeration of uid terms
> >     uidIter = reader.terms(new Term("uid", ""));
> >     //jump to the index method that does the work
> >     //this will use the enumeration above and does
> >     //all the "smart" indexing
> >     indexDocs(file);
> >     //this will be true every time the index already existed
> >     //we are now going to delete documents that are old
> >     if (deleting) {
> >      Trace.TRACE("INDEXING INFO: Deleting Old Content Phase Started. "
> >        + "All Deleted Docs will be listed.");
> >      while (uidIter.term() != null
> >        && uidIter.term().field() == "uid") {
> >       //any uid still left in the enumeration belongs to a
> >       // document we indexed before that is no longer on disk
> >       Trace.TRACE("INDEXING INFO: Deleting document "
> >         + HTMLDocument.uid2url(uidIter.term().text()));
> >       //delete the documents containing this term from the index
> >       reader.delete(uidIter.term());
> >       //go to the next term
> >       uidIter.next();
> >      }
> >      Trace.TRACE("INDEXING INFO: Deleting Old Content Phase Completed");
> >      //turn off the deleting flag
> >      deleting = false;
> >     }//close the deleting branch
> >     //close the enumeration
> >     uidIter.close(); // close uid iterator
> >     //close the reader
> >     reader.close(); // close existing index
> >
> >    }
> >    //we get here if the index did not already exist
> >    else {
> >     Trace.TRACE("INDEXING INFO: Index Folder Did Not Exist. "
> >       + "Start Creation Of New Index");
> >     // don't have an existing index
> >     indexDocs(file);
> >    }
> >   }
> >
> >   private static void indexDocs(File file) throws Exception {
> >    //check if we are at the top of a directory
> >    if (file.isDirectory()) {
> >     //get a list of the files
> >     String[] files = file.list();
> >     //sort them
> >     Arrays.sort(files);
> >     //index each file in the directory recursively
> >     //we keep repeating this logic until we hit a
> >     //file
> >     for (int i = 0; i < files.length; i++)
> >      //pass in the parent directory and the current file
> >      //into the file constructor and index
> >      indexDocs(new File(file, files[i]));
> >
> >    }
> >    //we have an actual file, so we need to consider the
> >    //file extensions so the correct Document is created
> >    else if (file.getPath().endsWith(".html")
> >      || file.getPath().endsWith(".htm")
> >      || file.getPath().endsWith(".txt")
> >      || file.getPath().endsWith(".doc")
> >      || file.getPath().endsWith(".xml")
> >      || file.getPath().endsWith(".pdf")) {
> >
> >     //if this is reached it means we were in the midst
> >     //of an incremental update
> >     if (uidIter != null) {
> >      //get the uid for the document we are on
> >      String uid = HTMLDocument.uid(file);
> >      //now compare this document to the one we have in the
> >      //enumeration of terms.
> >      //if the term in the enumeration is less than the
> >      //term we are on it must be deleted (if we are indeed
> >      //doing an incremental update)
> >      Trace.TRACE("INDEXING INFO: Beginnging Incremental update comparisions");
> >      while (uidIter.term() != null
> >        && uidIter.term().field() == "uid"
> >        && uidIter.term().text().compareTo(uid) < 0) {
> >       //delete stale docs
> >       if (deleting) {
> >        reader.delete(uidIter.term());
> >       }
> >       uidIter.next();
> >      }
> >      //if the terms are equal there is no change with this
> >      //document, we keep it as is
> >      if (uidIter.term() != null && uidIter.term().field() == "uid"
> >        && uidIter.term().text().compareTo(uid) == 0) {
> >       uidIter.next();
> >      }
> >      //if we are not deleting and the document was not there
> >      //it means we didn't have this document on the last index
> >      //and we should add it
> >      else if (!deleting) {
> >       if (file.getPath().endsWith(".pdf")) {
> >        Document doc = LucenePDFDocument.getDocument(file);
> >        Trace.TRACE("INDEXING INFO: Adding new document to the existing index: "
> >            + doc.get("url"));
> >        writer.addDocument(doc);
> >       } else if (file.getPath().endsWith(".xml")) {
> >        Document doc = XMLDocument.Document(file);
> >        Trace.TRACE("INDEXING INFO: Adding new document to the existing index: "
> >            + doc.get("url"));
> >        writer.addDocument(doc);
> >       } else {
> >        Document doc = HTMLDocument.Document(file);
> >        Trace.TRACE("INDEXING INFO: Adding new document to the existing index: "
> >            + doc.get("url"));
> >        writer.addDocument(doc);
> >       }
> >      }
> >     }//end the if for an incremental update
> >     //we are creating a new index, add all document types
> >     else {
> >      if (file.getPath().endsWith(".pdf")) {
> >       Document doc = LucenePDFDocument.getDocument(file);
> >       Trace.TRACE("INDEXING INFO: Adding a new document to the new index: "
> >           + doc.get("url"));
> >       writer.addDocument(doc);
> >      } else if (file.getPath().endsWith(".xml")) {
> >       Document doc = XMLDocument.Document(file);
> >       Trace.TRACE("INDEXING INFO: Adding a new document to the new index: "
> >           + doc.get("url"));
> >       writer.addDocument(doc);
> >      } else {
> >       Document doc = HTMLDocument.Document(file);
> >       Trace.TRACE("INDEXING INFO: Adding a new document to the new index: "
> >           + doc.get("url"));
> >       writer.addDocument(doc);
> >      }//close the else
> >     }//close the else for a new index
> >    }//close the else if to handle file types
> >   }//close the indexDocs method
> >
> >  }
> >
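
The comment above indexDocs describes a sorted merge between the uids already in the index and the uids of the files on disk. Here is a minimal sketch of that comparison on plain sorted lists, without Lucene. UidDiff and its field names are hypothetical, and the "path@version" uid strings in the usage note are invented for illustration. As I understand the demo, the real uid encodes the file path plus its last-modified time, which is why a changed file shows up as one stale uid and one new uid.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical result of comparing indexed uids with the uids found on disk. */
final class UidDiff {
    final List<String> stale = new ArrayList<>();     // (a) in the index, gone from disk
    final List<String> unchanged = new ArrayList<>(); // (b) present in both
    final List<String> added = new ArrayList<>();     // (c) on disk, not yet indexed

    private UidDiff() {}

    /** Both lists must be sorted in natural String order. */
    static UidDiff compare(List<String> indexedSorted, List<String> currentSorted) {
        UidDiff diff = new UidDiff();
        int i = 0; // cursor into the indexed uids, playing the role of uidIter
        for (String uid : currentSorted) {
            // Indexed uids that sort before the current file are stale.
            while (i < indexedSorted.size()
                    && indexedSorted.get(i).compareTo(uid) < 0) {
                diff.stale.add(indexedSorted.get(i++));
            }
            if (i < indexedSorted.size() && indexedSorted.get(i).equals(uid)) {
                diff.unchanged.add(uid); // same uid: leave the document alone
                i++;
            } else {
                diff.added.add(uid);     // no matching uid: index this file
            }
        }
        // Anything left in the index after the last file is stale too.
        while (i < indexedSorted.size()) {
            diff.stale.add(indexedSorted.get(i++));
        }
        return diff;
    }
}
```

For example, comparing an index holding a@1, b@1, c@1 with a disk state of a@1, b@2, d@1 leaves a@1 unchanged, marks b@1 and c@1 as stale, and marks b@2 and d@1 as new. The quoted code does the same walk in two passes: one with deleting set to true that removes stale documents through the reader, and one with deleting set to false that adds new documents through the writer.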
> >
> >  ----- Original Message ----- 
> >  From: "Craig McClanahan" <craigmcc@gmail.com>
> >  To: "Jakarta Commons Users List" <commons-user@jakarta.apache.org>
> >  Sent: Thursday, November 11, 2004 6:13 PM
> >  Subject: Re: avoiding locking
> >
> >
> >   In order to get any useful help, it would be nice to know what
> >   you are trying to do, and (most importantly) what commons
> >   component is giving you the problem :-).  The traditional
> >   approach is to put a prefix on your subject line -- for commons
> >   package "foo" it would be:
> >
> >     [foo] avoiding locking
> >
> >   It's also generally helpful to see the entire stack trace, not
> >   just the exception message itself.
> >
> >   Craig
> >
> >
> >   On Thu, 11 Nov 2004 17:27:19 -0500, Luke Shannon
> >   <lshannon@hypermedia.com> wrote:
> >    What can I do to avoid locking issues?
> >
> >    Unable to execute incremental update Lock obtain timed out:
> >    Lock@/usr/tomcat/jakarta-tomcat-5.0.19/temp/lucene-398fbd170a5457d05e2f4d43210f7fe8-write.lock
> >
> >    Thanks,
> >
> >    Luke
> >
> >
> >
> >
> ---------------------------------------------------------------------
> >   To unsubscribe, e-mail:
> > commons-user-unsubscribe@jakarta.apache.org
> >   For additional commands, e-mail:
> > commons-user-help@jakarta.apache.org
> >
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail:
> > lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail:
> > lucene-user-help@jakarta.apache.org