From: "Luke Shannon"
To: "Lucene Users List"
Subject: Re: Lucene : avoiding locking
Date: Thu, 11 Nov 2004 18:58:23 -0500
Message-ID: <054a01c4c84a$5443ef90$7703d00a@hypermedia.com>
References: <20041111235654.83806.qmail@web41006.mail.yahoo.com>

I will try that now. Thank you.
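
For reference, the suggestion quoted below (let only one call to the index method run at a time) could look roughly like the following. This is only a minimal sketch: the class name SynchronizedIndexerSketch, the doIndexing() placeholder and the counters are made up for illustration and are not part of the Indexer class or of Lucene. The real work would be the body of Index().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class SynchronizedIndexerSketch {
    // Counters exist only to demonstrate that calls never overlap.
    private static final AtomicInteger active = new AtomicInteger();
    private static final AtomicInteger maxActive = new AtomicInteger();
    private static final AtomicInteger completed = new AtomicInteger();

    // A static synchronized method locks on the class object, so at most
    // one thread in this JVM (and classloader) is inside index() at a time.
    public static synchronized void index() {
        int now = active.incrementAndGet();
        maxActive.accumulateAndGet(now, Math::max);
        try {
            doIndexing();
            completed.incrementAndGet();
        } finally {
            active.decrementAndGet();
        }
    }

    // Hypothetical placeholder for the real indexing work.
    private static void doIndexing() {
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Fires several concurrent calls, the way CMS events might, and waits.
    public static void runConcurrently(int threads) {
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(SynchronizedIndexerSketch::index);
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static int maxConcurrent() {
        return maxActive.get();
    }

    public static int completedRuns() {
        return completed.get();
    }

    public static void main(String[] args) {
        runConcurrently(8);
        System.out.println("completed=" + completedRuns()
                + " maxConcurrent=" + maxConcurrent());
    }
}
```

Note that synchronized only serializes callers inside one JVM. It would not help if another process writes to the same index, or if a lock file is left behind by a run that never closed its writer.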

----- Original Message -----
From:
To: "Lucene Users List"
Sent: Thursday, November 11, 2004 6:56 PM
Subject: Re: Lucene : avoiding locking

> I'm working on a similar project...
> Make sure that only one call to the index method is occurring at
> a time. Synchronizing that method should do it.
>
> --- Luke Shannon wrote:
>
> > Hi All;
> >
> > I have hit a snag in my Lucene integration and don't know what
> > to do.
> >
> > My company has a content management product. Each time someone
> > changes the directory structure or a file within it, that portion
> > of the site needs to be re-indexed so the changes are reflected in
> > future searches (indexing must happen during run time).
> >
> > I have written an Indexer class with a static Index() method. The
> > idea is to call the method every time something changes and the
> > index needs to be re-examined. I am hoping the logic put in by Doug
> > Cutting surrounding the UID will make indexing efficient enough to
> > be called so frequently.
> >
> > This class works great when I tested it on my own little site (I
> > have about 2000 files). But when I drop the functionality into the
> > QA environment I get a locking error.
> >
> > I can't access the stack trace, all I can get at is a log file the
> > application writes to. Here is the section my class wrote. It was
> > right in the middle of indexing and bang, lock issue.
> >
> > I don't know if the problem is in my code or something in the
> > existing application.
> >
> > Error Message:
> > ENTER|SearchEventProcessor.visit(ContentNodeDeleteEvent)
> > |INFO|INDEXING INFO: Start Indexing new content.
> > |INFO|INDEXING INFO: Index Folder Did Not Exist.
> > Start Creation Of New Index
> > |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> > |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> > |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> > |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> > |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> > |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> > |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> > |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> > |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> > |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> > |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> > |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> > |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> > |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> > |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> > |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> > |INFO|INDEXING INFO: Beginnging Incremental update comparisions
> > |INFO|INDEXING ERROR: Unable to index new content Lock obtain timed out:
> > Lock@/usr/tomcat/jakarta-tomcat-5.0.19/temp/lucene-398fbd170a5457d05e2f4d43210f7fe8-write.lock
> >
> > |ENTER|UpdateCacheEventProcessor.visit(ContentNodeDeleteEvent)
> >
> > Here is my code. You will recognize it pretty much as the IndexHTML
> > class from the Lucene demo written by Doug Cutting. I have put a ton
> > of comments in an attempt to understand what is going on.
> >
> > Any help would be appreciated.
> >
> > Luke
> >
> > package com.fbhm.bolt.search;
> >
> > /*
> >  * Created on Nov 11, 2004
> >  *
> >  * This class will create a single index file for the Content
> >  * Management System (CMS).
> >  * It contains logic to ensure
> >  * indexing is done "intelligently". Based on IndexHTML.java
> >  * from the demo folder that ships with Lucene
> >  */
> >
> > import java.io.File;
> > import java.io.IOException;
> > import java.util.Arrays;
> > import java.util.Date;
> >
> > import org.apache.lucene.analysis.standard.StandardAnalyzer;
> > import org.apache.lucene.document.Document;
> > import org.apache.lucene.index.IndexReader;
> > import org.apache.lucene.index.IndexWriter;
> > import org.apache.lucene.index.Term;
> > import org.apache.lucene.index.TermEnum;
> > import org.pdfbox.searchengine.lucene.LucenePDFDocument;
> > import org.apache.lucene.demo.HTMLDocument;
> >
> > import com.alaia.common.debug.Trace;
> > import com.alaia.common.util.AppProperties;
> >
> > /**
> >  * @author lshannon Description:
> >  * This class is used to index a content folder. It contains logic to
> >  * ensure only new documents, or documents that have been modified since
> >  * the last search, are indexed.
> >  * Based on code written by Doug Cutting in the IndexHTML class found in
> >  * the Lucene demo
> >  */
> > public class Indexer {
> >   //true during deletion pass, this is when the index already exists
> >   private static boolean deleting = false;
> >
> >   //object to read existing indexes
> >   private static IndexReader reader;
> >
> >   //object to write to the index folder
> >   private static IndexWriter writer;
> >
> >   //this will be used to write the index file
> >   private static TermEnum uidIter;
> >
> >   /*
> >    * This static method does all the work, the end result is an
> >    * up-to-date index folder
> >    */
> >   public static void Index() {
> >     //we will assume to start that the index has to be created
> >     boolean create = true;
> >     //set the name of the index file
> >     String indexFileLocation =
> >         AppProperties.getPropertyAsString("bolt.search.siteIndex.index.root");
> >     //set the name of the content folder
> >     String contentFolderLocation =
> >         AppProperties.getPropertyAsString("site.root");
> >     //manage whether the index needs to be created or not
> >     File index = new File(indexFileLocation);
> >     File root = new File(contentFolderLocation);
> >     //the index file indicated exists, we need an incremental update
> >     //of the index
> >     if (index.exists()) {
> >       Trace.TRACE("INDEXING INFO: An index folder exists at: "
> >           + indexFileLocation);
> >       deleting = true;
> >       create = false;
> >       try {
> >         //this version of index docs is able to execute the
> >         //incremental update
> >         indexDocs(root, indexFileLocation, create);
> >       } catch (Exception e) {
> >         //we were unable to do the incremental update
> >         Trace.TRACE("INDEXING ERROR: Unable to execute incremental update "
> >             + e.getMessage());
> >       }
> >       //after exiting this loop the index should be current with content
> >       Trace.TRACE("INDEXING INFO: Incremental update completed.");
> >     }
> >     try {
> >       //create the writer
> >       writer = new IndexWriter(index, new StandardAnalyzer(),
> >           create);
> >       //configure the writer
> >       writer.mergeFactor = 10000;
> >       writer.maxFieldLength = 100000;
> >       try {
> >         //get the start date
> >         Date start = new Date();
> >         //call the indexDocs method, this time we will add new
> >         //documents
> >         Trace.TRACE("INDEXING INFO: Start Indexing new content.");
> >         indexDocs(root, indexFileLocation, create);
> >         Trace.TRACE("INDEXING INFO: Indexing new content complete.");
> >         //optimize the index
> >         writer.optimize();
> >         //close the writer
> >         writer.close();
> >         //get the end date
> >         Date end = new Date();
> >         long totalTime = end.getTime() - start.getTime();
> >         Trace.TRACE("INDEXING INFO: All Indexing Operations Completed in "
> >             + totalTime + " milliseconds");
> >       } catch (Exception e1) {
> >         //unable to add new documents
> >         Trace.TRACE("INDEXING ERROR: Unable to index new content "
> >             + e1.getMessage());
> >       }
> >     } catch (IOException e) {
> >       Trace.TRACE("INDEXING ERROR: Unable to create IndexWriter "
> >           + e.getMessage());
> >     }
> >   }
> >
> >   /*
> >    * Walk directory hierarchy in uid order, while keeping uid iterator
> >    * from existing index in sync. Mismatches indicate one of: (a) old
> >    * documents to be deleted; (b) unchanged documents, to be left alone;
> >    * or (c) new documents, to be indexed.
> >    */
> >
> >   private static void indexDocs(File file, String index, boolean create)
> >       throws Exception {
> >     //the index already exists, we do an incremental update
> >     if (!create) {
> >       Trace.TRACE("INDEXING INFO: Incremental Update Request Confirmed");
> >       //open existing index
> >       reader = IndexReader.open(index);
> >       //this gets an enumeration of uid terms
> >       uidIter = reader.terms(new Term("uid", ""));
> >       //jump to the index method that does the work
> >       //this will use the Iteration above and does
> >       //all the "smart" indexing
> >       indexDocs(file);
> >       //this will be true every time the index already existed
> >       //we are now going to delete documents that are old
> >       if (deleting) {
> >         Trace.TRACE("INDEXING INFO: Deleting Old Content Phase Started. All Deleted Docs will be listed.");
> >         while (uidIter.term() != null
> >             && uidIter.term().field() == "uid") {
> >           //basically we are deleting all the documents we have
> >           //indexed before
> >           Trace.TRACE("INDEXING INFO: Deleting document "
> >               + HTMLDocument.uid2url(uidIter.term().text()));
> >           //delete the term from the reader
> >           reader.delete(uidIter.term());
> >           //go to the next field
> >           uidIter.next();
> >         }
> >         Trace.TRACE("INDEXING INFO: Deleting Old Content Phase Completed");
> >         //turn off the deleting flag
> >         deleting = false;
> >       }//close the deleting branch
> >       //close the enumeration
> >       uidIter.close(); // close uid iterator
> >       //close the reader
> >       reader.close(); // close existing index
> >
> >     }
> >     //we go here if the index did not already exist
> >     else {
> >       Trace.TRACE("INDEXING INFO: Index Folder Did Not Exist.
> > Start Creation Of New Index");
> >       // don't have existing
> >       indexDocs(file);
> >     }
> >   }
> >
> >   private static void indexDocs(File file) throws Exception {
> >     //check if we are at the top of a directory
> >     if (file.isDirectory()) {
> >       //get a list of the files
> >       String[] files = file.list();
> >       //sort them
> >       Arrays.sort(files);
> >       //index each file in the directory recursively
> >       //we keep repeating this logic until we hit a
> >       //file
> >       for (int i = 0; i < files.length; i++)
> >         //pass in the parent directory and the current file
> >         //into the file constructor and index
> >         indexDocs(new File(file, files[i]));
> >
> >     }
> >     //we have an actual file, so we need to consider the
> >     //file extensions so the correct Document is created
> >     else if (file.getPath().endsWith(".html")
> >         || file.getPath().endsWith(".htm")
> >         || file.getPath().endsWith(".txt")
> >         || file.getPath().endsWith(".doc")
> >         || file.getPath().endsWith(".xml")
> >         || file.getPath().endsWith(".pdf")) {
> >
> >       //if this is reached it means we were in the midst
> >       //of an incremental update
> >       if (uidIter != null) {
> >         //get the uid for the document we are on
> >         String uid = HTMLDocument.uid(file);
> >         //now compare this document to the one we have in the
> >         //enumeration of terms.
> >         //if the term in the enumeration is less than the
> >         //term we are on it must be deleted (if we are indeed
> >         //doing an incremental update)
> >         Trace.TRACE("INDEXING INFO: Beginnging Incremental update comparisions");
> >         while (uidIter.term() != null
> >             && uidIter.term().field() == "uid"
> >             && uidIter.term().text().compareTo(uid) < 0) {
> >           //delete stale docs
> >           if (deleting) {
> >             reader.delete(uidIter.term());
> >           }
> >           uidIter.next();
> >         }
> >         //if the terms are equal there is no change with this document
> >         //we keep it as is
> >         if (uidIter.term() != null && uidIter.term().field() == "uid"
> >             && uidIter.term().text().compareTo(uid) == 0) {
> >           uidIter.next();
> >         }
> >         //if we are not deleting and the document was not there
> >         //it means we didn't have this document on the last index
> >         //and we should add it
> >         else if (!deleting) {
> >           if (file.getPath().endsWith(".pdf")) {
> >             Document doc = LucenePDFDocument.getDocument(file);
> >             Trace.TRACE("INDEXING INFO: Adding new document to the existing index: "
> >                 + doc.get("url"));
> >             writer.addDocument(doc);
> >           } else if (file.getPath().endsWith(".xml")) {
> >             Document doc = XMLDocument.Document(file);
> >             Trace.TRACE("INDEXING INFO: Adding new document to the existing index: "
> >                 + doc.get("url"));
> >             writer.addDocument(doc);
> >           } else {
> >             Document doc = HTMLDocument.Document(file);
> >             Trace.TRACE("INDEXING INFO: Adding new document to the existing index: "
> >                 + doc.get("url"));
> >             writer.addDocument(doc);
> >           }
> >         }
> >       }//end the if for an incremental update
> >       //we are creating a new index, add all document types
> >       else {
> >         if (file.getPath().endsWith(".pdf")) {
> >           Document doc = LucenePDFDocument.getDocument(file);
> >           Trace.TRACE("INDEXING INFO: Adding a new document to the new index: "
> >               + doc.get("url"));
> >           writer.addDocument(doc);
> >         } else if (file.getPath().endsWith(".xml")) {
> >           Document doc =
> >               XMLDocument.Document(file);
> >           Trace.TRACE("INDEXING INFO: Adding a new document to the new index: "
> >               + doc.get("url"));
> >           writer.addDocument(doc);
> >         } else {
> >           Document doc = HTMLDocument.Document(file);
> >           Trace.TRACE("INDEXING INFO: Adding a new document to the new index: "
> >               + doc.get("url"));
> >           writer.addDocument(doc);
> >         }//close the else
> >       }//close the else for a new index
> >     }//close the else if to handle file types
> >   }//close the indexDocs method
> >
> > }
> >
> >
> > ----- Original Message -----
> > From: "Craig McClanahan"
> > To: "Jakarta Commons Users List"
> > Sent: Thursday, November 11, 2004 6:13 PM
> > Subject: Re: avoiding locking
> >
> >
> > In order to get any useful help, it would be nice to know what you are
> > trying to do, and (most importantly) what commons component is giving
> > you the problem :-). The traditional approach is to put a prefix on
> > your subject line -- for commons package "foo" it would be:
> >
> > [foo] avoiding locking
> >
> > It's also generally helpful to see the entire stack trace, not just
> > the exception message itself.
> >
> > Craig
> >
> >
> > On Thu, 11 Nov 2004 17:27:19 -0500, Luke Shannon
> > > What can I do to avoid locking issues?
> >
> > Unable to execute incremental update Lock obtain timed out:
> > Lock@/usr/tomcat/jakarta-tomcat-5.0.19/temp/lucene-398fbd170a5457d05e2f4d43210f7fe8-write.lock
> >
> > Thanks,
> >
> > Luke
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: commons-user-help@jakarta.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
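
The uid-ordered walk described in the comment above indexDocs in the quoted code can be illustrated without Lucene as a merge over two sorted lists. This is only a sketch under stated assumptions: UidMergeSketch, Plan and diff are hypothetical names, the "path@timestamp" strings are a made-up stand-in for real uid values, and it classifies everything in a single pass, whereas the quoted code makes a deletion pass and then an add pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UidMergeSketch {
    // Result of comparing the uids in the index with the uids on disk.
    public static final class Plan {
        public final List<String> toDelete = new ArrayList<>();
        public final List<String> unchanged = new ArrayList<>();
        public final List<String> toAdd = new ArrayList<>();
    }

    // Both lists must be sorted ascending, like the uid term enumeration
    // and the sorted directory walk in the quoted code.
    public static Plan diff(List<String> indexedUids, List<String> currentUids) {
        Plan plan = new Plan();
        int i = 0;
        for (String uid : currentUids) {
            // (a) indexed uids that sort before the current one are stale
            while (i < indexedUids.size()
                    && indexedUids.get(i).compareTo(uid) < 0) {
                plan.toDelete.add(indexedUids.get(i++));
            }
            if (i < indexedUids.size() && indexedUids.get(i).equals(uid)) {
                // (b) same uid on both sides: leave the document alone
                plan.unchanged.add(uid);
                i++;
            } else {
                // (c) uid not in the index: a new document to add
                plan.toAdd.add(uid);
            }
        }
        // anything left in the index no longer exists on disk
        while (i < indexedUids.size()) {
            plan.toDelete.add(indexedUids.get(i++));
        }
        return plan;
    }

    public static void main(String[] args) {
        Plan plan = diff(
                Arrays.asList("a@1", "b@1", "c@1"),
                Arrays.asList("a@1", "b@2", "d@1"));
        System.out.println("delete=" + plan.toDelete
                + " unchanged=" + plan.unchanged
                + " add=" + plan.toAdd);
    }
}
```

In this example a file whose uid changed (b@1 to b@2) shows up as one delete plus one add, which is how a modified file would be picked up if the uid encodes the modification time.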