From: Justin Swanhart
Reply-To: Lucene Users List
To: Lucene Users List
Date: Fri, 19 Nov 2004 10:52:54 -0700
Subject: Re: java.io.FileNotFoundException: ... (No such file or directory)

Is it possible that while my searcher process is reading the directory,
the index writer process performs a merge? If so, I think the merge
could remove segment files before the reader gets to them. When the
reader tries to read one of the now-missing segment files, it throws
the IOException. The file was listed in the segments file when the
RAMDirectory started loading the directory, but it is gone by the time
the reader reaches it because of the merge.

This would most likely not affect small indexes, but large indexes like
mine, especially ones on a network filesystem, could definitely be
affected.

If this is what is happening, a way around it would be to open all the
files listed in the segments file at the moment the segments file is
read. Valid file handles would then be held for every file that needs
to be read, so even if the index writer process removes a segment, the
handle should still be valid. This might only work for local
filesystems, though; I'm not sure whether NFS works that way.

On Thu, 18 Nov 2004 19:16:46 -0500, Will Allen wrote:
> I have gotten this a few times. I am also using an NFS mount, but have
> seen it in cases where a mount wasn't involved.
>
> I cannot speak to why this is happening, but I have posted to this
> forum before a way of repairing your index by modifying the segments
> file. Search for "wallen".
>
> The other thing I have done is use code to copy the documents that can
> be read by a reader to a new index. I suppose I should submit those
> tools to open source!
> Anyway, this error will break the searcher, but the index can still be
> read with an IndexReader.
>
> -Will
>
> Here is the source of a method that should get you started (logger is
> a log4j object):
>
>     public void transferDocuments() throws IOException
>     {
>         IndexReader reader = IndexReader.open(brokenDir);
>         logger.debug(reader.numDocs() + "");
>         IndexWriter writer = new IndexWriter(newIndexDir, PopIndexer.popAnalyzer(), true);
>         writer.minMergeDocs = 50;
>         writer.mergeFactor = 200;
>         writer.setUseCompoundFile(true);
>         int docCount = reader.numDocs();
>         Date start = new Date();
>         //docCount = Math.min(docCount, 500);
>         for (int x = 0; x < docCount; x++)
>         {
>             try
>             {
>                 if (!reader.isDeleted(x))
>                 {
>                     Document doc = reader.document(x);
>                     if (x % 1000 == 0)
>                     {
>                         logger.debug(doc.get("subject"));
>                     }
>                     //remove the new fields if they exist, and add new value
>                     //TODO test not having this in
>                     /*
>                     for (Enumeration newFields = doc.fields(); newFields.hasMoreElements(); )
>                     {
>                         Field newField = (Field) newFields.nextElement();
>                         doc.removeFields(newField.name());
>                         doc.add(newField);
>                     }
>                     */
>                     doc.removeFields("counter");
>                     doc.add(Field.Keyword("counter", "counter"));
>                     // reinsert old document
>                     writer.addDocument(doc);
>                 }
>             }
>             catch (IOException ioe)
>             {
>                 logger.error("doc:" + x + " failed, " + ioe.getMessage());
>             }
>             catch (IndexOutOfBoundsException ioobe)
>             {
>                 logger.error("INDEX OUT OF BOUNDS!" + ioobe.getMessage());
>                 ioobe.printStackTrace();
>             }
>         }
>         reader.close();
>         //logger.debug("done, about to optimize");
>         //writer.optimize();
>         writer.close();
>         long time = ((new Date()).getTime() - start.getTime()) / 1000;
>         logger.info("done: " + time + " seconds or " + (docCount / time) + " rec/sec");
>     }
>
> -----Original Message-----
> From: Justin Swanhart [mailto:greenlion@gmail.com]
> Sent: Thursday, November 18, 2004 5:00 PM
> To: Lucene Users List
> Subject: java.io.FileNotFoundException: ... (No such file or directory)
>
> I have two index processes. One is an index server, the other is a
> search server. The processes run on different machines.
>
> The index server is a single-threaded process that reads from the
> database and adds unindexed rows to the index as needed. It sleeps for
> a couple of minutes between each batch to allow newly added/updated
> rows to accumulate.
>
> The searcher process keeps an open cache of IndexSearcher objects and
> is multithreaded. It accepts connections on a TCP port, runs the
> query, and stores the results in a database. After a set interval, the
> server checks to see if the index on disk is a newer version. If it
> is, it loads the index into a new IndexSearcher as a RAMDirectory.
>
> Every once in a while, the index reader process gets a
> FileNotFoundException:
>
> 20041118 1378 1383 (index number, old version, new version)
> [newer version found] Loading index directory into RAM: 20041118
> java.io.FileNotFoundException:
> /path/omitted/for/obvious/reasons/_4zj6.cfs (No such file or directory)
>     at java.io.RandomAccessFile.open(Native Method)
>     at java.io.RandomAccessFile.<init>(RandomAccessFile.java:204)
>     at org.en.lucene.store.FSInputStream$Descriptor.<init>(FSDirectory.java:376)
>     at org.en.lucene.store.FSInputStream.<init>(FSDirectory.java:405)
>     at org.en.lucene.store.FSDirectory.openFile(FSDirectory.java:268)
>     at org.en.lucene.store.RAMDirectory.<init>(RAMDirectory.java:60)
>     at org.en.lucene.store.RAMDirectory.<init>(RAMDirectory.java:89)
>     at org.en.global.searchserver.UpdateSearchers.createIndexSearchers(Search.java:89)
>     at org.en.global.searchserver.UpdateSearchers.run(Search.java:54)
>
> The code being called at that point is:
>
>     // add the directory to the HashMap of IndexSearchers (dir# => IndexSearcher)
>     indexSearchers.put(subDirs[i],
>         new IndexSearcher(new RAMDirectory(indexDir + "/" + subDirs[i])));
>
> The indexes are located on an NFS mountpoint. Could this be the
> problem? Or should I be looking elsewhere...
> Should I just check for an IOException, and try reloading the index if
> I get an error?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
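The "catch an IOException and reload" idea asked about above can be sketched as a generic retry wrapper. This is only an illustration, not code from the thread: the class and names (RetryOpen, MAX_RETRIES, the backoff values) are made up for the sketch, and the Lucene-specific call that would go inside it, e.g. new IndexSearcher(new RAMDirectory(dir)), is shown only in a comment so the retry logic stands on its own.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical sketch: retry an index load that can fail transiently
// because a concurrent merge deleted a segment file mid-read. In the
// real searcher, the Callable body would be something like
//     new IndexSearcher(new RAMDirectory(indexDir + "/" + subDirs[i]))
// which re-reads the segments file on each attempt, picking up the
// post-merge, consistent set of files.
public class RetryOpen {
    static final int MAX_RETRIES = 3; // illustrative limit

    public static <T> T withRetry(Callable<T> open) throws Exception {
        IOException last = null;
        for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
            try {
                return open.call();           // attempt the load
            } catch (IOException ioe) {       // FileNotFoundException is an IOException
                last = ioe;                   // keep it in case every attempt fails
                Thread.sleep(100L * attempt); // brief backoff before re-reading
            }
        }
        throw last; // all attempts failed: surface the last error
    }

    public static void main(String[] args) throws Exception {
        // Simulate a loader that fails twice (segment vanished) then succeeds.
        final int[] failures = {2};
        String result = withRetry(new Callable<String>() {
            public String call() throws IOException {
                if (failures[0]-- > 0) {
                    throw new FileNotFoundException("_4zj6.cfs (No such file or directory)");
                }
                return "searcher loaded";
            }
        });
        System.out.println(result);
    }
}
```

Retrying only masks the race; it does not remove it, so a bounded retry count with the original exception rethrown at the end keeps genuine corruption visible.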