Subject: Re: flushing index
From: Erick Erickson
To: java-user@lucene.apache.org
Date: Tue, 28 Sep 2010 08:50:51 -0400

Flushing an index to disk is just an IndexWriter.commit(); there's nothing really special about it.

As for running your code continuously, you have several options:

1> Schedule a recurring job. On *nix systems this is a cron job; on Windows there's the Task Scheduler.

2> Just start it up in an infinite loop. That is, your main is just a while(true) {}. You'll probably want to throttle it a bit: run, sleep for some interval, and start again.

3> You can get really fancy and put in some filesystem hooks that notify you whenever anything changes in a directory, but I really wouldn't go there.

Note that you'll have to keep some kind of timestamp (probably in a separate file or configuration somewhere) that you can compare against to figure out whether you've already indexed the current version of a file.

The other thing you'll have to worry about is deletions. That is, how do you *remove* a file from your index after it has been deleted on disk? You may have to ask your index for all the file paths, so you want to think about storing the file path NOT analyzed (perhaps with KeywordTokenizer).
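The timestamp and deletion bookkeeping described above can be sketched without any Lucene calls. This is a hypothetical helper, not code from this thread; the class and method names are made up for illustration:

```java
import java.io.File;
import java.util.*;

// Hypothetical helper (not from this thread): remembers when each file was
// last indexed, so a periodic pass can decide what to add, re-index, or
// delete from the index (by its stored, not-analyzed path field).
public class IndexBookkeeper {
    private final Map<String, Long> indexedAt = new HashMap<String, Long>();

    // Files that are new or changed since the last pass.
    public List<File> toIndex(File dataDir) {
        List<File> out = new ArrayList<File>();
        for (File f : dataDir.listFiles()) {
            Long seen = indexedAt.get(f.getPath());
            if (seen == null || seen < f.lastModified()) {
                out.add(f);
                indexedAt.put(f.getPath(), f.lastModified());
            }
        }
        return out;
    }

    // Paths indexed before that no longer exist on disk: these are the
    // documents to delete from the index.
    public List<String> toDelete(File dataDir) {
        Set<String> onDisk = new HashSet<String>();
        for (File f : dataDir.listFiles()) onDisk.add(f.getPath());
        List<String> gone = new ArrayList<String>();
        for (Iterator<String> it = indexedAt.keySet().iterator(); it.hasNext(); ) {
            String p = it.next();
            if (!onDisk.contains(p)) {
                gone.add(p);
                it.remove();
            }
        }
        return gone;
    }
}
```

Each pass would then feed toIndex() into addDocument()/updateDocument() and toDelete() into deleteDocuments() on the path field.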
That way you'll be able to know which files to remove when they are no longer in your directory, as well as which files to update when they've changed.

HTH,
Erick

On Tue, Sep 28, 2010 at 2:18 AM, Yakob wrote:
> On 9/27/10, Uwe Schindler wrote:
> >
> > Yes. You must close before, else the addIndexes call will do nothing, as
> > the index looks empty for the addIndexes() call (because no committed
> > segments are available in the ramDir).
> >
> > I don't understand what you mean with flushing? If you are working on
> > Lucene 2.9 or 3.0, the ramWriter is flushed to the RAMDir on close. The
> > addIndexes call will add the index to the on-disk writer. To flush that
> > fsWriter (flush is the wrong thing, you probably mean commit), simply
> > call fsWriter.commit() so the newly added segments are written to disk
> > and IndexReaders opened in parallel "see" the new segments.
> >
> > Btw: If you are working on Lucene 3.0, the addIndexes call does not need
> > the new Directory[] {}, as the method is Java 5 varargs now.
> >
> > Uwe
>
> I mean I need to flush the index periodically; that means the index will
> be regularly updated as documents are added. What do you reckon is the
> solution for this? I need sample source code to be able to flush an
> index.
>
> OK, just like this source code below.
> public class SimpleFileIndexer {
>
>     public static void main(String[] args) throws Exception {
>
>         File indexDir = new File("C:/Users/Raden/Documents/lucene/LuceneHibernate/adi");
>         File dataDir = new File("C:/Users/Raden/Documents/lucene/LuceneHibernate/adi");
>         String suffix = "txt";
>
>         SimpleFileIndexer indexer = new SimpleFileIndexer();
>
>         int numIndex = indexer.index(indexDir, dataDir, suffix);
>
>         System.out.println("Total files indexed " + numIndex);
>     }
>
>     private int index(File indexDir, File dataDir, String suffix) throws Exception {
>
>         IndexWriter indexWriter = new IndexWriter(
>                 FSDirectory.open(indexDir),
>                 new SimpleAnalyzer(),
>                 true,
>                 IndexWriter.MaxFieldLength.LIMITED);
>         indexWriter.setUseCompoundFile(false);
>
>         indexDirectory(indexWriter, dataDir, suffix);
>
>         int numIndexed = indexWriter.maxDoc();
>         indexWriter.optimize();
>         indexWriter.close();
>
>         return numIndexed;
>     }
>
>     private void indexDirectory(IndexWriter indexWriter, File dataDir,
>             String suffix) throws IOException {
>         File[] files = dataDir.listFiles();
>         for (int i = 0; i < files.length; i++) {
>             File f = files[i];
>             if (f.isDirectory()) {
>                 indexDirectory(indexWriter, f, suffix);
>             } else {
>                 indexFileWithIndexWriter(indexWriter, f, suffix);
>             }
>         }
>     }
>
>     private void indexFileWithIndexWriter(IndexWriter indexWriter, File f,
>             String suffix) throws IOException {
>         if (f.isHidden() || f.isDirectory() || !f.canRead() || !f.exists()) {
>             return;
>         }
>         if (suffix != null && !f.getName().endsWith(suffix)) {
>             return;
>         }
>         System.out.println("Indexing file " + f.getCanonicalPath());
>
>         Document doc = new Document();
>         doc.add(new Field("contents", new FileReader(f)));
>         doc.add(new Field("filename", f.getCanonicalPath(),
>                 Field.Store.YES, Field.Index.ANALYZED));
>
>         indexWriter.addDocument(doc);
>     }
> }
>
> The above source code can index documents when given a directory of
> text files.
> Now what I am asking is: how can I make the code run continuously? What
> class should I use, so that every time new documents are added to that
> directory, Lucene will index them automatically? Can you help me out on
> this one? I really need to know what the best solution is.
>
> thanks
> --
> http://jacobian.web.id
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
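A minimal sketch of option 2> from Erick's reply (a throttled recurring pass), using the JDK's ScheduledExecutorService rather than a hand-rolled while(true)/sleep loop. The Runnable body is a placeholder; in the real program it would call the index(...) method from the SimpleFileIndexer code above:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch only: re-run an indexing pass on a fixed schedule. The interval
// (60 seconds) is an arbitrary choice for illustration.
public class PeriodicIndexer {
    public static void main(String[] args) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleWithFixedDelay(new Runnable() {
            public void run() {
                // Placeholder for the real work, e.g.:
                // indexer.index(indexDir, dataDir, "txt");
                System.out.println("indexing pass at " + System.currentTimeMillis());
            }
        }, 0, 60, TimeUnit.SECONDS); // first pass now, then 60s after each finishes
    }
}
```

scheduleWithFixedDelay waits for each pass to finish before starting the delay, so a slow indexing run never overlaps the next one.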