lucene-dev mailing list archives

From "Jason Rutherglen (JIRA)" <>
Subject [jira] Updated: (LUCENE-1313) Ocean Realtime Search
Date Tue, 24 Jun 2008 21:32:45 GMT


Jason Rutherglen updated LUCENE-1313:

    Attachment: lucene-1313.patch


Depends on LUCENE-1312 and LUCENE-1314.  More bugs fixed.  Deletes are committed to indexes
only intermittently, which improves the update speed dramatically.  MaybeMergeIndexes now
runs via a background timer.
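A background merge timer could be sketched roughly as below.  This is a hypothetical illustration, assuming a periodic scheduler; MergeScheduler and the maybeMergeIndexes callback are assumed names, not classes from the patch.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch only: names are illustrative, not from the Ocean patch.
public class MergeScheduler {
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor();

    // Periodically run the merge check off the update path, so that
    // transaction commits never wait on merge work.
    public void start(Runnable maybeMergeIndexes, long periodMillis) {
        timer.scheduleWithFixedDelay(maybeMergeIndexes, periodMillis,
                                     periodMillis, TimeUnit.MILLISECONDS);
    }

    public void stop() {
        timer.shutdownNow();
    }
}
```

scheduleWithFixedDelay (rather than scheduleAtFixedRate) keeps a fixed gap after each merge check finishes, so a long merge cannot cause back-to-back runs.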

Will remove writing a snapshot.xml file per transaction in favor of a human-readable log.
Creating and deleting these small files is a bottleneck for update speed; this way a transaction
writes to only 2 files.  The merges happen in the background and so never affect the transaction
update speed.  I am not sure how useful it would be, but it is possible to have a priority-based
IO system that favors transactions over merges: if a transaction comes in while a merge is
writing to disk, the merge is paused, the transaction IO runs, and then the merge IO continues.
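One way such a priority gate might look, as a rough sketch (all names here are hypothetical, not from the patch): the merge thread calls a checkpoint between IO chunks and blocks while any transaction IO is in flight.

```java
// Hypothetical sketch of favoring transaction IO over merge IO.
// None of these names come from the Ocean patch.
public class IoPriorityGate {
    private int activeTransactions = 0;

    public synchronized void transactionBegin() {
        activeTransactions++;
    }

    public synchronized void transactionEnd() {
        activeTransactions--;
        if (activeTransactions == 0) {
            notifyAll(); // resume any paused merge
        }
    }

    // The merge thread calls this between IO chunks; it blocks while a
    // transaction is writing, so transaction IO always goes first.
    public synchronized void mergeCheckpoint() throws InterruptedException {
        while (activeTransactions > 0) {
            wait();
        }
    }
}
```

The merge only yields at chunk boundaries, so the chunk size would bound how long a transaction can be delayed by in-flight merge IO.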

I am not sure how to handle Documents with Fields that have a TokenStream as the value, since
I believe these cannot be serialized.  For now I assume it will be unsupported.

Also, I am not sure how to handle analyzers: are these generally serializable?  It would be
useful to serialize them for a more automated log recovery process.
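One common workaround, sketched below under the assumption that each analyzer has a public no-arg constructor: rather than serializing the Analyzer object itself, write its class name into the log and re-instantiate it reflectively during recovery.  AnalyzerLog is an illustrative name, not part of the patch.

```java
// Hypothetical sketch: log the analyzer's class name instead of the
// instance, then rebuild it by reflection during log recovery.
// Assumes the analyzer class has a public no-arg constructor.
public class AnalyzerLog {
    // What gets written into the transaction log.
    public static String record(Object analyzer) {
        return analyzer.getClass().getName();
    }

    // What recovery does with the logged class name.
    public static Object recover(String className) throws Exception {
        return Class.forName(className).getDeclaredConstructor().newInstance();
    }
}
```

This sidesteps the serializability question entirely, at the cost of not supporting analyzers that carry per-instance state (e.g. a custom stop-word set passed to the constructor).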

> Ocean Realtime Search
> ---------------------
>                 Key: LUCENE-1313
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>            Reporter: Jason Rutherglen
>         Attachments: lucene-1313.patch, lucene-1313.patch, lucene-1313.patch
> Provides realtime search using Lucene.  Conceptually, updates are divided into discrete
transactions.  The transaction is recorded to a transaction log which is similar to the mysql
bin log.  Deletes from the transaction are made to the existing indexes.  Document additions
are made to an in memory InstantiatedIndex.  The transaction is then complete.  After each
transaction TransactionSystem.getSearcher() may be called which allows searching over the
index including the latest transaction.
> TransactionSystem is the main class.  Methods similar to IndexWriter are provided for
updating.  getSearcher returns a Searcher class. 
> - getSearcher()
> - addDocument(Document document)
> - addDocument(Document document, Analyzer analyzer)
> - updateDocument(Term term, Document document)
> - updateDocument(Term term, Document document, Analyzer analyzer)
> - deleteDocument(Term term)
> - deleteDocument(Query query)
> - commitTransaction(List<Document> documents, Analyzer analyzer, List<Term>
deleteByTerms, List<Query> deleteByQueries)
> Sample code:
> {code}
> // setup
> FSDirectoryMap directoryMap = new FSDirectoryMap(new File("/testocean"), "log");
> LogDirectory logDirectory = directoryMap.getLogDirectory();
> TransactionLog transactionLog = new TransactionLog(logDirectory);
> TransactionSystem system = new TransactionSystem(transactionLog, new SimpleAnalyzer(), directoryMap);
> // transaction
> Document d = new Document();
> d.add(new Field("contents", "hello world", Field.Store.YES, Field.Index.TOKENIZED));
> system.addDocument(d);
> // search
> OceanSearcher searcher = system.getSearcher();
> ScoreDoc[] hits = searcher.search(new TermQuery(new Term("contents", "hello")), null, 1000).scoreDocs;
> System.out.println(hits.length + " total results");
> for (int i = 0; i < hits.length && i < 10; i++) {
>   Document doc = searcher.doc(hits[i].doc);
>   System.out.println(i + " " + hits[i].score + " " + doc.get("contents"));
> }
> {code}
> There is a test class org.apache.lucene.ocean.TestSearch that was used for basic testing.
> A sample disk directory structure is as follows:
> |/snapshot_105_00.xml | XML file containing which indexes and their generation numbers correspond to a snapshot.  Each transaction creates a new snapshot file.  In this file the 105 is the snapshotid, also known as the transactionid.  The 00 is the minor version of the snapshot corresponding to a merge.  A merge is a minor snapshot version because the data does not change, only the underlying structure of the index|
> |/3 | Directory containing an on disk Lucene index|
> |/log | Directory containing log files|
> |/log/log00000001.bin | Log file.  As new log files are created the suffix number is incremented|

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

