lucene-dev mailing list archives

From "J. Delgado" <jdelg...@lendingclub.com>
Subject Re: [jira] Commented: (LUCENE-724) Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene
Date Wed, 08 Aug 2007 17:54:02 GMT
Michael, are you still working on this replacement of the BLOB I/O?

I'm looking into adding a parameter that selects between two sync modes
for DML operations: lazy syncs (via calls to LuceneDomainIndex.sync,
potentially queued using dbms_aq), which are convenient for bulk inserts,
and real-time syncs for non-bulk operations, where transactional data
retrieval matters.
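
To make the two modes concrete, here is a minimal sketch in plain Java. Everything in it is hypothetical: the names SyncModeSketch, SyncMode, Change and Index are not part of the LUCENE-724 code, and the in-memory queue only stands in for whatever dbms_aq would provide. It shows the intended behavior, not how LuceneDomainIndex.sync is actually implemented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical sketch: none of these names come from the LUCENE-724 code. */
public class SyncModeSketch {

    /** How DML changes reach the index. */
    enum SyncMode { IMMEDIATE, DEFERRED }

    /** One pending index change, keyed by the row it refers to. */
    static final class Change {
        final String rowId;
        final String op; // "INSERT", "UPDATE" or "DELETE"
        Change(String rowId, String op) { this.rowId = rowId; this.op = op; }
    }

    /** Stand-in for the domain index; the in-memory queue plays the role of dbms_aq. */
    static final class Index {
        private final SyncMode mode;
        private final Queue<Change> pending = new ArrayDeque<>();
        private final List<String> applied = new ArrayList<>();

        Index(SyncMode mode) { this.mode = mode; }

        /** Called once per DML operation on the base table. */
        void onDml(String rowId, String op) {
            pending.add(new Change(rowId, op));
            if (mode == SyncMode.IMMEDIATE) {
                sync(); // real-time mode: the change is applied right away
            }
        }

        /** Drains the queue in arrival order; returns how many changes were applied. */
        int sync() {
            int count = 0;
            Change c;
            while ((c = pending.poll()) != null) {
                applied.add(c.op + ":" + c.rowId); // a real index would update Lucene here
                count++;
            }
            return count;
        }

        int pendingCount() { return pending.size(); }
        List<String> applied() { return applied; }
    }

    public static void main(String[] args) {
        // Bulk load: queue the changes, then apply them in one pass.
        Index bulk = new Index(SyncMode.DEFERRED);
        for (int i = 0; i < 3; i++) {
            bulk.onDml("row" + i, "INSERT");
        }
        System.out.println("pending before sync: " + bulk.pendingCount());
        System.out.println("applied by sync: " + bulk.sync());
    }
}
```

In deferred mode nothing is searchable until sync() runs, which is the trade-off that makes it suitable for bulk inserts but not for transactional lookups.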

-- Joaquin

2007/7/12, Michael Goddard (JIRA) <jira@apache.org>:
>
>     [ https://issues.apache.org/jira/browse/LUCENE-724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512169 ]
>
> Michael Goddard commented on LUCENE-724:
> ----------------------------------------
>
> Marcelo,
>
> Are you still working on this?  I have been experimenting with it
> recently -- thank you for creating it.  Do you think that the I/O might
> be faster if the Vector were replaced with BLOB I/O via InputStream and
> OutputStream directly?  That is what I am working with right now, and I
> did observe my indexing time for a sample data set go from 22 seconds to
> 13 seconds.  I do currently have the problem that the resulting index is
> not behaving correctly, and I am working on that.
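
As a rough illustration of the stream-based approach described above, the sketch below copies index bytes through one fixed-size buffer using plain InputStream and OutputStream. It assumes that in the real code the streams would come from java.sql.Blob (getBinaryStream() for reading, setBinaryStream(1) for writing, since BLOB positions are 1-based). Byte-array streams stand in for the BLOB so the example runs without a database. The class and method names are hypothetical and are not taken from the attached OJVMDirectory code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch of buffered stream I/O; not the OJVMDirectory code. */
public class BlobStreamSketch {

    static final int BUFFER_SIZE = 8192;

    /** Copies everything from in to out through one reusable buffer; returns bytes copied. */
    static long copy(InputStream in, OutputStream out) {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        // Fake "index segment" data, larger than one buffer.
        byte[] segment = new byte[20000];
        for (int i = 0; i < segment.length; i++) {
            segment[i] = (byte) (i % 251);
        }

        // With a real BLOB the target would be blob.setBinaryStream(1)
        // and the source for reads would be blob.getBinaryStream().
        ByteArrayOutputStream blobStandIn = new ByteArrayOutputStream();
        long written = copy(new ByteArrayInputStream(segment), blobStandIn);
        System.out.println("bytes written: " + written);
    }
}
```

Whether this is actually faster than a Vector of in-memory buffers depends on the BLOB implementation inside the Oracle JVM; the sketch only shows the shape of the I/O loop.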
>
>
> > Oracle JVM implementation for Lucene DataStore also a preliminary implementation for an Oracle Domain index using Lucene
> > ------------------------------------------------------------------------------------------------------------------------
> >
> >                 Key: LUCENE-724
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-724
> >             Project: Lucene - Java
> >          Issue Type: New Feature
> >          Components: Store
> >    Affects Versions: 2.0.0
> >         Environment: Oracle 10g R2 with the latest patchset. There is a
> >                      txt file in the lib directory listing the libraries
> >                      required to compile this extension, which for legal
> >                      reasons I can't redistribute. All of these libraries
> >                      are included in the Oracle home directory.
> >            Reporter: Marcelo F. Ochoa
> >            Priority: Minor
> >         Attachments: ojvm-01-09-07.tar.gz, ojvm-11-28-06.tar.gz, ojvm-12-20-06.tar.gz, ojvm.tar.gz
> >
> >
> > Here is a preliminary implementation of the Oracle JVM Directory data
> > store, which replaces the file system with BLOB data storage.
> > The reasons for doing this are:
> >   - Using a traditional file system to store the inverted index is not
> >     a good option for some users.
> >   - Storing the inverted index in a BLOB while running Lucene outside
> >     the Oracle database performs badly because of the many network round
> >     trips and the data marshalling.
> >   - Indexing relational data, such as tables with VARCHAR2, CLOB or
> >     XMLType columns, with Lucene running outside the database has the
> >     same problem as the previous point.
> >   - The JVM included inside the Oracle database can scale up to 10,000+
> >     concurrent threads without memory leaks or deadlocks, and all
> >     operations on tables happen in the same memory space!
> >   With these points in mind, I uploaded the complete Lucene framework
> > into the Oracle JVM and ran the complete JUnit test suite successfully,
> > except for some tests, such as the RMI test, which require special
> > grants to open ports inside the database.
> >   Lucene's test cases run faster inside the Oracle database (11g) than
> > on the Sun JDK 1.5, because the classes are automatically JIT-compiled
> > after some executions.
> >   I have implemented an OJVMDirectory Lucene store which replaces the
> > file system storage with BLOB-based storage. Compared with a
> > RAMDirectory implementation it is a bit slower, but we get all the
> > benefits of BLOB storage (backup, concurrency control, and so on).
> >  The OJVMDirectory is cloned from the source at
> > http://issues.apache.org/jira/browse/LUCENE-150 (DBDirectory) but with
> > some changes to run faster inside the Oracle JVM.
> >  At the moment I am working on a full integration with the SQL engine
> > using the Data Cartridge API, which means using Lucene as a new Oracle
> > Domain Index.
> >  With this extension we can create a Lucene inverted index on a table
> > using:
> > create index it1 on t1(f2) indextype is LuceneIndex parameters('test');
> >  assuming that table t1 has a column f2 of type VARCHAR2, CLOB or
> > XMLType. After this, queries against the Lucene inverted index can be
> > made using a new Oracle operator:
> > select * from t1 where contains(f2, 'Marcelo') = 1;
> >  The important point here is that this query is integrated with the
> > execution plan of the Oracle database. In this simple example the Oracle
> > optimizer sees that column "f2" is indexed with the Lucene Domain index.
> > Then, using the Data Cartridge API, Java code running inside the Oracle
> > JVM is executed to open the search, fetch all the ROWIDs that match
> > "Marcelo", and get the rows using those pointers.
> > Here is the output:
> > SELECT STATEMENT                                      ALL_ROWS      3       1       115
> >        TABLE ACCESS(BY INDEX ROWID) LUCENE.T1          3       1       115
> >             DOMAIN INDEX LUCENE.IT1
> >  Another benefit of using the Data Cartridge API is that when rows in
> > table T1 are inserted, updated or deleted, a corresponding Java method
> > is called to automatically update the Lucene index.
> >   There is a simple HTML file with some explanation of the code.
> >    The install.sql script is not fully tested and must be launched from
> > within the Oracle database, not remotely.
> >   Best regards, Marcelo.
> > - For Oracle users the big question is: why use Lucene instead of
> >   Oracle Text, which is implemented in C?
> >   I think the answer is simple: Lucene is open source, and anybody can
> >   extend it and add the functionality they need.
> > - For Lucene users who want to use Lucene as an enterprise search
> >   engine, the Oracle JVM provides a highly scalable container which can
> >   scale up to 10,000+ concurrent sessions, with the facility of querying
> >   tables in the same memory space.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
