db-derby-dev mailing list archives

From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-590) How to integrate Derby with Lucene API?
Date Thu, 05 Jun 2014 10:36:02 GMT

    [ https://issues.apache.org/jira/browse/DERBY-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018659#comment-14018659 ]

Knut Anders Hatlen commented on DERBY-590:

I suppose you could simulate the functionality that way. In that case you'd probably also need
a custom query parser, so that the query language understands that "method:compute" is a single
token. The default Lucene query parser would interpret it as a search for the token "compute"
in the field "method".
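
To make the ambiguity concrete, here is a toy model of the field:term convention. It is not
Lucene's query parser and uses no Lucene classes; the class name FieldTermSketch and the method
split are made up for illustration. It only shows why "method:compute" ends up as the term
"compute" in the field "method" rather than as one synthetic token.

```java
final class FieldTermSketch {

    /**
     * Hypothetical helper: splits a single query clause into {field, term}.
     * A clause without a colon is assigned to the given default field.
     */
    static String[] split(String clause, String defaultField) {
        int colon = clause.indexOf(':');
        if (colon < 0) {
            return new String[] { defaultField, clause };
        }
        // Everything before the first colon is read as the field name.
        return new String[] { clause.substring(0, colon), clause.substring(colon + 1) };
    }

    public static void main(String[] args) {
        String[] parts = split("method:compute", "luceneTextField");
        System.out.println("field=" + parts[0] + ", term=" + parts[1]);
    }
}
```

A parser that treats the whole string as one token would have to suppress this split, which is
the extra work a custom query parser would take on.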

By the way, when I said "multiple indexes" and "multiple analyzers" above, I think I meant
what in Lucene speak should have been "multiple fields". I think it's still called a single
index in Lucene speak, even if you index separately on multiple fields/keys.

Currently, when the luceneSupport tool creates an index, it makes every string value a Document
with a single field called "luceneTextField".

                String  textcolValue = rs.getString( keyCount + 1 );
                if ( textcolValue != null )
                    doc.add(new TextField( LuceneQueryVTI.TEXT_FIELD_NAME, textcolValue, Store.NO));
                addDocument( iw, doc );

The flexibility I was looking for was the ability to have more fields than the single, hard-coded
one. For example, CREATEINDEX (and UPDATEINDEX) could take an extra argument which is a
comma-separated list of field names (with a reasonable default when NULL), and the above
code could then add each of the fields.
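
A minimal sketch of how such an argument could be turned into a list of field names. The class
FieldNames and its parse method are hypothetical and not part of the luceneSupport tool; the
fallback to "luceneTextField" is just one guess at a reasonable default when the argument is NULL.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class FieldNames {

    // Name of the single field the tool creates today; used here as the assumed default.
    static final String DEFAULT_FIELD = "luceneTextField";

    /** Splits a comma-separated list of field names; NULL or blank input yields the default. */
    static List<String> parse(String commaSeparated) {
        if (commaSeparated == null || commaSeparated.trim().isEmpty()) {
            return Collections.singletonList(DEFAULT_FIELD);
        }
        List<String> names = new ArrayList<>();
        for (String part : commaSeparated.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        // Input such as "," contains no usable names, so fall back to the default.
        return names.isEmpty() ? Collections.singletonList(DEFAULT_FIELD) : names;
    }

    public static void main(String[] args) {
        System.out.println(parse("comment,method"));
        System.out.println(parse(null));
    }
}
```

The indexing loop could then add one field per returned name to each Document instead of the
single hard-coded one.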

In my hypothetical Java code in a CLOB example, that would mean something like this for creating
the index:

CALL LUCENESUPPORT.CREATEINDEX('app', 'sourcefiles', 'sourcetext', 'MyAnalyzer.create', 'comment,method',

The custom analyzer would be something like this:

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;

public class MyAnalyzer extends Analyzer {

    // Factory method referenced by name in the CREATEINDEX call.
    public static Analyzer create() {
        return new MyAnalyzer();
    }

    // Choose a tokenizer based on the name of the field being indexed.
    @Override
    protected TokenStreamComponents createComponents(String field, Reader r) {
        switch (field) {
            case "comment":
                return new TokenStreamComponents(createCommentTokenizer(r));
            case "method":
                return new TokenStreamComponents(createMethodTokenizer(r));
            default:
                throw new AssertionError("unknown field name: " + field);
        }
    }

    private static Tokenizer createCommentTokenizer(Reader r) {
        // TODO: Create a tokenizer that extracts tokens only from
        // code comments.
        // ....
    }

    private static Tokenizer createMethodTokenizer(Reader r) {
        // TODO: Create a tokenizer that only returns method names.
        // ....
    }
}

This might not add any functionality that you couldn't work around somehow with the current implementation.
But I think that the extra flexibility would allow the application to push more of the full-text
search logic down to Lucene, where it belongs. At least you'd avoid the need for a custom
query parser and creation of synthetic tokens.

> How to integrate Derby with Lucene API?
> ---------------------------------------
>                 Key: DERBY-590
>                 URL: https://issues.apache.org/jira/browse/DERBY-590
>             Project: Derby
>          Issue Type: Improvement
>          Components: Documentation, SQL
>            Reporter: Abhijeet Mahesh
>              Labels: derby_triage10_11
>         Attachments: LucenePlugin.html, LucenePlugin.html, LucenePlugin.html, derby-590-01-ag-publicAccessToLuceneRoutines.diff,
derby-590-01-ah-publicAccessToLuceneRoutines.diff, derby-590-01-am-publicAccessToLuceneRoutines.diff,
derby-590-02-aa-cleanupFindbugsErrors.diff, derby-590-03-aa-removeTestingDiagnostic.diff,
derby-590-04-aa-removeIDFromListIndexes.diff, derby-590-05-aa-accessDeclaredMembers.diff,
derby-590-06-aa-suppressAccessChecks.diff, derby-590-07-aa-accessClassInPackage.sun.misc.diff,
derby-590-08-aa-omitLuceneFlag.diff, derby-590-09-aa-localeSensitiveAnalysis.diff, derby-590-10-aa-fixLocaleTest.diff,
derby-590-11-aa-moveCode.diff, derby-590-12-aa-newJar.diff, derby-590-13-aa-indexViews.diff,
derby-590-14-aa-coarseGrainedAuthorization.diff, derby-590-15-aa-requireHardUpgrade.diff,
derby-590-16-aa-adjustUpgradeTest.diff, derby-590-17-aa-closeInputStreamOnPropertiesFile.diff,
derby-590-18-aa-cleanupAPI.diff, derby-590-19-aa-cleanupAPI2.diff, derby-590-20-aa-customQueryParser.diff,
derby-590-21-aa-noTimeTravel.diff, derby-590-22-aa-cleanupPrivacy.diff, derby-590-23-aa-correctTestLocale.diff,
derby-590-24-ad-luceneDirectory.diff, derby-590-26-ac-backupRestore.diff, derby-590-26-ad-backupRestoreEncryption.diff,
derby-590-27-aa-publicAPILuceneUtils.diff, derby-590-28-renameLuceneJars.diff, derby-590-29-aa-useLucene_4.7.1.diff,
derby-590-30-aa-nullableScoreCeiling.diff, exceptions.diff, lucene_demo.diff, lucene_demo_2.diff,
netbeans.diff, netbeans2.diff
> In order to use derby with lucene API what should be the steps to be taken? 

This message was sent by Atlassian JIRA
