db-derby-dev mailing list archives

From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DERBY-590) How to integrate Derby with Lucene API?
Date Fri, 06 Jun 2014 11:26:02 GMT

     [ https://issues.apache.org/jira/browse/DERBY-590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-590:
-------------------------------------

    Attachment: multifield.diff

Thanks, Rick. Those were the exact changes that were needed.

The attached patch [^multifield.diff] shows an example of how it could be used.

I made two small adjustments:

1) Instead of hard-coding the field names, I made LuceneSupport read them dynamically from
a database property (derby.tests.lucene.fields), so that I could verify that the original
Lucene tests still pass. (They do still pass, by the way.) The field names are also stored
in the Lucene index property file, so that LuceneQueryVTI can find them too. This is of course
just a temporary hack until we figure out the correct API.

2) I made LuceneUtils.defaultQueryParser() always return a MultiFieldQueryParser, since MultiFieldQueryParser
seems to behave just like QueryParser in the degenerate case with a single field.

Since I didn't feel like writing a Java source file parser, I changed my example use case
to search in XML files, so that I could use the XML parser that is in the JRE. I added a test
case to LuceneSupportTest to verify that it could be used for that.

The test case creates an index with two fields: tags and text. The tags field contains only
the XML tags, whereas the text field contains only the text elements of the XML file. This
way, you can use the index to search for data and metadata separately in the XML documents
stored in your table.
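
As a rough illustration of the tags/text split described above, here is a minimal sketch that uses the SAX parser in the JRE. It is not the code from the patch: the class name XmlFieldSplitter and the method split() are hypothetical, and the real test case feeds the tokens to Lucene through a custom analyzer rather than returning plain strings.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only (not part of multifield.diff).
public class XmlFieldSplitter {

    /**
     * Splits an XML document into two strings:
     * [0] the element names ("tags" field), [1] the text content ("text" field).
     */
    public static String[] split(String xml) {
        final List<String> tags = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                tags.add(qName); // collect metadata: the tag name
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // collect data: the text content
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                text.append(' '); // keep text from adjacent elements apart
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed XML document", e);
        }
        return new String[] { String.join(" ", tags), text.toString().trim() };
    }

    public static void main(String[] args) {
        String[] fields = split("<doc><title>abc</title><body>def</body></doc>");
        System.out.println("tags: " + fields[0]); // prints "tags: doc title body"
        System.out.println("text: " + fields[1]); // prints "text: abc def"
    }
}
```

Keeping the two streams apart at parse time is what lets the index answer questions about data and metadata separately.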

While writing the test case, I found that you will most likely want to use a custom query
parser when you use the index this way. The reason is that the default query parser extracts
tokens from the search terms with the same analyzer that the index writer used. That means
that if, as in this case, you use a custom analyzer that parses XML documents, the query parser
will also expect the terms in the query to be XML documents. So you'll end up with rather
silly-looking queries.

For example, to search for documents that contain the text "abc", you cannot write the query
{{text:"abc"}}, but have to wrap the term in dummy XML tags to make it parsable: {{text:"<dummy>abc</dummy>"}}.
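
To see why the bare term is rejected, consider what an XML-parsing analyzer has to do with it. The sketch below is not taken from the patch; it assumes only that such an analyzer passes its input to the JRE's XML parser, and the class and method names are made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical check, for illustration only (not part of multifield.diff).
public class QueryTermCheck {

    /** Returns true if the JRE's SAX parser accepts the string as an XML document. */
    public static boolean isWellFormedXml(String s) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(s)), new DefaultHandler());
            return true;
        } catch (Exception e) {
            // The parser reports a fatal error when the input has no root element.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedXml("abc"));                // prints "false"
        System.out.println(isWellFormedXml("<dummy>abc</dummy>")); // prints "true"
    }
}
```

A search term is rarely a complete XML document, which is why an analyzer built for indexing XML is a poor fit for parsing queries.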

The custom query parser doesn't need to be very complex, though. The test case in the patch
shows one example in the method {{createXMLQueryParser()}}. That method simply creates a MultiFieldQueryParser
with a plain StandardAnalyzer. With that parser, you can write queries like:

- {{text:abc}} to search for "abc" in the text elements of the XML

- {{tags:abc}} to search for XML tags called "abc"

- {{abc}} to search for "abc" in both text elements and tags

What do you think? Does it sound like a useful addition?

> How to integrate Derby with Lucene API?
> ---------------------------------------
>
>                 Key: DERBY-590
>                 URL: https://issues.apache.org/jira/browse/DERBY-590
>             Project: Derby
>          Issue Type: Improvement
>          Components: Documentation, SQL
>            Reporter: Abhijeet Mahesh
>              Labels: derby_triage10_11
>         Attachments: LucenePlugin.html, LucenePlugin.html, LucenePlugin.html, derby-590-01-ag-publicAccessToLuceneRoutines.diff,
derby-590-01-ah-publicAccessToLuceneRoutines.diff, derby-590-01-am-publicAccessToLuceneRoutines.diff,
derby-590-02-aa-cleanupFindbugsErrors.diff, derby-590-03-aa-removeTestingDiagnostic.diff,
derby-590-04-aa-removeIDFromListIndexes.diff, derby-590-05-aa-accessDeclaredMembers.diff,
derby-590-06-aa-suppressAccessChecks.diff, derby-590-07-aa-accessClassInPackage.sun.misc.diff,
derby-590-08-aa-omitLuceneFlag.diff, derby-590-09-aa-localeSensitiveAnalysis.diff, derby-590-10-aa-fixLocaleTest.diff,
derby-590-11-aa-moveCode.diff, derby-590-12-aa-newJar.diff, derby-590-13-aa-indexViews.diff,
derby-590-14-aa-coarseGrainedAuthorization.diff, derby-590-15-aa-requireHardUpgrade.diff,
derby-590-16-aa-adjustUpgradeTest.diff, derby-590-17-aa-closeInputStreamOnPropertiesFile.diff,
derby-590-18-aa-cleanupAPI.diff, derby-590-19-aa-cleanupAPI2.diff, derby-590-20-aa-customQueryParser.diff,
derby-590-21-aa-noTimeTravel.diff, derby-590-22-aa-cleanupPrivacy.diff, derby-590-23-aa-correctTestLocale.diff,
derby-590-24-ad-luceneDirectory.diff, derby-590-26-ac-backupRestore.diff, derby-590-26-ad-backupRestoreEncryption.diff,
derby-590-27-aa-publicAPILuceneUtils.diff, derby-590-28-renameLuceneJars.diff, derby-590-29-aa-useLucene_4.7.1.diff,
derby-590-30-aa-nullableScoreCeiling.diff, exceptions.diff, lucene_demo.diff, lucene_demo_2.diff,
multifield.diff, netbeans.diff, netbeans2.diff
>
>
> In order to use derby with lucene API what should be the steps to be taken? 



