camel-dev mailing list archives

From "Ashwin Karpe (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (CAMEL-1472) Lucene Component
Date Mon, 28 Dec 2009 06:57:40 GMT

    [ https://issues.apache.org/activemq/browse/CAMEL-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=56667#action_56667
] 

Ashwin Karpe edited comment on CAMEL-1472 at 12/28/09 6:57 AM:
---------------------------------------------------------------

Hi Claus, Jon & Hadrian, 

I have created a new Apache Lucene Component & Query processor and have attached a patch
along with a zip file containing the code for your review.  I have also added the requisite
unit tests and ensured that the code undergoes checkstyle validation.

The component works as follows:

Lucene Producer: Index Creation example
----------------------------------------------------------
       context.addRoutes(new RouteBuilder() {
            public void configure() {
                from("direct:start").
                    to("lucene://stdQuotesIndex?analyzerRef=#stdAnalyzer&indexDir=#std&srcDir=#load_dir").
                    to("mock:result");

            }
        });

where the URI parameters do the following:
       - analyzerRef: a reference to any valid Lucene Analyzer implementation (StandardAnalyzer,
WhitespaceAnalyzer, StopAnalyzer, etc.)
       - srcDir: an optional directory from which Text or XML documents are loaded at endpoint
or Lucene index creation time.
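
To illustrate the "#name" convention used in the URI above, here is a minimal plain-Java sketch of
how such option values could be looked up in a registry. It is an assumption-laden illustration,
not Camel's actual resolution code: the class RegistryRefSketch and its methods parseOptions and
resolve are hypothetical names, and a plain Map stands in for the JNDI registry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Camel): resolves "#name" option values against a registry map. */
final class RegistryRefSketch {

    /** Splits the query part of an endpoint URI into option name/value pairs, keeping their order. */
    static Map<String, String> parseOptions(String uri) {
        Map<String, String> options = new LinkedHashMap<>();
        int q = uri.indexOf('?');
        if (q < 0 || q == uri.length() - 1) {
            return options; // no options present
        }
        for (String pair : uri.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                options.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return options;
    }

    /** Replaces each "#name" value with the object bound under "name"; other values pass through. */
    static Map<String, Object> resolve(Map<String, String> options, Map<String, Object> registry) {
        Map<String, Object> resolved = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : options.entrySet()) {
            String value = e.getValue();
            if (value.startsWith("#")) {
                String name = value.substring(1);
                Object bound = registry.get(name);
                if (bound == null) {
                    throw new IllegalArgumentException("No registry entry bound for: " + name);
                }
                resolved.put(e.getKey(), bound);
            } else {
                resolved.put(e.getKey(), value);
            }
        }
        return resolved;
    }
}
```

With "std", "load_dir" and "stdAnalyzer" bound in the map, resolving the options of the URI above
would yield the bound objects for indexDir, srcDir and analyzerRef respectively.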

Once created, the index can take any exchange body and store its contents.

Important Note: Lucene stipulates that the index be created upfront and then used in read-only
mode for any later querying, so the index cannot be in flux during query processing.
The Lucene Producer must therefore have received its payloads and created the index
before any queries are run against it.
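
The build-first, query-later constraint described in the note can be pictured with a toy two-phase
index in plain Java. This is only a sketch of the idea and does not use or mirror the Lucene API:
the class BuildThenQueryIndex and its add, seal and search methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy index (not Lucene): documents are added first, then the index is sealed for querying. */
final class BuildThenQueryIndex {
    private final List<String> documents = new ArrayList<>();
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private boolean sealed;

    /** Build phase: tokenizes the text and records which document each term occurs in. */
    void add(String text) {
        if (sealed) {
            throw new IllegalStateException("index is sealed; no further documents accepted");
        }
        int docId = documents.size();
        documents.add(text);
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            List<Integer> ids = postings.computeIfAbsent(term, t -> new ArrayList<>());
            // Avoid listing the same document twice for a repeated term.
            if (ids.isEmpty() || ids.get(ids.size() - 1) != docId) {
                ids.add(docId);
            }
        }
    }

    /** Ends the build phase; from here on the index is read-only. */
    void seal() {
        sealed = true;
    }

    /** Query phase: returns every document containing the term (case-insensitive). */
    List<String> search(String term) {
        if (!sealed) {
            throw new IllegalStateException("seal the index before querying");
        }
        List<String> hits = new ArrayList<>();
        List<Integer> ids = postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptyList());
        for (int id : ids) {
            hits.add(documents.get(id));
        }
        return hits;
    }
}
```

Here a search before seal(), or an add after it, fails fast, which is the ordering the producer
and the query side have to respect.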

Since the URI settings cannot be passed directly, I pass them using the JNDI registry associated
with the Default Component (example shown below).

Example: Providing values for the Lucene URI
--------------------------------------------------------------
    @Override
    protected JndiRegistry createRegistry() throws Exception {
        JndiRegistry registry = new JndiRegistry(createJndiContext());
        registry.bind("std", new File("target/stdindexDir"));
        registry.bind("load_dir", new File("src/test/resources/sources"));
        registry.bind("stdAnalyzer", new StandardAnalyzer(Version.LUCENE_CURRENT));
        return registry;
    }

I have also added a Query Processor that is fully capable of running any queries (including
wildcards etc.) against a Lucene Document Index and presenting the results in a schema-driven
XML format (example provided below).

Example:  Query Processor for Lucene called LuceneSearcher
-------------------------------------------------------------------------------------
       context.addRoutes(new RouteBuilder() {
            public void configure() {
                
                from("direct:start").
                    setHeader("QUERY", constant("Rodney Dangerfield")).
                    process(new LuceneSearcher("target/stdindexDir", analyzer, null, 20)).
                    to("mock:searchResult");
            }
        });  

Example: Search Results presentation Format
----------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<hits xmlns="http://camel.apache.org/lucene/SearchData">
      <numberOfHits>2</numberOfHits>
      <hit>
             <number>1</number>
             <hitLocation>15</hitLocation>
             <score>0.9453935</score>
             <data>I worked in a pet store and people kept asking how big I'd get. -
Rodney Dangerfield</data>
      </hit>
      <hit>
              <number>2</number>
              <hitLocation>13</hitLocation>
              <score>0.8272193</score>
              <data>I tell ya when I was a kid, all I knew was rejection. My yo-yo,
it never came back. - Rodney Dangerfield</data>
      </hit>
</hits>
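
For anyone consuming this result format downstream, the following is a minimal sketch of reading
the hits back with the JDK's built-in DOM parser. It assumes only the element names and namespace
shown in the sample above; the class SearchHitsReader and its nested Hit type are hypothetical and
not part of the attached patch.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical reader for the search result XML shown above (not part of the patch). */
final class SearchHitsReader {
    static final String NS = "http://camel.apache.org/lucene/SearchData";

    /** One <hit> element: its ordinal, relevance score and matched text. */
    static final class Hit {
        final int number;
        final float score;
        final String data;

        Hit(int number, float score, String data) {
            this.number = number;
            this.score = score;
            this.data = data;
        }
    }

    /** Parses the XML and returns the hits in document order. */
    static List<Hit> read(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // the sample declares a default namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList hitNodes = doc.getElementsByTagNameNS(NS, "hit");
            List<Hit> hits = new ArrayList<>();
            for (int i = 0; i < hitNodes.getLength(); i++) {
                Element hit = (Element) hitNodes.item(i);
                hits.add(new Hit(
                        Integer.parseInt(text(hit, "number")),
                        Float.parseFloat(text(hit, "score")),
                        text(hit, "data")));
            }
            return hits;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse search results", e);
        }
    }

    /** Returns the trimmed text of the first child element with the given local name. */
    private static String text(Element parent, String localName) {
        return parent.getElementsByTagNameNS(NS, localName).item(0).getTextContent().trim();
    }
}
```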

I used Lucene 3.0, the latest version, for the implementation, but this can be moved
up easily over time since I have no hard restrictions on Lucene versions. The API sets could
be different moving backwards, though; I have not verified this. Lucene seems to have undergone
a lot of change in each subsequent version :). The good news is that for the most part
they offer backward compatibility for APIs.

Please find attached the patch as well as a zip file containing the code.

Could you please review it and let me know what you think? I would be happy to update the
documentation once I get your feedback and to make any needed changes.

Cheers,

Ashwin...

> Lucene Component
> ----------------
>
>                 Key: CAMEL-1472
>                 URL: https://issues.apache.org/activemq/browse/CAMEL-1472
>             Project: Apache Camel
>          Issue Type: New Feature
>            Reporter: Claus Ibsen
>            Assignee: Ashwin Karpe
>             Fix For: Future
>
>         Attachments: camel-lucene-20091227.patch, camel-lucene.zip
>
>
> We should add a new component for Apache Lucene integration

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

