lucene-solr-dev mailing list archives

From "Jason Rutherglen (JIRA)" <>
Subject [jira] Updated: (SOLR-1395) Integrate Katta
Date Thu, 03 Sep 2009 20:36:57 GMT


Jason Rutherglen updated SOLR-1395:

    Attachment: KATTA-SOLR.patch

This is our first cut at integrating Katta with Solr. The
KattaClientTest test case creates a Katta cluster locally, places
a couple of cores/shards into the cluster, then executes a query
and checks that it returns the correct number of results. It takes
between 30 seconds and 1.5 minutes to run; hopefully that can be
reduced.

Today, Solr shards map to Solr servers. Here we map shards to
cores, so there can be multiple shards per server (a "node" in
Katta parlance). We assume the shards exist in Hadoop HDFS.
Katta copies the shards to a local Solr server to make them
searchable (and incrementally updatable).
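The shard-to-core mapping can be pictured with a small sketch. It assumes a hypothetical round-robin placement (the actual Katta distribution policy is not described here) and shows only the key point: each shard becomes one core on some node, so a single node can host several shards. `ShardPlacement` and `assign` are illustrative names, not part of Katta or Solr, and copying shards out of HDFS is not modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: shards map to cores, and several cores can live on one node.
// Round-robin placement is an assumption for illustration, not Katta's actual policy.
final class ShardPlacement {

    /** Returns node name -> shard names, each of which would be loaded as a core on that node. */
    static Map<String, List<String>> assign(List<String> shards, List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        Map<String, List<String>> coresByNode = new LinkedHashMap<>();
        for (String node : nodes) {
            coresByNode.put(node, new ArrayList<>());
        }
        for (int i = 0; i < shards.size(); i++) {
            // Shard i goes to node (i mod nodeCount); with more shards than nodes,
            // some nodes end up hosting multiple shards (cores).
            String node = nodes.get(i % nodes.size());
            coresByNode.get(node).add(shards.get(i));
        }
        return coresByNode;
    }

    public static void main(String[] args) {
        Map<String, List<String>> placement =
                assign(List.of("shard0", "shard1", "shard2"), List.of("nodeA", "nodeB"));
        System.out.println(placement); // {nodeA=[shard0, shard2], nodeB=[shard1]}
    }
}
```

With three shards and two nodes, nodeA ends up serving two cores, which is exactly the case the one-shard-per-server model cannot express.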

h3. Instructions for Installation

* Download Katta trunk ("svn co kattatrunk") and download
KATTA-SOLR.patch to kattatrunk. Run "patch -p 0 -i
KATTA-SOLR.patch", then "ant jar" and "ant jar-test".

* Download Solr trunk ("svn co solrtrunk"). Copy the following
files from kattatrunk to solrtrunk/lib: lib/log4j-1.2.13.jar,
lib/zookeeper-3.1.1.jar, lib/hadoop-core-0.19.0.jar,
build/katta-core-0.6-dev.jar and build/test-katta-core-0.6-dev.jar.

* Download SOLR-1395.patch to solrtrunk. Run "patch -p 0 -i

* Run a test while in solrtrunk "ant test-core

h3. General Notes

* SearchHandler's HttpCommComponent has been abstracted out.
There's now a CommComponent interface; AbstractCommComponent
implements the generic multithreaded ShardRequest ->
ShardResponse logic. EmbeddedSearchHandler executes requests on
a set of local cores, HttpCommComponent sends requests over
HTTP, and KattaCommComponent distributes requests using Katta's
Hadoop RPC mechanism.

* The patch enables all of Solr's distributed request types. All
current distributed requests should work as-is.

* Shards/Solr cores can be managed dynamically and administered
remotely from a central location (whereas today Solr typically
requires SSHing into servers and manually editing files, etc.).

* Solr Katta has built-in failover; this is tested in

* When a shard is deployed to a Solr server, the schema and
solrconfig are deployed with it. This raises the question of how
updates to the solrconfig and schema are deployed. Redeploying
solrconfig is fairly simple, whereas a schema change implies
recreating the entire shard.

* There may be an easy way to interface with Hadoop-based index
creation (i.e. something as easy as Solr's HTTP-based update command).
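The CommComponent split described in the notes above can be sketched roughly as follows. This is a minimal sketch, not the patch's code: the method names (`execute`, `send`), the thread-pool size and the trimmed-down `ShardRequest`/`ShardResponse` records are all assumptions, and Solr's real ShardRequest/ShardResponse classes carry far more state. `LocalCommComponent` is a hypothetical in-process transport standing in for the "execute on local cores" path; it is what the shared fan-out logic looks like when a subclass only has to supply the transport.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Simplified stand-ins for Solr's ShardRequest/ShardResponse (the real classes carry far more state).
record ShardRequest(String shard, String query) {}
record ShardResponse(String shard, long numFound) {}

// Hypothetical shape of the CommComponent interface: send one request per shard, collect the responses.
interface CommComponent {
    List<ShardResponse> execute(List<ShardRequest> requests);
}

// Generic fan-out shared by every transport: each ShardRequest runs on a worker thread,
// and responses are gathered in request order. Subclasses only supply send().
abstract class AbstractCommComponent implements CommComponent {
    private final ExecutorService pool = Executors.newFixedThreadPool(4, runnable -> {
        Thread t = new Thread(runnable, "shard-request");
        t.setDaemon(true); // don't keep the JVM alive if shutdown() is never called
        return t;
    });

    /** Transport-specific call, e.g. HTTP, Hadoop RPC, or an in-process core. */
    protected abstract ShardResponse send(ShardRequest request) throws Exception;

    @Override
    public List<ShardResponse> execute(List<ShardRequest> requests) {
        List<Future<ShardResponse>> futures = new ArrayList<>();
        for (ShardRequest request : requests) {
            futures.add(pool.submit(() -> send(request))); // all shards are queried concurrently
        }
        List<ShardResponse> responses = new ArrayList<>();
        for (Future<ShardResponse> future : futures) {
            try {
                responses.add(future.get()); // waits for this shard's answer
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for a shard", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("shard request failed", e.getCause());
            }
        }
        return responses;
    }

    void shutdown() {
        pool.shutdown();
    }
}

// Illustrative in-process transport: a map from shard name to hit count stands in for local cores.
final class LocalCommComponent extends AbstractCommComponent {
    private final Map<String, Long> hitsByShard;

    LocalCommComponent(Map<String, Long> hitsByShard) {
        this.hitsByShard = hitsByShard;
    }

    @Override
    protected ShardResponse send(ShardRequest request) {
        return new ShardResponse(request.shard(), hitsByShard.getOrDefault(request.shard(), 0L));
    }
}
```

For example, a `LocalCommComponent` built from `Map.of("shard0", 3L, "shard1", 5L)` and asked to execute one request per shard returns two responses whose `numFound` values sum to 8. In this sketch an HTTP or Katta-RPC transport would override only `send()` and inherit the threading and response collection unchanged.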
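The solrconfig-versus-schema trade-off from the deployment note can be made concrete with a tiny decision function. This is a hypothetical sketch: `RedeployAction`, `ConfigChangePlanner` and `plan` are invented names, and treating "redeploy solrconfig" as a core reload is an assumption. It encodes only the rule stated in the note: a schema change forces recreating the shard, while a solrconfig-only change is the cheap case.

```java
// Hypothetical sketch: what has to happen to an already-deployed shard when its config files change.
enum RedeployAction { NONE, RELOAD_CORE, REBUILD_SHARD }

final class ConfigChangePlanner {
    /**
     * A schema change means the existing index may no longer match the field definitions,
     * so the shard is recreated; a solrconfig-only change is assumed to need just a core reload.
     */
    static RedeployAction plan(String oldSchema, String newSchema,
                               String oldSolrConfig, String newSolrConfig) {
        if (!oldSchema.equals(newSchema)) {
            return RedeployAction.REBUILD_SHARD; // schema wins even if solrconfig also changed
        }
        if (!oldSolrConfig.equals(newSolrConfig)) {
            return RedeployAction.RELOAD_CORE;
        }
        return RedeployAction.NONE;
    }

    public static void main(String[] args) {
        System.out.println(plan("s1", "s1", "c1", "c2")); // RELOAD_CORE
        System.out.println(plan("s1", "s2", "c1", "c1")); // REBUILD_SHARD
    }
}
```

Checking the schema first matters: if both files change, the expensive rebuild subsumes the reload.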

The patch was created by Jason Venner and Jason Rutherglen.

> Integrate Katta
> ---------------
>                 Key: SOLR-1395
>                 URL:
>             Project: Solr
>          Issue Type: New Feature
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 1.5
>         Attachments: KATTA-SOLR.patch, SOLR-1395.patch
>   Original Estimate: 336h
>  Remaining Estimate: 336h
> We'll integrate Katta into Solr so that:
> * Distributed search uses Hadoop RPC
> * Shard/SolrCore distribution and management
> * ZooKeeper-based failover
> * Indexes may be built using Hadoop

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
