lucene-solr-commits mailing list archives

From Apache Wiki <>
Subject [Solr Wiki] Update of "SimpleTextCodecExample" by ErickErickson
Date Sat, 20 Oct 2012 16:56:03 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SimpleTextCodecExample" page has been changed by ErickErickson:

New page:
= Setting up SimpleTextCodec =
<!> [[Solr4.0]]+ only.
New in 4.0 is the ability to specify codecs on a per-field basis. One example is the
SimpleTextCodec, which is distributed with the Solr '''source''' code. However, the codecs
aren't part of the binary distribution, which has caused some confusion. These instructions
show how to set up the SimpleTextCodec as a worked example.


== Setup ==
You'll need Apache Ant and Apache Subversion. NOTE: your machine may already have these
installed. Try
svn -h
ant -h

If you get the help output, you're good to go.
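
The same check can be scripted. The following is a minimal sketch, not part of the original instructions; the function name check_tools is hypothetical, and it only tests that each tool is on your PATH rather than running its help output.

```shell
#!/bin/sh
# Hypothetical helper: report whether each named tool is on the PATH.
# Prints one line per tool and returns non-zero if any tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The two tools this page relies on.
check_tools svn ant || echo "Install the missing tools before continuing."
```

If both tools are reported as found, you can move on to getting the source code.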

 * Get the source code
 * Build the example
 * Build the codec jar
 * Modify the solrconfig.xml file
 * Modify your schema.xml file
== Get the source code ==
Just follow the instructions at How To Contribute. The short form is to execute the
following command:
svn checkout
for trunk, or:
svn checkout
for the 4.x branch.

We'll call the directory all this got checked out into SOLR_CODE. It will probably be something
like <where you checked things out>/branch_4x.

== Build the example ==
Now you need to build the example code. Note: this produces the same code as is present in
the "example" directory in the Solr distro.

cd SOLR_CODE/solr
ant example

This may take a while. You may be prompted to install Apache Ivy in a separate step if you
don't already have it on your computer. In that case, the instructions to install it are
printed on the screen when you type "ant example". Follow them and re-execute
"ant example".

You should see "BUILD SUCCESSFUL" eventually.

== Build the codec jar ==
Here's where it gets a bit tricky. The SimpleTextCodec is '''not''' built by the step above.
So here's what you do:
cd SOLR_CODE/lucene/codecs
ant

Again, you should see "BUILD SUCCESSFUL" printed out. But just above that you should see:
"Building jar: SOLR_CODE/lucene/build/codecs/lucene-codecs-<version>.jar". This is the
jar file that you'll need, so make a note of it.
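
If the build output has scrolled away, the jar can also be located afterwards. This is a small sketch under the assumption that the SOLR_CODE environment variable holds your checkout directory (it defaults to the current directory here); find_codec_jar is a hypothetical helper name.

```shell
#!/bin/sh
# SOLR_CODE is the checkout directory named earlier on this page.
# Assumption: it is exported in the environment; otherwise use the current directory.
SOLR_CODE="${SOLR_CODE:-.}"

# Hypothetical helper: print the path of each codecs jar under a checkout.
find_codec_jar() {
    for jar in "$1"/lucene/build/codecs/lucene-codecs-*.jar; do
        # An unmatched glob is left as literal text, so confirm the file exists.
        [ -f "$jar" ] && echo "$jar"
    done
    return 0
}

find_codec_jar "$SOLR_CODE"
```

No output means the jar has not been built yet, or SOLR_CODE points somewhere else.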

== Modify the solrconfig.xml file ==
This file is located in SOLR_CODE/solr/example/solr/collection1/conf. There are a couple of
things you need to do:
 * Make the jar available to Solr next time you start it. 
 * Load the CodecFactory when Solr starts

=== Make the jar available to Solr next time you start it ===
Add a line like this. I put this after the other <lib> directives, but it's pretty arbitrary
as long as it's a direct child of <config>. 
<lib dir="../../../../lucene/build/codecs/" />
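
The relative path in the <lib> directive is easy to get wrong. The sketch below checks where it lands, assuming it is resolved against the core's instance directory (collection1, the parent of conf) and that SOLR_CODE holds your checkout directory; resolve_lib_dir is a hypothetical helper name.

```shell
#!/bin/sh
# Assumption: SOLR_CODE is exported; otherwise use the current directory.
SOLR_CODE="${SOLR_CODE:-.}"

# Hypothetical helper: resolve a <lib dir="..."> value against an instance directory.
# $1 = instance directory, $2 = value of the dir attribute.
# Prints the absolute path, or fails if either directory does not exist.
resolve_lib_dir() {
    ( cd "$1" 2>/dev/null && cd "$2" 2>/dev/null && pwd -P )
}

resolve_lib_dir "$SOLR_CODE/solr/example/solr/collection1" "../../../../lucene/build/codecs/" \
    || echo "lib dir does not resolve; check the path and that the jar was built"
```

The printed directory should be the one that contains the lucene-codecs jar built earlier.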

=== Load the CodecFactory when Solr starts ===
Add a line like this. Again, where this goes is arbitrary; it just has to be a direct child
of <config>. This causes Solr to load this class at startup.
<codecFactory name="CodecFactory" class="solr.SchemaCodecFactory" />

== Modify your schema.xml file ==
This file is located in SOLR_CODE/solr/example/solr/collection1/conf

Whew! All that is preliminary. The rest is more straightforward. You have to define a fieldType
that uses the codec, and you have to use that fieldType in some of your fields. NOTE: it is
NOT necessary to use these in ''all'' your fields; you can specify codecs on a per-field basis.

You only have two more steps...
 * Add a new fieldType using the SimpleTextCodec
 * Use the new fieldType in some fields

=== Add a new fieldType using the SimpleTextCodec ===
Add something like this to the <types> section
<fieldType name="string_simpletext" class="solr.StrField" postingsFormat="SimpleText" />
This is not a very interesting fieldType. Notice it's based on "StrField", which means
that it's not analyzed in any way, so searching matches only the exact input. Of course you
can also use fieldTypes with analysis chains, like this one, which is based on TextField:
<fieldType name="text_simpletext" class="solr.TextField" postingsFormat="SimpleText">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
=== Use the new fieldType in some fields ===
Add some lines like this to the <fields> section
<field name="simple_string" type="string_simpletext" indexed="true" stored="true"/>
<field name="simple_text" type="text_simpletext" indexed="true" stored="true"/>
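
Nothing is written with the new codec until Solr is restarted and a document using these fields is indexed. The sketch below builds one such document. It makes several assumptions that are not in the original page: the example server is started with "java -jar start.jar" from SOLR_CODE/solr/example, it listens on localhost:8983, the core is collection1, and the schema's unique key is "id". The helper names make_add_doc and post_to_solr are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a Solr XML <add> message for one document.
# Values are inserted verbatim, so they must not contain &, < or >.
make_add_doc() {
    printf '<add><doc>'
    printf '<field name="id">%s</field>' "$1"
    printf '<field name="simple_string">%s</field>' "$2"
    printf '<field name="simple_text">%s</field>' "$3"
    printf '</doc></add>\n'
}

# Hypothetical helper: post a message to the update handler and commit.
# Assumption: the URL matches the example server's default host, port and core.
post_to_solr() {
    curl -s "${SOLR_URL:-http://localhost:8983/solr/collection1/update}?commit=true" \
         -H 'Content-Type: text/xml' --data-binary "$1"
}

# Show the message that would be sent; uncomment the last line to send it.
doc=$(make_add_doc "codec-test-1" "Exact Match Only" "Hello SimpleText codec")
echo "$doc"
# post_to_solr "$doc"
```

Sending is left commented out so the script can be inspected safely before a server is running.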

Once you have restarted Solr and indexed some documents that use these fields, you should
have the SimpleText results in your SOLR_CODE/solr/example/solr/collection1/data/index
directory. Look for files of the form: *SimpleText*.pst
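
A quick way to look for those files is sketched below, again assuming SOLR_CODE holds your checkout directory; list_simpletext_files is a hypothetical helper name.

```shell
#!/bin/sh
# Assumption: SOLR_CODE is exported; otherwise use the current directory.
SOLR_CODE="${SOLR_CODE:-.}"

# Hypothetical helper: list SimpleText postings files under an index directory.
# Prints nothing if the directory does not exist or holds no such files.
list_simpletext_files() {
    [ -d "$1" ] || return 0
    find "$1" -type f -name '*SimpleText*.pst'
}

list_simpletext_files "$SOLR_CODE/solr/example/solr/collection1/data/index"
```

Because the SimpleText format is plain text, any file this prints can be opened in a text editor or viewed with a pager to see the indexed terms.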

As always, the first time someone actually follows instructions, deficiencies pop out. Feel
free to modify this page with whatever clarifications you think would be helpful.
