gora-commits mailing list archives

From lewi...@apache.org
Subject svn commit: r1601670 - /gora/site/trunk/content/current/gora-solr.md
Date Tue, 10 Jun 2014 16:08:09 GMT
Author: lewismc
Date: Tue Jun 10 16:08:08 2014
New Revision: 1601670

URL: http://svn.apache.org/r1601670
Log:
Complete gora-solr documentation

Modified:
    gora/site/trunk/content/current/gora-solr.md

Modified: gora/site/trunk/content/current/gora-solr.md
URL: http://svn.apache.org/viewvc/gora/site/trunk/content/current/gora-solr.md?rev=1601670&r1=1601669&r2=1601670&view=diff
==============================================================================
--- gora/site/trunk/content/current/gora-solr.md (original)
+++ gora/site/trunk/content/current/gora-solr.md Tue Jun 10 16:08:08 2014
@@ -1,77 +1,176 @@
 Title: Gora HBase Module
 
 ##Overview
-This is the main documentation for the gora-hbase module. gora-hbase 
-module enables [Apache HBase](http://hbase.apache.org) backend support for Gora. 
+This is the main documentation for the gora-solr module. gora-solr 
+module enables [Apache Solr](http://lucene.apache.org/solr) backend support for Gora. 
 
 ##gora.properties 
-* <code>gora.datastore.default=org.apache.gora.hbase.store.HBaseStore</code>
- Implementation of the storage class 
+* <code>gora.datastore.default=org.apache.gora.solr.store.SolrStore</code> - Implementation of the storage class
 * <code>gora.datastore.autocreateschema=true</code> - Create the table if it doesn't exist
-* <code>gora.datastore.scanner.caching=1000</code> - HBase client cache that improves the scan in HBase (default 0)
-* <code>hbase.client.autoflush.default=false</code> -  HBase autoflushing. Enabling autoflush decreases write performance. Available since Gora 0.2. Defaults to disabled.
+* <code>gora.solrstore.solr.url=http://localhost:9876/solr</code> - The URL of the Solr server.
+* <code>gora.solrstore.solr.config</code> - The <code>solrconfig.xml</code> file to be used.
+* <code>gora.solrstore.solr.schema</code> - The <code>schema.xml</code> file to be used.
+* <code>gora.solrstore.solr.batchSize</code> - The number of SolrDocuments batched (in an ArrayList) for each write to Solr. A default value of <b>100</b> is used if this value is absent. This value must be of type <b>Integer</b>.
+* <code>gora.solrstore.solr.solrjserver</code> - The SolrJ implementation to use. This has a default value of <b>http</b> for <i>HttpSolrServer</i>. Available options include <b>http</b> (<i>HttpSolrServer</i>), <b>cloud</b> (<i>CloudSolrServer</i>), <b>concurrent</b> (<i>ConcurrentUpdateSolrServer</i>) and <b>loadbalance</b> (<i>LBHttpSolrServer</i>). This value must be of type <b>String</b>.
+* <code>gora.solrstore.solr.commitWithin</code> - A batch commit unit for SolrDocuments used when making (commit) calls to Solr. A default value of <b>1000</b> is used if this value is absent. This value must be of type <b>Integer</b>.
+* <code>gora.solrstore.solr.resultsSize</code> - The maximum number of results to return when we make a call to <code>org.apache.gora.solr.store.SolrStore#execute(Query)</code>. This value must be of type <b>Integer</b>.
  
-##Gora HBase mappings
-Say we wished to map some Employee data and store it into the HBaseStore.
+##Gora Solr mappings
+Say we wished to map some Employee data and store it into the SolrStore.
 
     <gora-orm>
-      <table name="Employee">
-        <family name="info" 
-                compression="$$$" 
-                blockCache="$$$" 
-                blockSize="$$$" 
-                bloomFilter="$$$" 
-                maxVersions="$$$" 
-                timeToLive="$$$" 
-                inMemory="$$$" />
-      </table> 
-
-      <class name="org.apache.gora.examples.generated.Employee" keyClass="java.lang.String" table="Employee">
-        <field name="name" family="info" qualifier="nm"/>
-        <field name="dateOfBirth" family="info" qualifier="db"/>
-        <field name="ssn" family="info" qualifier="sn"/>
-        <field name="salary" family="info" qualifier="sl"/>
-        <field name="boss" family="info" qualifier="bs"/>
-        <field name="webpage" family="info" qualifier="wp"/>
-      </class>
+        <class name="org.apache.gora.examples.generated.Employee" keyClass="java.lang.String" table="Employee">
+          <primarykey column="ssn"/>
+          <field name="name" column="name"/>
+          <field name="dateOfBirth" column="dateOfBirth"/>
+          <field name="salary" column="salary"/>
+          <field name="boss" column="boss"/>
+          <field name="webpage" column="webpage"/>
+        </class>
     </gora-orm>
 
-Here you can see that we require the definition of two child elements within the 
+Here you can see that we require the definition of only one child element within the 
 <code>gora-orm</code> mapping configuration, namely;
 
-The table element; where we specify: 
-
-1. a parameter relating to the HBase table name (String) e.g. name=<b>"Employee"</b>,
-
-2. a nested element containing the type and definition of families we wish to create within HBase. In this case we create one family <b>info</b> which could have a combination of any of the following parameters;
-
-    <b>name</b> (String): family name e.g. info
-
-    <b>compression</b> (String): the compression option to use in HBase. Please see <a href="http://hbase.apache.org/book/compression.html">HBase documentation</a>.
-
-    <b>blockCache</b> (boolean):  an LRU cache that contains three levels of block priority to allow for scan-resistance and in-memory ColumnFamilies. Please see <a href="https://hbase.apache.org/book/regionserver.arch.html#block.cache">HBase documentation</a>.
-
-    <b>blockSize</b> (Integer): The blocksize can be configured for each ColumnFamily in a table, and this defaults to 64k. Larger cell values require larger blocksizes. There is an inverse relationship between blocksize and the resulting StoreFile indexes (i.e., if the blocksize is doubled then the resulting indexes should be roughly halved). Please see <a href="http://hbase.apache.org/book/perf.schema.html#schema.cf.blocksize">HBase documentation</a>.
-
-    <b>bloomFilter</b> (String): Bloom Filters can be enabled per-ColumnFamily. We use <code>HColumnDescriptor.setBloomFilterType(NONE | ROW | ROWCOL)</code> to enable blooms per Column Family. Default = NONE for no bloom filters. If ROW, the hash of the row will be added to the bloom on each insert. If ROWCOL, the hash of the row + column family name + column family qualifier will be added to the bloom on each key insert. Please see <a href="http://hbase.apache.org/book/perf.schema.html#schema.bloom">HBase documentation</a>.
-
-    <b>maxVersions</b> (Integer): The maximum number of row versions to store is configured per column family via <code>HColumnDescriptor</code>. The default for max versions is <b>3</b>. This is an important parameter because HBase does not overwrite row values, but rather stores different values per row by time (and qualifier). Excess versions are removed during major compaction's. The number of max versions may need to be increased or decreased depending on application needs. Please see <a href="http://hbase.apache.org/book/schema.versions.html">HBase documentation</a>.
-
-    <b>timeToLive</b> (Integer): ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached. This applies to all versions of a row - even the current one. The TTL time encoded in the HBase for the row is specified in UTC. Please see <a href="https://hbase.apache.org/book/ttl.html">HBase documentation</a>.
-
-    <b>inMemory</b> (Boolean): ColumnFamilies can optionally be defined as in-memory. Data is still persisted to disk, just like any other ColumnFamily. In-memory blocks have the highest priority in the Block Cache, but it is not a guarantee that the entire table will be in memory. Please see <a href="http://hbase.apache.org/book/perf.schema.html#cf.in.memory">HBase documentation</a>.
-
 The class element, where we specify the persistent fields to which values should map. This contains;
 
-1. a parameter containing the Persistent class name e.g. <b>org.apache.gora.examples.generated.Employee</b>,
-
-2. a parameter containing the keyClass e.g. <b>java.lang.String</b> which specifies the keys which map to the field values, 
+1. a parameter containing the Persistent class <b>name</b> e.g. <code>org.apache.gora.examples.generated.Employee</code>,
 
-3. a parameter containing the Table name e.g. <b>Employee</b> which matches to the above Table definition,
+2. a parameter containing the <b>keyClass</b> e.g. <code>java.lang.String</code> which specifies the type of the keys which map to the field values, 
 
-4. finally nested child element(s) mapping fields which are to be persisted into HBase. These fields need to be configured such that they receive;
+3. a parameter containing the <b>Table name</b> e.g. <code>Employee</code>,
 
-    a parameter containing the <b>name</b> e.g. (name, dateOfBirth, ssn and salary respectively), 
+4. finally nested child element(s) mapping fields which are to be persisted into Solr. <b>We must provide a primary key for each object that we wish to persist into Solr.</b> Additional object fields need to be configured such that they receive;
 
-    a parameter containing the column <b>family</b> to which they belong e.g. (all info in this case), 
+    a parameter containing the <b>name</b> e.g. (name, dateOfBirth, salary, boss and webpage respectively), 
+
+    a parameter containing the <b>column</b> to which they map e.g. (name, dateOfBirth, salary, boss and webpage respectively), 
+
+##Solr Schema.xml
+
+<code>schema.xml</code> is an essential aspect of defining a storage and query model for your Solr data.
+
+The Solr community maintains its own documentation relating to schema.xml; this can be found at [http://wiki.apache.org/solr/SchemaXml](http://wiki.apache.org/solr/SchemaXml).
+
+    <schema name="testexample" version="1.5">
+
+      <fields>
+
+        <!-- Common Fields -->
+        <field name="_version_" type="long" indexed="true" stored="true"/>
+
+        <!-- Employee Fields -->
+        <field name="ssn"         type="string" indexed="true" stored="true" required="true" multiValued="false" /> 
+        <field name="name"        type="string" indexed="true" stored="true" />
+        <field name="dateOfBirth" type="long" stored="true" /> 
+        <field name="salary"      type="int" stored="true" /> 
+        <field name="boss"        type="binary" stored="true" />
+        <field name="webpage"     type="binary" stored="true" />
+    
+      </fields>
+
+      <uniqueKey>ssn</uniqueKey>
+
+      <types>
+
+        <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
+        <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
+        <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
+        <fieldType name="binary" class="solr.BinaryField"/>
+  
+      </types>  
+
+    </schema>
+
+##Solr solrconfig.xml
+
+Similar to <code>schema.xml</code> above, <code>solrconfig.xml</code> documentation is also maintained by the Solr community.
+
+An example configuration is shown below; please also refer to [http://wiki.apache.org/solr/SolrConfigXml](http://wiki.apache.org/solr/SolrConfigXml).
+
+    <config>
+      <luceneMatchVersion>LUCENE_40</luceneMatchVersion>
+      <dataDir>${solr.data.dir:}</dataDir>
+      <directoryFactory name="DirectoryFactory" 
+                    class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
+      <codecFactory class="solr.SchemaCodecFactory"/>
+      <schemaFactory class="ClassicIndexSchemaFactory"/>
+      <indexConfig>
+        <lockType>${solr.lock.type:native}</lockType>
+      </indexConfig>
+
+      <jmx />
+
+      <updateHandler class="solr.DirectUpdateHandler2">
+        <updateLog>
+          <str name="dir">${solr.ulog.dir:}</str>
+        </updateLog>
+      </updateHandler>
+  
+      <query>
+        <maxBooleanClauses>1024</maxBooleanClauses>
+        <filterCache class="solr.FastLRUCache"
+                 size="512"
+                 initialSize="512"
+                 autowarmCount="0"/>
+        <queryResultCache class="solr.LRUCache"
+                     size="512"
+                     initialSize="512"
+                     autowarmCount="0"/>
+        <documentCache class="solr.LRUCache"
+                   size="512"
+                   initialSize="512"
+                   autowarmCount="0"/>
+        <enableLazyFieldLoading>true</enableLazyFieldLoading>
+        <queryResultWindowSize>20</queryResultWindowSize>
+        <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
+        <listener event="newSearcher" class="solr.QuerySenderListener">
+          <arr name="queries">
+          </arr>
+        </listener>
+        <listener event="firstSearcher" class="solr.QuerySenderListener">
+          <arr name="queries">
+            <lst>
+              <str name="q">static firstSearcher warming in solrconfig.xml</str>
+            </lst>
+          </arr>
+        </listener>
+        <useColdSearcher>false</useColdSearcher>
+        <maxWarmingSearchers>2</maxWarmingSearchers>
+      </query>
+
+      <requestDispatcher handleSelect="false" >
+        <requestParsers enableRemoteStreaming="true" 
+                    multipartUploadLimitInKB="2048000"
+                    formdataUploadLimitInKB="2048"
+                    addHttpRequestToContext="false"/>
+        <httpCaching never304="true" />
+      </requestDispatcher>
+
+      <requestHandler name="/select" class="solr.SearchHandler">
+        <lst name="defaults">
+          <str name="echoParams">explicit</str>
+          <int name="rows">10</int>
+          <str name="df">ssn</str>
+        </lst>
+      </requestHandler>
+
+      <requestHandler name="/query" class="solr.SearchHandler">
+        <lst name="defaults">
+          <str name="echoParams">explicit</str>
+          <str name="wt">json</str>
+          <str name="indent">true</str>
+          <str name="df">ssn</str>
+        </lst>
+      </requestHandler>
+
+      <requestHandler name="/get" class="solr.RealTimeGetHandler">
+        <lst name="defaults">
+          <str name="omitHeader">true</str>
+        </lst>
+      </requestHandler>
+
+      <requestHandler name="/update" class="solr.UpdateRequestHandler">
+      </requestHandler>
+    </config>
 
-    an optional parameter <b>qualifier</b>, which enables more granular control over the data to be persisted into HBase.
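
For readers of this commit, here is a rough sketch of how the gora.properties keys documented in the new revision could be read with their stated defaults (batchSize 100, commitWithin 1000, solrjserver http). The class `SolrStoreSettings` and its fields are hypothetical and are not part of Gora. Only the property keys, defaults and the four solrjserver options come from the documentation in the diff. It uses nothing beyond `java.util.Properties`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical helper (NOT part of Gora): reads the gora-solr keys
// documented above and applies the defaults the documentation states.
class SolrStoreSettings {
    // Defaults taken from the gora-solr documentation in this commit.
    static final int DEFAULT_BATCH_SIZE = 100;
    static final int DEFAULT_COMMIT_WITHIN = 1000;
    static final String DEFAULT_SOLRJ_SERVER = "http";
    // The four documented values for gora.solrstore.solr.solrjserver.
    static final List<String> SOLRJ_SERVERS =
            Arrays.asList("http", "cloud", "concurrent", "loadbalance");

    final String url;
    final int batchSize;
    final int commitWithin;
    final String solrjServer;

    SolrStoreSettings(Properties props) {
        url = props.getProperty("gora.solrstore.solr.url");
        batchSize = intOrDefault(props, "gora.solrstore.solr.batchSize", DEFAULT_BATCH_SIZE);
        commitWithin = intOrDefault(props, "gora.solrstore.solr.commitWithin", DEFAULT_COMMIT_WITHIN);
        solrjServer = props.getProperty("gora.solrstore.solr.solrjserver", DEFAULT_SOLRJ_SERVER);
        if (!SOLRJ_SERVERS.contains(solrjServer)) {
            throw new IllegalArgumentException("Unknown solrjserver value: " + solrjServer);
        }
    }

    // Returns the integer value of the key, or the fallback when the key is absent.
    static int intOrDefault(Properties props, String key, int fallback) {
        String raw = props.getProperty(key);
        return raw == null ? fallback : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("gora.datastore.default", "org.apache.gora.solr.store.SolrStore");
        props.setProperty("gora.solrstore.solr.url", "http://localhost:9876/solr");
        props.setProperty("gora.solrstore.solr.solrjserver", "cloud");

        SolrStoreSettings settings = new SolrStoreSettings(props);
        System.out.println(settings.url + " " + settings.solrjServer
                + " " + settings.batchSize + " " + settings.commitWithin);
    }
}
```

In a real deployment these keys would normally sit in the gora.properties file on the classpath rather than being set in code; the in-memory `Properties` object here only keeps the sketch self-contained.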


