jackrabbit-commits mailing list archives

From chet...@apache.org
Subject svn commit: r1701956 - /jackrabbit/site/live/oak/docs/query/lucene.html
Date Wed, 09 Sep 2015 09:35:49 GMT
Author: chetanm
Date: Wed Sep  9 09:35:49 2015
New Revision: 1701956

URL: http://svn.apache.org/r1701956
OAK-3367 - Boosting fields not working as expected

Publish the updated doc


Modified: jackrabbit/site/live/oak/docs/query/lucene.html
URL: http://svn.apache.org/viewvc/jackrabbit/site/live/oak/docs/query/lucene.html?rev=1701956&r1=1701955&r2=1701956&view=diff
--- jackrabbit/site/live/oak/docs/query/lucene.html (original)
+++ jackrabbit/site/live/oak/docs/query/lucene.html Wed Sep  9 09:35:49 2015
@@ -1,13 +1,13 @@
 <!DOCTYPE html>
- | Generated by Apache Maven Doxia at 2015-09-08
+ | Generated by Apache Maven Doxia at 2015-09-09
  | Rendered using Apache Maven Fluido Skin 1.3.0
 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
     <meta charset="UTF-8" />
     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
-    <meta name="Date-Revision-yyyymmdd" content="20150908" />
+    <meta name="Date-Revision-yyyymmdd" content="20150909" />
     <meta http-equiv="Content-Language" content="en" />
     <title>Jackrabbit Oak - Lucene Index</title>
     <link rel="stylesheet" href="../css/apache-maven-fluido-1.3.0.min.css" />
@@ -210,7 +210,7 @@
         <ul class="breadcrumb">
-                  <li id="publishDate">Last Published: 2015-09-08</li>
+                  <li id="publishDate">Last Published: 2015-09-09</li>
                   <li class="divider">|</li> <li id="projectVersion">Version:
@@ -721,7 +721,7 @@
 <li><tt>jcr:content/metadata/.*</tt> - This property definition is  applicable
for all properties of child node <i>jcr:content/metadata</i></li>
-<dd>If the property is included in <tt>nodeScopeIndex</tt> then it defines
the boost  done for the index value against the given property name.  <b>Boost currently
does not work as expected due to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-3367">OAK-3367</a></b></dd>
+<dd>If the property is included in <tt>nodeScopeIndex</tt> then it defines
the boost applied to the indexed value of the given property. See <a href="#boost">Boost
and Search Relevancy</a> for more details.</dd>
 <dd>Determines if this property should be indexed. Mostly useful for fulltext  index
where some properties need to be <i>excluded</i> from getting indexed.</dd>
@@ -999,6 +999,42 @@
   - codec = &quot;Lucene46&quot;
 <p>Refer to <a class="externalLink" href="https://issues.apache.org/jira/browse/OAK-2853">OAK-2853</a>
for details. Enabling the <tt>Lucene46</tt> codec would lead to smaller and compact
+<p><a name="boost"></a></p></div>
+<div class="section">
+<h4>Boost and Search Relevancy<a name="Boost_and_Search_Relevancy"></a></h4>
+<p><tt>@since Oak 1.2.5</tt></p>
+<p>When fulltext indexing is enabled, Oak internally creates a fulltext field
which consists of text extracted from various other fields, i.e. fields for which <tt>nodeScopeIndex</tt>
is <tt>true</tt>. This allows a query like <tt>//*[jcr:contains(., 'foo')]</tt>
to search across any indexable field containing foo (see <a class="externalLink"
href="http://www.day.com/specs/jcr/1.0/">contains function</a>
for details).</p>
+<p>In certain cases it is desirable that nodes where the searched term is present
in a specific property are ranked higher (come earlier in the search result) than
nodes where the searched term is found in some other property.</p>
+<p>In such cases it should be possible to boost the text contributed by an individual
property. For example, if a title field is boosted more than description, then nodes where
the searched term is found in the title field come earlier in the search result.</p>
+<p>For that to work, ensure that for each such property (one that needs to be preferred)
both <tt>nodeScopeIndex</tt> and <tt>analyzed</tt> are set to true.
In addition you can specify the <tt>boost</tt> property to give higher weight to
values found in a specific property.</p>
+<p>Note that even without an explicit <tt>boost</tt>, just setting
<tt>nodeScopeIndex</tt> and <tt>analyzed</tt> to true improves
the search result due to the way <a class="externalLink" href="https://wiki.apache.org/lucene-java/LuceneFAQ#How_do_I_make_sure_that_a_match_in_a_document_title_has_greater_weight_than_a_match_in_a_document_body.3F">Lucene
does scoring</a>. Internally Oak creates separate Lucene fields for those JCR properties
and performs a search across all such fields. For more details refer to <a class="externalLink"
+<div class="source">
+<pre>  + indexRules
+    - jcr:primaryType = &quot;nt:unstructured&quot;
+    + app:Asset
+      + properties
+        - jcr:primaryType = &quot;nt:unstructured&quot;
+        + description
+          - nodeScopeIndex = true
+          - analyzed = true
+          - name = &quot;jcr:content/metadata/jcr:description&quot;
+        + title
+          - analyzed = true
+          - nodeScopeIndex = true
+          - name = &quot;jcr:content/metadata/jcr:title&quot;
+          - boost = 2.0
+<p>With the above index config, a search like</p>
+<div class="source">
+  *
+FROM [app:Asset] 
+  CONTAINS(., 'Batman')
+<p>Would have those nodes (of type app:Asset) come first where <i>Batman</i>
is found in <i>jcr:title</i>, while nodes where the search text is found in
other fields, such as aggregated content, would come later.</p>
 <p><a name="osgi-config"></a></p></div></div>
 <div class="section">
 <h3>LuceneIndexProvider Configuration<a name="LuceneIndexProvider_Configuration"></a></h3>
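
To make the ranking effect described in the new "Boost and Search Relevancy" section concrete, here is a toy sketch in Python. It is not Oak or Lucene code and does not reproduce Lucene's actual scoring formula: it assumes a simplified model in which each configured property that contains the search term adds its boost to the node's score. The boost values mirror the index rule in the diff (title 2.0, description with an assumed default of 1.0); the sample nodes, the `score` and `search` helpers, and the paths are hypothetical.

```python
# Toy illustration only: real Lucene scoring also accounts for term
# frequency, field length norms and more. Boost values follow the index
# rule in the diff; 1.0 for description is an assumed default.
BOOSTS = {"jcr:title": 2.0, "jcr:description": 1.0}


def score(node, term):
    """Sum the boost of every configured property whose text contains the term."""
    term = term.lower()
    return sum(
        boost
        for prop, boost in BOOSTS.items()
        if term in node.get(prop, "").lower()
    )


def search(nodes, term):
    """Return the matching nodes, highest score first."""
    hits = [(score(node, term), node) for node in nodes]
    ranked = sorted(hits, key=lambda hit: -hit[0])
    return [node for s, node in ranked if s > 0]


# Hypothetical sample content standing in for app:Asset nodes.
nodes = [
    {"path": "/a", "jcr:title": "Gotham", "jcr:description": "Batman returns"},
    {"path": "/b", "jcr:title": "Batman Begins", "jcr:description": "Origin story"},
    {"path": "/c", "jcr:title": "Superman", "jcr:description": "Krypton"},
]

ranked = [node["path"] for node in search(nodes, "Batman")]
print(ranked)
```

In this simplified model the node with the term in its title ranks ahead of the node that only mentions it in the description, and the node without a match is dropped, which is the ordering the documentation describes for the `boost = 2.0` title rule.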
