Date: Wed, 17 May 2017 22:12:04 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: dev@geode.apache.org
Subject: [jira] [Commented] (GEODE-2913) Update Lucene documentation

    [ https://issues.apache.org/jira/browse/GEODE-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014848#comment-16014848 ]

ASF GitHub Bot commented on GEODE-2913:
---------------------------------------

Github user joeymcallister commented on a diff in the pull request:

    https://github.com/apache/geode/pull/518#discussion_r117122461

    --- Diff: geode-docs/tools_modules/lucene_integration.html.md.erb ---
    @@ -135,4 +117,164 @@ gfsh> lucene search --regionName=/orders -queryStrings="John*" --defaultField=fi
     ```
    +## Queries
    +### Gfsh Example to Query using a Lucene Index
    +
    +For details, see the [gfsh search lucene](gfsh/command-pages/search.html#search_lucene) command reference page.
    +
    +``` pre
    +gfsh> lucene search --regionName=/orders --queryStrings="John*" --defaultField=field1 --limit=100
    +```
    +
    +### Java API Example to Query using a Lucene Index
    +
    +``` pre
    +LuceneQuery query = luceneService.createLuceneQueryFactory()
    +    .setResultLimit(10)
    +    .create(indexName, regionName, "name:John AND zipcode:97006", defaultField);
    +
    +Collection results = query.findValues();
    +```
    +
    +## Destroying an Index
    +
    +Because a region destroy operation does not destroy
    +any associated Lucene indexes,
    +destroy any Lucene indexes prior to destroying the associated region.
    +
    +### Java API Example to Destroy a Lucene Index
    +
    +``` pre
    +luceneService.destroyIndex(indexName, regionName);
    +```
    +An attempt to destroy a region that still has a Lucene index results in
    +an `IllegalStateException`,
    +with an error message similar to:
    +
    +``` pre
    +java.lang.IllegalStateException: The parent region [/orders] in colocation chain cannot be destroyed,
    + unless all its children [[/indexName#_orders.files]] are destroyed
    +at org.apache.geode.internal.cache.PartitionedRegion.checkForColocatedChildren(PartitionedRegion.java:7231)
    +at org.apache.geode.internal.cache.PartitionedRegion.destroyRegion(PartitionedRegion.java:7243)
    +at org.apache.geode.internal.cache.AbstractRegion.destroyRegion(AbstractRegion.java:308)
    +at DestroyLuceneIndexesAndRegionFunction.destroyRegion(DestroyLuceneIndexesAndRegionFunction.java:46)
    +```
    +### Gfsh Example to Destroy a Lucene Index
    +
    +For details, see the [gfsh destroy lucene index](gfsh/command-pages/destroy.html#destroy_lucene_index) command reference page.
    +
    +An attempt to destroy a region
    +prior to destroying its associated Lucene index
    +results in an error message similar to:
    +
    +``` pre
    +Error occurred while destroying region "orders".
    + Reason: The parent region [/orders] in colocation chain cannot be destroyed,
    + unless all its children [[/indexName#_orders.files]] are destroyed
    +```
    +
    +## Changing an Index
    +
    +Changing an index requires rebuilding it.
    +Follow these steps in `gfsh` to change an index:
    +
    +1. Export all region data.
    +2. Destroy the Lucene index.
    +3. Destroy the region.
    +4. Create a new index.
    +5. Create a new region without the user-defined business logic callbacks.
    +6. Import the region data with the option to invoke callbacks.
    +The callbacks invoke a Lucene async event listener to index
    +the data.
    +7. Alter the region to add the user-defined business logic callbacks.
    +
    +## Additional Gfsh Commands
    +
    +See the [gfsh describe lucene index](gfsh/command-pages/describe.html#describe_lucene_index) command reference page for the command that prints details about
    +a specific index.
    +
    +See the [gfsh list lucene index](gfsh/command-pages/list.html#list_lucene_index) command reference page
    +for the command that prints details about the
    +Lucene indexes created for all members.
    +
    +# Requirements and Caveats
    +
    +- Join queries between regions are not supported.
    +- Nested objects are not supported.
    +- Lucene indexes are not stored in off-heap memory.
    +- Lucene queries from within transactions are not supported.
    +On an attempt to query from within a transaction,
    +a `LuceneQueryException` is thrown, with an error message
    +on the client (accessor) similar to:
    +
    +``` pre
    +Exception in thread "main" org.apache.geode.cache.lucene.LuceneQueryException:
    + Lucene Query cannot be executed within a transaction
    +at org.apache.geode.cache.lucene.internal.LuceneQueryImpl.findTopEntries(LuceneQueryImpl.java:124)
    +at org.apache.geode.cache.lucene.internal.LuceneQueryImpl.findPages(LuceneQueryImpl.java:98)
    +at org.apache.geode.cache.lucene.internal.LuceneQueryImpl.findPages(LuceneQueryImpl.java:94)
    +at TestClient.executeQuerySingleMethod(TestClient.java:196)
    +at TestClient.main(TestClient.java:59)
    +```
    +- If the Lucene index is not created prior to creating the region,
    +an exception is thrown while attempting to create the region,
    +with an error message similar to:
    +
    +``` pre
    +[error 2017/05/02 15:19:26.018 PDT tid=0x1] java.lang.IllegalStateException:
    + Must create Lucene index full_index on region /data because it is defined in another member.
    +Exception in thread "main" java.lang.IllegalStateException:
    + Must create Lucene index full_index on region /data because it is defined in another member.
    +at org.apache.geode.internal.cache.CreateRegionProcessor$CreateRegionMessage.handleCacheDistributionAdvisee(CreateRegionProcessor.java:478)
    +at org.apache.geode.internal.cache.CreateRegionProcessor$CreateRegionMessage.process(CreateRegionProcessor.java:379)
    +```
    +- An invalidate of a region entry does not invalidate a corresponding
--- End diff --

"An invalidate operation of"

> Update Lucene documentation
> ---------------------------
>
>                 Key: GEODE-2913
>                 URL: https://issues.apache.org/jira/browse/GEODE-2913
>             Project: Geode
>          Issue Type: Bug
>          Components: docs
>            Reporter: Karen Smoler Miller
>            Assignee: Karen Smoler Miller
>
> Improvements to the code base that need to be reflected in the docs:
> * Change LuceneService.createIndex to use a factory pattern
> {code:java}
> luceneService.createIndex(region, index, ...)
> {code}
> changes to
> {code:java}
> luceneService.createIndexFactory()
>     .addField("field1name")
>     .addField("field2name")
>     .create()
> {code}
> * Lucene indexes will *NOT* be stored in off-heap memory.
> * Document how to configure an index on accessors - you still need to create the Lucene index before creating the region, even though this member does not hold any region data.
> If the index is not defined on the accessor, an exception like this will be thrown while attempting to create the region:
> {quote}
> [error 2017/05/02 15:19:26.018 PDT tid=0x1] java.lang.IllegalStateException: Must create Lucene index full_index on region /data because it is defined in another member.
> Exception in thread "main" java.lang.IllegalStateException: Must create Lucene index full_index on region /data because it is defined in another member.
> at org.apache.geode.internal.cache.CreateRegionProcessor$CreateRegionMessage.handleCacheDistributionAdvisee(CreateRegionProcessor.java:478)
> at org.apache.geode.internal.cache.CreateRegionProcessor$CreateRegionMessage.process(CreateRegionProcessor.java:379)
> {quote}
> * There is no need to create a Lucene index on a client with a Proxy cache. The Lucene search is always done on the server. Besides, _you can't create an index on a client._
> * If you configure Invalidates for region entries (alone or as part of expiration), these will *NOT* invalidate the Lucene indexes.
> The problem with this is that the index contains the keys but the region doesn't, so the query produces results that don't exist.
> In this test, the first time the query is run it produces N valid results; the second time it is run it produces N empty results:
> ** load entries
> ** run query
> ** invalidate entries
> ** run query again
> * Destroying a region will *NOT* automatically destroy any Lucene index associated with that region. Instead, attempting to destroy a region with a Lucene index will throw a colocated region exception.
> An IllegalStateException is thrown:
> {quote}
> java.lang.IllegalStateException: The parent region [/data] in colocation chain cannot be destroyed, unless all its children [[/cusip_index#_data.files]] are destroyed
> at org.apache.geode.internal.cache.PartitionedRegion.checkForColocatedChildren(PartitionedRegion.java:7231)
> at org.apache.geode.internal.cache.PartitionedRegion.destroyRegion(PartitionedRegion.java:7243)
> at org.apache.geode.internal.cache.AbstractRegion.destroyRegion(AbstractRegion.java:308)
> at DestroyLuceneIndexesAndRegionFunction.destroyRegion(DestroyLuceneIndexesAndRegionFunction.java:46)
> {quote}
> * The process to change a Lucene index using gfsh:
> 1. export region data
> 2. destroy Lucene index, destroy region
> 3. create new index, create new region without user-defined business logic callbacks
> 4. import data with the option to turn on callbacks (to invoke the Lucene Async Event Listener to index the data)
> 5. alter region to add user-defined business logic callbacks
> * Make sure there are no references to replicated regions, as they are not supported.
> * Document the security implementation and defaults. If a user has security configured for their cluster, creating a Lucene index requires DATA:MANAGE privilege (similar to OQL), but doing Lucene queries requires DATA:WRITE privilege because a function is called (different from OQL, which requires only DATA:READ privilege). Here are all the required privileges for the gfsh commands:
> ** create index requires DATA:MANAGE:region
> ** describe index requires CLUSTER:READ
> ** list indexes requires CLUSTER:READ
> ** search index requires DATA:WRITE
> ** destroy index requires DATA:MANAGE:region
> * A user cannot create a Lucene index on a region that has eviction configured with local destroy. If using Lucene indexing, eviction can only be configured with overflow to disk. In this case, only the region data is overflowed to disk, *NOT* the Lucene index.
> An UnsupportedOperationException is thrown:
> {quote}
> [error 2017/05/02 16:12:32.461 PDT tid=0x1] java.lang.UnsupportedOperationException: Lucene indexes on regions with eviction and action local destroy are not supported
> Exception in thread "main" java.lang.UnsupportedOperationException: Lucene indexes on regions with eviction and action local destroy are not supported
> at org.apache.geode.cache.lucene.internal.LuceneRegionListener.beforeCreate(LuceneRegionListener.java:85)
> at org.apache.geode.internal.cache.GemFireCacheImpl.invokeRegionBefore(GemFireCacheImpl.java:3154)
> at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3013)
> at org.apache.geode.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2991)
> {quote}
> * We can use the same field name in different objects where the field has a different data type, but this may have unexpected consequences. For example, suppose an index is created on the field SSN with the following entries:
> ** object_1 has String SSN = "1111"
> ** object_2 has Integer SSN = 1111
> ** object_3 has Float SSN = 1111.0
> Integers and Floats are not converted into strings. They remain as IntPoint and FloatPoint in the Lucene world. The standard analyzer will not try to tokenize these values; it only tries to break up string values. So:
> ** a string search for "SSN: 1111" returns object_1
> ** an IntRangeQuery with upper limit 1112 and lower limit 1110 returns object_2
> ** a FloatRangeQuery with upper limit 1111.5 and lower limit 1111.0 returns object_3
> * Similar to OQL, Lucene queries are not supported within transactions; an exception will be thrown.
> A LuceneQueryException is thrown on the client/accessor:
> {quote}
> Exception in thread "main" org.apache.geode.cache.lucene.LuceneQueryException: Lucene Query cannot be executed within a transaction
> at org.apache.geode.cache.lucene.internal.LuceneQueryImpl.findTopEntries(LuceneQueryImpl.java:124)
> at org.apache.geode.cache.lucene.internal.LuceneQueryImpl.findPages(LuceneQueryImpl.java:98)
> at org.apache.geode.cache.lucene.internal.LuceneQueryImpl.findPages(LuceneQueryImpl.java:94)
> at TestClient.executeQuerySingleMethod(TestClient.java:196)
> at TestClient.main(TestClient.java:59)
> {quote}
> This TransactionException is logged on the server.
> * Back up regions with Lucene indexes only when the system is quiet, i.e., when no puts, updates, or deletes are in progress. Otherwise, the backups for the Lucene indexes will not match the data in the region being indexed (incremental backups will not be consistent between the data region and the Lucene index region, due to the delayed processing associated with the AEQ). If the region data needs to be restored from backup, follow the same process used for changing a Lucene index in order to re-create the index region.
> * Update the docs section on "Memory Requirements for Cached Data" to include a conservative estimate of 737 bytes per entry of overhead for a Lucene index. All the other caveats mentioned for OQL indexes also apply to Lucene indexes... your mileage may vary...

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
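
The index-change procedure described in the diff and in the ticket (export, destroy index and region, re-create index and region, import with callbacks, alter region) might look like the following gfsh session. This is a sketch only: the index name (`ordersIndex`), region name (`orders`), field names, member name (`server1`), snapshot file name (`orders.gfd`), and listener class (`com.example.OrderListener`) are all hypothetical, and the exact option names should be verified against the gfsh command reference pages mentioned above.

``` pre
gfsh> export data --region=/orders --file=orders.gfd --member=server1
gfsh> destroy lucene index --name=ordersIndex --region=orders
gfsh> destroy region --name=/orders
gfsh> create lucene index --name=ordersIndex --region=orders --field=field1,field2
gfsh> create region --name=orders --type=PARTITION
gfsh> import data --region=/orders --file=orders.gfd --member=server1 --invoke-callbacks=true
gfsh> alter region --name=/orders --cache-listener=com.example.OrderListener
```

Note that the new index must be created before the new region (per the caveat above about index creation preceding region creation), and that `--invoke-callbacks=true` on `import data` is what triggers the Lucene async event listener to re-index the imported entries.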