geode-dev mailing list archives

From "Diane Hardman (JIRA)" <>
Subject [jira] [Commented] (GEODE-2913) Update Lucene documentation
Date Thu, 11 May 2017 22:04:04 GMT


Diane Hardman commented on GEODE-2913:

I noticed that the following 2 bullets were missing from the list of corrections:
 - Add gfsh commands: 'destroy lucene index' and 'describe lucene index'
 - To specify the Lucene index field which represents the entire object, use __REGION_VALUE_FIELD

Are these doc updates covered elsewhere?

> Update Lucene documentation
> ---------------------------
>                 Key: GEODE-2913
>                 URL:
>             Project: Geode
>          Issue Type: Bug
>          Components: docs
>            Reporter: Karen Smoler Miller
>            Assignee: Karen Smoler Miller
> Improvements to the code base that need to be reflected in the docs:
> * Change LuceneService.createIndex to use a factory pattern
> {code:java}
> luceneService.createIndex(region, index, ...)
> {code}
> changes to
> {code:java}
> luceneService.createIndexFactory()
>   .addField("field1")   // e.g. addField(...) or setLuceneSerializer(...)
>   .addField("field2")
>   .create(indexName, regionPath)
> {code}
> *  Lucene indexes will *NOT* be stored in off-heap memory.
> * Document how to configure an index on accessors: you must still create the Lucene
index before creating the region, even though an accessor member does not hold any region data.
> If the index is not defined on the accessor, an exception like the following is thrown while
attempting to create the region:
> {quote}
> [error 2017/05/02 15:19:26.018 PDT <main> tid=0x1] java.lang.IllegalStateException:
Must create Lucene index full_index on region /data because it is defined in another member.
> Exception in thread "main" java.lang.IllegalStateException: Must create Lucene index
full_index on region /data because it is defined in another member.
> at org.apache.geode.internal.cache.CreateRegionProcessor$CreateRegionMessage.handleCacheDistributionAdvisee(
> at org.apache.geode.internal.cache.CreateRegionProcessor$CreateRegionMessage.process(
> {quote}
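> The accessor setup can be sketched in gfsh; the index, region, and field names here are illustrative only. The same 'create lucene index' must run on every member that will host or access the region, before 'create region':
> {code}
> gfsh> create lucene index --name=full_index --region=data --field=__REGION_VALUE_FIELD
> gfsh> create region --name=data --type=PARTITION
> {code}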
> * There is no need to create a Lucene index on a client with a proxy cache; the Lucene search
is always executed on the server. Besides, _you cannot create an index on a client._
> * If you configure invalidate operations for region entries (alone or as part of expiration),
these will *NOT* invalidate the Lucene indexes.
> The problem is that the index still contains the keys while the region does not, so the
query returns results for entries that no longer exist.
> In this test, the first run of the query produces N valid results; the second run produces
N empty results:
> ** load entries
> ** run query
> ** invalidate entries
> ** run query again
> *  Destroying a region will *NOT* automatically destroy any Lucene index associated with
that region. Instead, attempting to destroy a region that has a Lucene index throws a colocated-region
exception, an IllegalStateException:
> {quote}
> java.lang.IllegalStateException: The parent region [/data] in colocation chain cannot
be destroyed, unless all its children [[/cusip_index#_data.files]] are destroyed
> at org.apache.geode.internal.cache.PartitionedRegion.checkForColocatedChildren(
> at org.apache.geode.internal.cache.PartitionedRegion.destroyRegion(
> at org.apache.geode.internal.cache.AbstractRegion.destroyRegion(
> at DestroyLuceneIndexesAndRegionFunction.destroyRegion(
> {quote}
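> To avoid the colocation exception, destroy the Lucene index before destroying the region. A gfsh sketch (index and region names are illustrative):
> {code}
> gfsh> destroy lucene index --name=cusip_index --region=data
> gfsh> destroy region --name=data
> {code}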
> * The process to change a Lucene index using gfsh: 
>       1. export region data
>       2. destroy Lucene index, destroy region 
>       3. create new index, create new region without user-defined business logic callbacks
>       4. import data with option to turn on callbacks (to invoke Lucene Async Event Listener
to index the data)
>       5. alter region to add user-defined business logic callbacks
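> The steps above can be sketched in gfsh. Names, file paths, and the callback class are illustrative; verify the exact options, especially for invoking callbacks on import, against the gfsh command reference:
> {code}
> gfsh> export data --region=data --file=data.gfd --member=server1
> gfsh> destroy lucene index --name=cusip_index --region=data
> gfsh> destroy region --name=data
> gfsh> create lucene index --name=cusip_index --region=data --field=cusip
> gfsh> create region --name=data --type=PARTITION
> gfsh> import data --region=data --file=data.gfd --member=server1 --invoke-callbacks=true
> gfsh> alter region --name=data --cache-listener=com.example.MyListener
> {code}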
> * Make sure there are no references to replicated regions as they are not supported.
> * Document security implementation and defaults. If security is configured for the
cluster, creating a Lucene index requires the DATA:MANAGE privilege (similar to OQL), but
executing a Lucene query requires the DATA:WRITE privilege because a function is invoked (unlike
OQL, which requires only DATA:READ). Here are the required privileges for the gfsh commands:
> ** create index requires DATA:MANAGE:region
> ** describe index requires CLUSTER:READ
> ** list indexes requires CLUSTER:READ
> ** search index requires DATA:WRITE
> ** destroy index requires DATA:MANAGE:region
> * A user cannot create a Lucene index on a region that has eviction configured with local
destroy. If using Lucene indexing, eviction can only be configured with overflow to disk.
In this case, only the region data is overflowed to disk, *NOT* the Lucene index. An UnsupportedOperationException
is thrown:
> {quote}
> [error 2017/05/02 16:12:32.461 PDT <main> tid=0x1] java.lang.UnsupportedOperationException:
Lucene indexes on regions with eviction and action local destroy are not supported
> Exception in thread "main" java.lang.UnsupportedOperationException: Lucene indexes on
regions with eviction and action local destroy are not supported
> at org.apache.geode.cache.lucene.internal.LuceneRegionListener.beforeCreate(
> at org.apache.geode.internal.cache.GemFireCacheImpl.invokeRegionBefore(
> at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(
> at org.apache.geode.internal.cache.GemFireCacheImpl.basicCreateRegion(
> {quote}
> * The same field name can be used in objects of different types where the field has a
different data type, but this may have unexpected consequences. For example, suppose an index
is created on the field SSN with the following entries:
>       Object_1 has String SSN = "1111"
>       Object_2 has Integer SSN = 1111
>       Object_3 has Float SSN = 1111.0
> Integers and Floats are not converted into strings; they remain IntPoint and FloatPoint
values in the Lucene world. The standard analyzer will not try to tokenize these values; it
only breaks up string values. So,
> **  If I do a string search for "SSN: 1111", Lucene will return object_1.
> **  If I do an IntRangeQuery with upper limit 1112 and lower limit 1110, Lucene will
return object_2.
> **  If I do a FloatRangeQuery with upper limit 1111.5 and lower limit 1111.0, Lucene
will return object_3.
> * Similar to OQL, Lucene queries are not supported within transactions. A
LuceneQueryException is thrown on the client/accessor:
> {quote}
> Exception in thread "main" org.apache.geode.cache.lucene.LuceneQueryException: Lucene
Query cannot be executed within a transaction
> at org.apache.geode.cache.lucene.internal.LuceneQueryImpl.findTopEntries(
> at org.apache.geode.cache.lucene.internal.LuceneQueryImpl.findPages(
> at org.apache.geode.cache.lucene.internal.LuceneQueryImpl.findPages(
> at TestClient.executeQuerySingleMethod(
> at TestClient.main(
> {quote}
> This TransactionException is logged on the server.
> * Backups of regions with Lucene indexes should only be taken when the system is 'quiet',
i.e. no puts, updates, or deletes are in progress. Otherwise the backup of the Lucene index
will not match the data in the region being indexed (incremental backups will not be consistent
between the data region and the Lucene index region, due to the delayed processing associated
with the AEQ). If region data must be restored from backup, follow the same process used to
change a Lucene index in order to re-create the index region.
> *  Update docs section on "Memory Requirements for Cached Data" to include conservative
estimate of 737 bytes per entry overhead for a Lucene index. All the other caveats mentioned
for OQL indexes also apply for Lucene indexes... your mileage may vary...

This message was sent by Atlassian JIRA
