geode-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (GEODE-2913) Update Lucene documentation
Date Wed, 24 May 2017 22:18:04 GMT


ASF GitHub Bot commented on GEODE-2913:

Github user boglesby commented on the issue:
    The default analyzer is Geode's, not Lucene's, so I think the sentence should be something
    like: "When no analyzer is specified, the org.apache.lucene.analysis.standard.StandardAnalyzer
    will be used." or: "The default analyzer is the org.apache.lucene.analysis.standard.StandardAnalyzer
    if none is specified."
    For this sentence: To use the entire value as a single field, set the required --field
    option to __REGION_VALUE_FIELD.
    You might add something like: This is only supported when the region entry value is a
    String, Long, Integer, Float, or Double.
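    For example (index and region names here are made up), something like:
    gfsh> create lucene index --name=valueIndex --region=/orders --field=__REGION_VALUE_FIELD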
    In the xml section, you might want to add a field that uses the default analyzer like:
    <lucene:field name="d"/>
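    To show it in context, a fuller (illustrative) snippet might be:
    <lucene:index name="myIndex">
      <lucene:field name="a" analyzer="org.apache.lucene.analysis.core.KeywordAnalyzer"/>
      <lucene:field name="d"/>
    </lucene:index>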
    The error message (below) when attempting to destroy a region with an index is probably
going to change. It'll be more like: All lucene indexes must be destroyed before destroying
the data region.
    java.lang.IllegalStateException: The parent region [/orders] in colocation chain
     cannot be destroyed, unless all its children [[/indexName#_orders.files]] are destroyed

> Update Lucene documentation
> ---------------------------
>                 Key: GEODE-2913
>                 URL:
>             Project: Geode
>          Issue Type: Bug
>          Components: docs
>            Reporter: Karen Smoler Miller
>            Assignee: Karen Smoler Miller
>             Fix For: 1.2.0
> Improvements to the code base that need to be reflected in the docs:
> * Change LuceneService.createIndex to use a factory pattern
> {code:java}
> luceneService.createIndex(region, index, ...)
> {code}
> changes to
> {code:java}
> luceneService.createIndexFactory()
> .addField("field1name")
> .addField("field2name")
> .create()
> {code}
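> A slightly fuller sketch of the factory-based API (index and region names are illustrative, and the exact {{create}} signature should be checked against the final API):
> {code:java}
> // Obtain the LuceneService for an existing cache
> LuceneService luceneService = LuceneServiceProvider.get(cache);
> // Build and create the index before creating the data region
> luceneService.createIndexFactory()
>   .addField("field1name")
>   .addField("field2name")
>   .create("indexName", "regionName");
> {code}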
> *  Lucene indexes will *NOT* be stored in off-heap memory.
> * Document how to configure an index on accessors - you still need to create the Lucene
index before creating the region, even though this member does not hold any region data.
> If the index is not defined on the accessor, an exception like this will be thrown while
attempting to create the region:
> {quote}
> [error 2017/05/02 15:19:26.018 PDT <main> tid=0x1] java.lang.IllegalStateException:
Must create Lucene index full_index on region /data because it is defined in another member.
> Exception in thread "main" java.lang.IllegalStateException: Must create Lucene index
full_index on region /data because it is defined in another member.
> at org.apache.geode.internal.cache.CreateRegionProcessor$CreateRegionMessage.handleCacheDistributionAdvisee(
> at org.apache.geode.internal.cache.CreateRegionProcessor$CreateRegionMessage.process(
> {quote}
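> A sketch of the required ordering on an accessor member (names are illustrative):
> {code:java}
> // Create the Lucene index first, even though this member holds no region data ...
> luceneService.createIndexFactory()
>   .addField("field1name")
>   .create("full_index", "data");
> // ... and only then create the accessor (proxy) region
> Region<String, Object> region = cache
>   .<String, Object>createRegionFactory(RegionShortcut.PARTITION_PROXY)
>   .create("data");
> {code}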
> * There is no need to create a Lucene index on a client with a Proxy cache. The Lucene search
will always be done on the server.  Besides, _you cannot create an index on a client._
> * If you configure Invalidates for region entries (alone or as part of expiration), these
will *NOT* invalidate the Lucene indexes.
> The problem with this is that the index still contains the keys but the region does not, so the
query produces results for entries that no longer exist.
> In this test, the first time the query is run, it produces N valid results. The second
time it is run, it produces N empty results:
> ** load entries
> ** run query
> ** invalidate entries
> ** run query again
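> Sketched in code (the query string, field names, and the Order value class are illustrative):
> {code:java}
> region.put("key1", new Order("widget"));     // load entries
> LuceneQuery<String, Order> query = luceneService
>   .createLuceneQueryFactory()
>   .create("indexName", "data", "widget", "description");
> query.findResults();                         // run query: returns the matching entry
> region.invalidate("key1");                   // invalidate entries
> query.findResults();                         // run query again: the index still matches the key,
>                                              // but the region value is gone
> {code}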
> *  Destroying a region will *NOT* automatically destroy any Lucene index associated with
that region. Instead, attempting to destroy a region with a Lucene index will throw a colocated
region exception. 
> An IllegalStateException is thrown:
> {quote}
> java.lang.IllegalStateException: The parent region [/data] in colocation chain cannot
be destroyed, unless all its children [[/cusip_index#_data.files]] are destroyed
> at org.apache.geode.internal.cache.PartitionedRegion.checkForColocatedChildren(
> at org.apache.geode.internal.cache.PartitionedRegion.destroyRegion(
> at org.apache.geode.internal.cache.AbstractRegion.destroyRegion(
> at DestroyLuceneIndexesAndRegionFunction.destroyRegion(
> {quote}
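> In gfsh, the index must therefore be destroyed before the region (names are illustrative):
> {code}
> gfsh> destroy lucene index --name=cusip_index --region=/data
> gfsh> destroy region --name=/data
> {code}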
> * The process to change a Lucene index using gfsh: 
>       1. export region data
>       2. destroy Lucene index, destroy region 
>       3. create new index, create new region without user-defined business logic callbacks
>       4. import data with option to turn on callbacks (to invoke Lucene Async Event Listener
to index the data)
>       5. alter region to add user-defined business logic callbacks
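> A gfsh sketch of that process (file, member, index, and region names are illustrative, and option names should be checked against the gfsh reference):
> {code}
> gfsh> export data --region=/data --file=data.gfd --member=server1
> gfsh> destroy lucene index --name=cusip_index --region=/data
> gfsh> destroy region --name=/data
> gfsh> create lucene index --name=cusip_index --region=/data --field=cusip
> gfsh> create region --name=data --type=PARTITION
> gfsh> import data --region=/data --file=data.gfd --member=server1 --invoke-callbacks=true
> gfsh> alter region --name=/data --cache-listener=com.example.MyListener
> {code}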
> * Make sure there are no references to replicated regions as they are not supported.
> * Document security implementation and defaults.  If a user has security configured for
their cluster, creating a Lucene index requires DATA:MANAGE privilege (similar to OQL), but
doing Lucene queries requires DATA:WRITE privilege because a function is called (different
from OQL which requires only DATA:READ privilege). Here are all the required privileges for
the gfsh commands:
> ** create index requires DATA:MANAGE:region
> ** describe index requires CLUSTER:READ
> ** list indexes requires CLUSTER:READ
> ** search index requires DATA:WRITE
> ** destroy index requires DATA:MANAGE:region
> * A user cannot create a Lucene index on a region that has eviction configured with local
destroy. If using Lucene indexing, eviction can only be configured with overflow to disk.
In this case, only the region data is overflowed to disk, *NOT* the Lucene index. An UnsupportedOperationException
is thrown:
> {quote}
> [error 2017/05/02 16:12:32.461 PDT <main> tid=0x1] java.lang.UnsupportedOperationException:
Lucene indexes on regions with eviction and action local destroy are not supported
> Exception in thread "main" java.lang.UnsupportedOperationException: Lucene indexes on
regions with eviction and action local destroy are not supported
> at org.apache.geode.cache.lucene.internal.LuceneRegionListener.beforeCreate(
> at org.apache.geode.internal.cache.GemFireCacheImpl.invokeRegionBefore(
> at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(
> at org.apache.geode.internal.cache.GemFireCacheImpl.basicCreateRegion(
> {quote}
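> A sketch of a supported configuration, overflowing region data to disk (names and entry limit are illustrative):
> {code:java}
> cache.createRegionFactory(RegionShortcut.PARTITION)
>   .setEvictionAttributes(EvictionAttributes.createLRUEntryAttributes(
>       10000, EvictionAction.OVERFLOW_TO_DISK))
>   .create("data");
> {code}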
> * We can use the same field name in different objects where the field has a different
data type, but this may have unexpected consequences. For example, if I created an index on
the field SSN with the following entries:
>       object_1 has String SSN = "1111"
>       object_2 has Integer SSN = 1111
>       object_3 has Float SSN = 1111.0
> Integers and Floats will not be converted into strings. They remain as IntPoint and FloatPoint
in the Lucene world. The standard analyzer will not try to tokenize these values; it
will only try to break up string values. So,
> **  If I do a string search for "SSN: 1111", Lucene will return object_1.
> **  If I do an IntRangeQuery with upper limit 1112 and lower limit 1110, Lucene will
return object_2.
> **  If I do a FloatRangeQuery with upper limit 1111.5 and lower limit 1111.0, Lucene
will return object_3.
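> Range queries like those need a LuceneQueryProvider rather than a query string, since the string parser only handles tokenized text; a sketch (index and region names are illustrative):
> {code:java}
> // Int range query against the IntPoint-indexed SSN field
> LuceneQuery<Integer, Object> query = luceneService
>   .createLuceneQueryFactory()
>   .create("ssnIndex", "data",
>       index -> IntPoint.newRangeQuery("SSN", 1110, 1112));
> query.findResults(); // would return object_2
> {code}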
> * Similar to OQL, Lucene queries are not supported within transactions. A LuceneQueryException
is thrown on the client/accessor:
> {quote}
> Exception in thread "main" org.apache.geode.cache.lucene.LuceneQueryException: Lucene
Query cannot be executed within a transaction
> at org.apache.geode.cache.lucene.internal.LuceneQueryImpl.findTopEntries(
> at org.apache.geode.cache.lucene.internal.LuceneQueryImpl.findPages(
> at org.apache.geode.cache.lucene.internal.LuceneQueryImpl.findPages(
> at TestClient.executeQuerySingleMethod(
> at TestClient.main(
> {quote}
> The underlying TransactionException is logged on the server.
> * Backups should only be done for regions with Lucene indexes when the system is 'quiet';
i.e. no puts, updates, or deletes are in progress. Otherwise the backups for Lucene indexes
will not match the data in the region that is being indexed (i.e. incremental backups will
not be consistent between the data region and the Lucene index region due to delayed processing
associated with the AEQ). If the region data needs to be restored from backup, then you must
follow the same process for changing a Lucene index in order to re-create the index region.
> *  Update docs section on "Memory Requirements for Cached Data" to include conservative
estimate of 737 bytes per entry overhead for a Lucene index. All the other caveats mentioned
for OQL indexes also apply for Lucene indexes... your mileage may vary...

This message was sent by Atlassian JIRA
