lucene-java-user mailing list archives

From Gimantha Bandara <giman...@wso2.com>
Subject Re: Exception While searching through indices.
Date Tue, 16 Jun 2015 10:59:08 GMT
Hi Dat,

We have an entity called 'record', which contains a record id, a table name
and a set of values. When we insert records into our data layer, we index
those records by the id and the values. Indexing is done in a separate
thread. I'll explain how this is done. When we insert records into the data
layer, we store them as blobs in the underlying data source (if it is an
RDBMS, they will be blobs), and we also insert another record into a
separate table (that index-record contains all the record ids which need to
be indexed). The separate thread which performs the indexing task fetches
the index-record and extracts all the record ids in it which need to be
indexed. There can be several index-records. What we did earlier was to
fetch all the index-records we currently have, extract the record ids in
them, and index them using Lucene. We had performance tests and unit tests,
and they all passed. Then we changed our implementation to use iterators to
extract these records, since keeping all the records in a List can cause OOM
issues. Now the tests pass, except for facet indexing.

I know it will not be easy to understand the context of the problem. Our
source is at [1]. When we use the method at line number 312 instead of the
method at line number 330, we get the error reported earlier in this thread.
Note that the method is used at line number 422.


[1]
https://github.com/gimantha/carbon-analytics/blob/master/components/analytics-core/org.wso2.carbon.analytics.dataservice/src/main/java/org/wso2/carbon/analytics/dataservice/indexing/AnalyticsDataIndexer.java
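
For readers without access to [1], the iterator-based approach described above can be sketched roughly as follows. This is a minimal illustration only, not the code in AnalyticsDataIndexer.java: the class name BatchedIndexing, the method forEachBatch and the batch size are hypothetical, and the sink callback merely stands in for whatever performs the real indexing (for example, handing documents to a Lucene IndexWriter). It assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: drains an iterator in fixed-size batches so that
// at most batchSize items are buffered by this helper at any time.
class BatchedIndexing {

    /**
     * Reads items from source and passes them to sink in batches of at
     * most batchSize. Returns the total number of items handed to sink.
     */
    static <T> int forEachBatch(Iterator<T> source, int batchSize,
                                Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int total = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                sink.accept(batch);          // the real indexing call would go here
                total += batch.size();
                batch = new ArrayList<>(batchSize); // fresh buffer for the next batch
            }
        }
        if (!batch.isEmpty()) {              // flush the final, partial batch
            sink.accept(batch);
            total += batch.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in for the record ids extracted from the index-records.
        Iterator<String> recordIds =
                Arrays.asList("r1", "r2", "r3", "r4", "r5").iterator();
        int n = forEachBatch(recordIds, 2,
                batch -> System.out.println("indexing batch " + batch));
        System.out.println("handed " + n + " record ids to the indexer");
    }
}
```

Each batch is passed to the sink in a fresh list, so memory use is bounded by the batch size rather than by the total number of records, provided the sink does not retain the batches it receives.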

On Sun, Jun 14, 2015 at 7:13 PM, Đạt Cao Mạnh <caomanhdat317@gmail.com>
wrote:

> Can you post your scenario in detail, along with your modification, please?
>
> On 14:09, Sun, 14 Jun 2015 Gimantha Bandara <gimantha@wso2.com> wrote:
>
>> Hi Dat,
>>
>> I can reproduce this behavior even with only about 50000 records. Is what
>> you said the only reason that could make this exception occur?
>>
>> Thanks,
>>
>> On Sat, Jun 13, 2015 at 5:40 AM, Đạt Cao Mạnh <caomanhdat317@gmail.com>
>> wrote:
>>
>>> Hi, the number of documents in a single Lucene index cannot exceed
>>> Integer.MAX_VALUE, so using a single Lucene index to index billions of
>>> documents is not a proper approach. You should consider using SolrCloud
>>> or Elasticsearch to index your documents.
>>>
>>> On 19:43, Fri, 12 Jun 2015 Gimantha Bandara <gimantha@wso2.com> wrote:
>>>
>>> > Hi all,
>>> >
>>> > We are using Lucene 4.10.3 for indexing. Recently we changed our
>>> > implementation so that we give data to Lucene batchwise to index.
>>> > Earlier we just queried all the data from the data source and indexed
>>> > it all at once. That worked well, but the number of entries can be up
>>> > to billions, so getting all the data entries from the data source
>>> > sometimes causes OutOfMemory errors. So we changed the implementation
>>> > so that Lucene indexes the data batchwise. Now we are getting the
>>> > following exception. Can anyone tell me what that exception means?
>>> >
>>> > java.lang.ArrayIndexOutOfBoundsException: 147
>>> >     at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.advance(Lucene41PostingsReader.java:538)
>>> >     at org.apache.lucene.search.TermScorer.advance(TermScorer.java:85)
>>> >     at org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:82)
>>> >     at org.apache.lucene.search.ConjunctionScorer.nextDoc(ConjunctionScorer.java:100)
>>> >     at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:192)
>>> >     at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:163)
>>> >     at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:35)
>>> >     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:621)
>>> >     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
>>> >     at org.apache.lucene.facet.FacetsCollector.doSearch(FacetsCollector.java:294)
>>> >     at org.apache.lucene.facet.FacetsCollector.search(FacetsCollector.java:198)
>>> >
>>> >
>>> > --
>>> > Gimantha Bandara
>>> > Software Engineer
>>> > WSO2. Inc : http://wso2.com
>>> > Mobile : +94714961919
>>> >
>>>
>>
>>
>>
>> --
>> Gimantha Bandara
>> Software Engineer
>> WSO2. Inc : http://wso2.com
>> Mobile : +94714961919
>>
>


-- 
Gimantha Bandara
Software Engineer
WSO2. Inc : http://wso2.com
Mobile : +94714961919
