lucene-general mailing list archives

From luc...@digiatlas.org
Subject problems with large Lucene index
Date Thu, 05 Mar 2009 09:16:42 GMT
Hello,

I am using Lucene via Hibernate Search, but the following problem also
shows up in Luke. I'd appreciate any suggestions for solving it.

I have a Lucene index (27 GB on disk) over a database table of
286 million rows. Lucene built this index just fine (albeit very
slowly), but using it has proved impossible. Any search against it,
whether from my Hibernate Search query or by entering the query in
Luke, gives:

java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.index.MultiReader.norms(MultiReader.java:271)
at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:69)
at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:230)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:131)
...


The queries are simple, of the form:

(+value:church +marcField:245 +subField:a)

which in this example should only return a few thousand results.
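
Built by hand with the plain Lucene API, that query would look roughly
like this (a sketch only; the real query object comes out of Hibernate
Search, and the terms here are just the ones from the example above):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

// All three clauses are required (the '+' prefixes above).
BooleanQuery query = new BooleanQuery();
query.add(new TermQuery(new Term("value", "church")), BooleanClause.Occur.MUST);
query.add(new TermQuery(new Term("marcField", "245")), BooleanClause.Occur.MUST);
query.add(new TermQuery(new Term("subField", "a")), BooleanClause.Occur.MUST);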


The JVM is already running with the largest heap it will accept on
Windows XP (java -Xms1200m -Xmx1200m).


The Lucene index was created using the following Hibernate Search annotations:

@Column
@Analyzer(impl=org.apache.lucene.analysis.SimpleAnalyzer.class)
@Field(index=org.hibernate.search.annotations.Index.NO_NORMS, store=Store.NO)
private Integer marcField;

@Column(length = 2)
@Analyzer(impl=org.apache.lucene.analysis.SimpleAnalyzer.class)
@Field(index=org.hibernate.search.annotations.Index.NO_NORMS, store=Store.NO)
private String subField;

@Column(length = 2)
@Analyzer(impl=org.apache.lucene.analysis.SimpleAnalyzer.class)
@Field(index=org.hibernate.search.annotations.Index.NO_NORMS, store=Store.NO)
private String indicator1;

@Column(length = 2)
@Analyzer(impl=org.apache.lucene.analysis.SimpleAnalyzer.class)
@Field(index=org.hibernate.search.annotations.Index.NO_NORMS, store=Store.NO)
private String indicator2;

@Column(length = 10000)
@Field(index=org.hibernate.search.annotations.Index.TOKENIZED, store=Store.NO)
private String value;

@Column
@Analyzer(impl=org.apache.lucene.analysis.SimpleAnalyzer.class)
@Field(index=org.hibernate.search.annotations.Index.NO_NORMS, store=Store.NO)
private Integer recordId;
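

If I understand the mapping correctly, each row therefore ends up as a
Lucene document along these lines (a sketch using the raw Lucene 2.x
Field API and made-up values; the actual translation is Hibernate
Search's doing):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

Document doc = new Document();
// Fields indexed without norms (my reading of Index.NO_NORMS):
doc.add(new Field("marcField", "245", Field.Store.NO, Field.Index.NO_NORMS));
doc.add(new Field("subField", "a", Field.Store.NO, Field.Index.NO_NORMS));
// indicator1, indicator2 and recordId follow the same pattern.
// Only "value" is tokenised, so only it should carry norms:
doc.add(new Field("value", "the parish church of st mary", Field.Store.NO, Field.Index.TOKENIZED));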


So all of the fields have NO_NORMS except for "value", which contains
description text that needs to be tokenised.
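
My back-of-the-envelope arithmetic (assuming, as I understand it, that
norms cost one byte per document per field that carries them) says a
single normed field should still fit in the heap:

long docs = 286000000L;  // documents in the index
long normsBytes = docs;  // 1 byte per document for one normed field (assumed)
System.out.println((normsBytes / (1024 * 1024)) + " MB");  // prints "272 MB"

Roughly 272 MB for the one normed field ought to fit in a 1200 MB
heap, so the OutOfMemoryError in MultiReader.norms() only makes sense
to me if norms are somehow being materialised for the other fields as
well.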

Is there any way around this? Does Lucene really have such a low
limit on how much data it can search (and I consider 286 million
documents to be pretty small beer; we were hoping to index a table of
over a billion rows)? Or is there something I'm missing?

Thanks.



