lucene-java-user mailing list archives

From Nicola Buso <nb...@ebi.ac.uk>
Subject Re: Big number of values for facets
Date Fri, 26 Apr 2013 15:22:48 GMT
Hi Shai,

I can't say right now how many of these entries I have; I need to trace
them, but I expect they are exceptions, around 10 entries at most.

Can I enable partitions document by document? Should I activate
partitions if I reach a threshold just for these exceptions?


Nicola.

On Fri, 2013-04-26 at 18:04 +0300, Shai Erera wrote:
> Hi Nicola,
> 
> I think this limit denotes the number of bytes you can write in a single DV
> value. So the number of facets you can actually index for one document is
> much smaller than that. Do you know how many categories are indexed for
> that one document?
> 
> Also, do you expect to index a large number of facets for most documents,
> or is this one extreme example?
> 
> Basically I think you can achieve that by enabling partitions. Partitions
> let you split the category space into smaller sets, so that each DV value
> contains fewer values. RAM consumption during search is also lower, since
> FacetArrays is allocated at the size of the partition and not of the whole
> taxonomy. But you also incur a search performance loss, because counting a
> certain dimension requires traversing multiple DV fields.
> 
> To enable partitions you need to override the partition size in
> FacetIndexingParams. You can try experimenting with different values.
> 
> I am interested, though, in understanding the general scenario. Perhaps
> this can be solved some other way...
> 
> Shai
> On Apr 26, 2013 5:44 PM, "Nicola Buso" <nbuso@ebi.ac.uk> wrote:
> 
> > Hi all,
> >
> > I'm running into a problem indexing a document with a large number of
> > values for one facet.
> >
> > Caused by: java.lang.IllegalArgumentException: DocValuesField "$facets"
> > is too large, must be <= 32766
> >         at org.apache.lucene.index.BinaryDocValuesWriter.addValue(BinaryDocValuesWriter.java:57)
> >         at org.apache.lucene.index.DocValuesProcessor.addBinaryField(DocValuesProcessor.java:111)
> >         at org.apache.lucene.index.DocValuesProcessor.addField(DocValuesProcessor.java:57)
> >         at org.apache.lucene.index.TwoStoredFieldsConsumers.addField(TwoStoredFieldsConsumers.java:36)
> >         at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:242)
> >         at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:256)
> >         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:376)
> >         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473)
> >
> >
> > It's obviously hard to present such a large number of facet values to
> > the user, and it is also hard to decide which of these values to skip so
> > that the document can be stored in the index.
> >
> > Do you have any suggestions on how to get past this limit? Is it
> > possible?
> >
> >
> >
> > Nicola
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
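
For readers following the thread, here is a rough sketch of why partitions
help with the 32766-byte limit discussed above. It does not use Lucene at all:
the class name, the sample ordinals, and the partition size of 1,000,000 are
all made up for illustration, and the size estimate assumes a simple
delta plus 7-bit variable-length encoding, which may differ from what your
Lucene version actually writes. In Lucene 4.x the real setting is the
partition size on FacetIndexingParams that Shai mentions (I believe via
overriding getPartitionSize(), but please check the Javadoc for your version).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionSketch {
    // Limit reported in the exception above for a single binary DocValues value.
    static final int MAX_DV_BYTES = 32766;

    // Group sorted category ordinals by partition index (ordinal / partitionSize).
    static Map<Integer, List<Integer>> splitByPartition(int[] sortedOrdinals, int partitionSize) {
        Map<Integer, List<Integer>> parts = new TreeMap<>();
        for (int ord : sortedOrdinals) {
            parts.computeIfAbsent(ord / partitionSize, k -> new ArrayList<>()).add(ord);
        }
        return parts;
    }

    // Bytes needed for a non-negative int when 7 bits are stored per byte.
    static int vIntSize(int v) {
        int n = 1;
        while ((v & ~0x7F) != 0) {
            v >>>= 7;
            n++;
        }
        return n;
    }

    // Estimated size of one value: the first ordinal, then the gaps between
    // consecutive ordinals (assumed encoding, not taken from Lucene's source).
    static int encodedSize(List<Integer> sortedOrdinals) {
        int bytes = 0;
        int prev = 0;
        for (int ord : sortedOrdinals) {
            bytes += vIntSize(ord - prev);
            prev = ord;
        }
        return bytes;
    }

    public static void main(String[] args) {
        // Hypothetical document with 40,000 categories whose ordinals are 200 apart.
        int[] ords = new int[40000];
        for (int i = 0; i < ords.length; i++) {
            ords[i] = 1 + i * 200;
        }

        // One huge partition models "no partitioning"; the second run uses a
        // made-up partition size of 1,000,000 ordinals.
        for (int partitionSize : new int[] {Integer.MAX_VALUE, 1_000_000}) {
            System.out.println("partitionSize=" + partitionSize);
            for (Map.Entry<Integer, List<Integer>> e : splitByPartition(ords, partitionSize).entrySet()) {
                int size = encodedSize(e.getValue());
                System.out.println("  partition " + e.getKey() + ": " + e.getValue().size()
                        + " ordinals, ~" + size + " bytes"
                        + (size > MAX_DV_BYTES ? " (over the limit)" : ""));
            }
        }
    }
}
```

The trade-off is the one Shai describes: each per-partition value is smaller,
but counting a dimension at search time has to visit every partition's field.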
