lucene-solr-user mailing list archives

From Yonik Seeley <yo...@lucidworks.com>
Subject Re: UnInvertedField limitations
Date Thu, 06 Sep 2012 22:45:33 GMT
The pointer into a byte[] term list is actually limited to 24 bits, but
there are 256 different arrays, so the maximum capacity is 4 GB (2^32
bytes) of un-inverted terms. Each array is limited to 4 GB / 256 = 16 MB,
though, so the real limit can come in a little lower, depending on how the
documents happen to be distributed across the arrays.
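
The arithmetic behind those numbers can be written out as a short sketch.
The class and method names here are made up for illustration and are not
taken from the Solr source; only the 24-bit pointer width and the count of
256 arrays come from the description above.

```java
// Illustrative only: capacity arithmetic for a 24-bit pointer and 256 arrays.
final class UnInvertedCapacity {
    static final int POINTER_BITS = 24;  // 3-byte pointer into a byte[]
    static final int ARRAY_COUNT = 256;  // number of separate byte arrays

    /** Largest number of bytes one array can address with a 24-bit pointer. */
    static long bytesPerArray() {
        return 1L << POINTER_BITS;
    }

    /** Total addressable bytes across all arrays. */
    static long totalBytes() {
        return bytesPerArray() * ARRAY_COUNT;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerArray()); // 16777216   (16 MB)
        System.out.println(totalBytes());    // 4294967296 (4 GB)
    }
}
```

The total is only reachable if every array fills evenly; a single array
that reaches 16 MB is enough to hit the limit.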

From the comments:

 *   There is a single int[maxDoc()] which either contains a pointer into a
 *   byte[] for the termNumber lists, or directly contains the termNumber list
 *   if it fits in the 4 bytes of an integer.  If the first byte in the integer
 *   is 1, the next 3 bytes are a pointer into a byte[] where the termNumber
 *   list starts.
 *
 *   There are actually 256 byte arrays, to compensate for the fact that the
 *   pointers into the byte arrays are only 3 bytes long.  The correct byte
 *   array for a document is a function of its id.
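
A minimal sketch of the layout that comment describes might look like the
following. It rests on two assumptions that the comment does not spell out:
that the "first byte" is the low-order byte of the int, and that the array
for a document is picked from bits 16-23 of its id. The class and method
names are hypothetical; check the UnInvertedField source for your release
before relying on any of this.

```java
// Hypothetical sketch of the per-document int described in the comment.
final class DocPointerSketch {
    /** True when the int holds a pointer rather than an inline term list.
     *  Assumption: the flag lives in the low-order byte. */
    static boolean isPointer(int code) {
        return (code & 0xff) == 1;
    }

    /** 24-bit offset into the chosen byte[]; only meaningful if isPointer(code). */
    static int pointerOffset(int code) {
        return code >>> 8;
    }

    /** Which of the 256 byte arrays holds this document's list.
     *  Assumption: selected by bits 16-23 of the document id. */
    static int arrayFor(int docId) {
        return (docId >>> 16) & 0xff;
    }

    /** Pack an offset into an int with the pointer flag set; rejects
     *  offsets that do not fit in the 3 bytes left for the pointer. */
    static int encodePointer(int offset) {
        if (offset < 0 || offset >= (1 << 24)) {
            throw new IllegalArgumentException(
                "offset does not fit in 24 bits: " + offset);
        }
        return (offset << 8) | 1;
    }
}
```

An offset that no longer fits in 24 bits is the kind of condition that
would surface as a "Too many values" error for a field.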


-Yonik
http://lucidworks.com


On Thu, Sep 6, 2012 at 6:33 PM, Fuad Efendi <fuad@efendi.ca> wrote:
> Hi Jack,
>
>
> 24 bits => 16M possibilities, that's clear; just to confirm... the rest is
> unclear: why would 4 bytes allow only 4 million cardinality? I thought it
> was 4 billion...
>
>
> And, just to confirm: UnInvertedField allows 16M cardinality, correct?
>
>
>
>
> On 12-08-20 6:51 PM, "Jack Krupansky" <jack@basetechnology.com> wrote:
>
>>It appears that there is a hard limit of 24 bits, or 16M, for the number of
>>bytes to reference the terms in a single field of a single document. It
>>takes 1, 2, 3, 4, or 5 bytes to reference a term. If it took 4 bytes, that
>>would allow 16M/4, or 4 million, unique terms per document. Do you have such
>>large documents? This appears to be a hard limit based on 24 bits of a
>>Java int.
>>
>>You can try facet.method=enum, but that may be too slow.
>>
>>What release of Solr are you running?
>>
>>-- Jack Krupansky
>>
>>-----Original Message-----
>>From: Fuad Efendi
>>Sent: Monday, August 20, 2012 4:34 PM
>>To: Solr-User@lucene.apache.org
>>Subject: UnInvertedField limitations
>>
>>Hi All,
>>
>>
>>I have a problem... (Yonik, please!) help me, what are the term count
>>limits? I possibly have 256,000,000 different terms in a field... or
>>16,000,000?
>>
>>Thanks!
>>
>>
>>2012-08-20 16:20:19,262 ERROR [solr.core.SolrCore] - [pool-1-thread-1] - :
>>org.apache.solr.common.SolrException: Too many values for UnInvertedField faceting on field enrich_keywords_string_mv
>>        at org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:179)
>>        at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:668)
>>        at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:326)
>>        at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:423)
>>        at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:206)
>>        at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:85)
>>        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
>>        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)
>>
>>
>>
>>
>>--
>>Fuad Efendi
>>http://www.tokenizer.ca
>>
>>
>>
>
>
