lucene-dev mailing list archives

From "Shalin Shekhar Mangar (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SOLR-7971) Reduce memory allocated by JavaBinCodec to encode large strings
Date Tue, 25 Aug 2015 16:25:48 GMT
Shalin Shekhar Mangar created SOLR-7971:
-------------------------------------------

             Summary: Reduce memory allocated by JavaBinCodec to encode large strings
                 Key: SOLR-7971
                 URL: https://issues.apache.org/jira/browse/SOLR-7971
             Project: Solr
          Issue Type: Sub-task
          Components: Response Writers, SolrCloud
            Reporter: Shalin Shekhar Mangar
            Assignee: Shalin Shekhar Mangar
            Priority: Minor
             Fix For: Trunk, 5.4


As discussed in SOLR-7927, we can reduce the buffer memory that JavaBinCodec allocates while
writing large strings by budgeting 3 bytes per Java char instead of 4.

https://issues.apache.org/jira/browse/SOLR-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700420#comment-14700420
{quote}
The maximum Unicode code point (as of Unicode 8 anyway) is U+10FFFF ([http://www.unicode.org/glossary/#code_point]).
This is encoded in UTF-16 as the surrogate pair {{\uDBFF\uDFFF}}, which takes up two Java chars,
and is represented in UTF-8 as the 4-byte sequence {{F4 8F BF BF}}.  This is likely where
the mistaken 4-bytes-per-Java-char formulation came from: the maximum number of UTF-8 bytes
required to represent a Unicode *code point* is 4.

The maximum Java char is {{\uFFFF}}, which is represented in UTF-8 as the 3-byte sequence
{{EF BF BF}}.

So I think it's safe to switch to using 3 bytes per Java char (the unit of measurement returned
by {{String.length()}}), like {{CompressingStoredFieldsWriter.writeField()}} does.
{quote}
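
The bound argued for in the quoted comment can be checked with a small standalone sketch. This is not JavaBinCodec code: the class name {{Utf8Bound}} and the helpers {{maxUtf8Bytes}} and {{actualUtf8Bytes}} are hypothetical names used only for illustration. It assumes well-formed strings encoded with the JDK's standard UTF-8 encoder.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; the class and method names are hypothetical
// and are not part of JavaBinCodec.
class Utf8Bound {

    // Worst-case UTF-8 size: 3 bytes per UTF-16 code unit, where
    // String.length() counts code units (Java chars), not code points.
    static int maxUtf8Bytes(String s) {
        return 3 * s.length();
    }

    // Actual encoded size, using the JDK's UTF-8 encoder.
    static int actualUtf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Largest single Java char (U+FFFF): one char, 3 UTF-8 bytes.
        String bmpMax = "\uFFFF";
        // Largest code point (U+10FFFF): two chars, 4 UTF-8 bytes,
        // i.e. only 2 bytes per Java char.
        String supplementaryMax = "\uDBFF\uDFFF";

        for (String s : new String[] {bmpMax, supplementaryMax}) {
            System.out.println("chars=" + s.length()
                    + " actual=" + actualUtf8Bytes(s)
                    + " bound=" + maxUtf8Bytes(s));
        }
    }
}
```

A surrogate pair costs 4 bytes across 2 chars, so the per-char worst case is the 3-byte encoding of a single char in the Basic Multilingual Plane, and a buffer sized at 3 bytes per char is a quarter smaller than one sized at 4.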



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

