drill-dev mailing list archives

From Paul Rogers <prog...@mapr.com>
Subject Re: [GitHub] drill pull request #:
Date Tue, 29 Aug 2017 23:11:28 GMT
Drill is an analytic engine optimized for numbers and short strings. At present, Drill’s
practical limit on string (i.e. VarChar) length is about 256 characters.

Drill is vectorized: it tries to create “batches” of data with up to 64K records. When
individual columns are wider than 256 bytes on average, a vector of 64K values grows beyond
16 MB (64K × 256 bytes) and we run into memory issues.

If VarChar columns are 64K in size (the current maximum), we hit the 16 MB vector limit with
only 256 records. Unfortunately, our readers and operators don’t yet know how to limit their
batch sizes to such a low number, though we are actively working on a fix.
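
To make the arithmetic concrete, here is a rough back-of-envelope sketch in Python. The
constants come from the numbers in this message, not from Drill's source; the function names
are made up for illustration, widths are treated as bytes, and offset-vector overhead is
ignored.

```python
# Back-of-envelope arithmetic only. These constants restate the figures
# quoted in this message; they are not read from Drill's code.
MAX_BATCH_RECORDS = 64 * 1024            # "up to 64K records" per batch
VECTOR_LIMIT_BYTES = 16 * 1024 * 1024    # the 16 MB vector size
MAX_FIELD_LENGTH = 64 * 1024             # current VarChar maximum (1024 * 64)


def max_average_width(records=MAX_BATCH_RECORDS, limit=VECTOR_LIMIT_BYTES):
    """Widest average column (in bytes) that still fits a full batch."""
    return limit // records


def max_records_for_width(width, limit=VECTOR_LIMIT_BYTES, cap=MAX_BATCH_RECORDS):
    """Hypothetical helper: records that fit in one vector at a given width."""
    if width <= 0:
        raise ValueError("width must be positive")
    return min(cap, limit // width)


if __name__ == "__main__":
    print(max_average_width())                      # average width for a full 64K batch
    print(max_records_for_width(MAX_FIELD_LENGTH))  # batch size at the VarChar maximum
```

A reader or operator that sized its batches this way would drop to a few hundred records for
very wide columns instead of always aiming for 64K.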


- Paul

> On Aug 29, 2017, at 3:22 PM, jcesart20 <git@git.apache.org> wrote:
> Github user jcesart20 commented on the pull request:
>    https://github.com/apache/drill/commit/745bcd1f378397d921860eeba382530483644cd3#commitcomment-23958024
>    Hi,
>    We are working with a dataset with columns of large Strings on it, and we are having
>    the error "UNSUPPORTED_OPERATION ERROR: Trying to write something big in a column columnIndex
>    0 Limit 65536 Failure while reading file ..."
>    There is a validation using MAX_FIELD_LENGTH.
>    Why does this validation use the specific value MAX_FIELD_LENGTH = 1024 * 64?
>    Is it possible to handle large Strings, such as those used to represent WKT geometries?
>    Regards!
