orc-dev mailing list archives

From "Jerry Adair (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ORC-541) Extend CHAR behavior to STRING
Date Sun, 11 Aug 2019 20:05:00 GMT
Jerry Adair created ORC-541:

             Summary: Extend CHAR behavior to STRING
                 Key: ORC-541
                 URL: https://issues.apache.org/jira/browse/ORC-541
             Project: ORC
          Issue Type: Improvement
          Components: C++
    Affects Versions: 1.5.6
            Reporter: Jerry Adair
             Fix For: 1.5.7

This issue is a dual-purpose animal of sorts: I'd like to offer a suggestion and a contribution
to satisfy that suggestion, and also to ask a question.  The context is why the ORC types
CHAR and VARCHAR are processed differently from STRING.  I'm guessing that there was a
reason, but I'm not certain what that reason might be.


The specific area that I am addressing is the maxLength attribute of the TypeImpl class.
With CHAR and VARCHAR, a user can define this maxLength attribute, but with STRING they
cannot.  Granted, there is a "convenience method", if you will, for only the CHAR class:

 ORC_UNIQUE_PTR<Type> createCharType(TypeKind kind, uint64_t maxLength);

In my lil' test program, I used this like so:

container->addStructField( std::string( "char column" ), createCharType( orc::TypeKind::CHAR, 20 ) );


So at a minimum, it would seem that there should be an equivalent for the VARCHAR type.  However,
I was able to "get crafty" and create one via the following:

container->addStructField( std::string( "varchar column" ), std::unique_ptr<Type>(new TypeImpl(orc::TypeKind::VARCHAR, 20)));


These produce types of char(20) and varchar(20) respectively, and in both cases the
getMaximumLength() method returns a value of 20 as well.


However, none of this works for the STRING type.  As with VARCHAR, there is no "convenience
method", and an attempt similar to the VARCHAR one shown above, thus:

container->addStructField( std::string( "string column" ), std::unique_ptr<Type>(new TypeImpl(orc::TypeKind::STRING, 20)));

failed to produce the result I would have expected.  It was easy to see why the output type
was just "string"; that is readily seen in the toString() method.  However, I was a bit surprised
to see that getMaximumLength() returned 0 when I used the second variant of the TypeImpl constructor,
i.e. the one that sets maxLength via its second parameter.


Unfortunately I didn't have time to dig into why that was happening, but I'd seen enough to
warrant an issue report, albeit not of critical importance.


All that said, as a user of ORC, I'd like to see the STRING type handled in the same manner
as the CHAR and VARCHAR types, with convenience methods for both VARCHAR and STRING, as there
is for CHAR.  Or at least learn why there is only the one convenience method and why STRING
is treated so differently.  We could use this functionality in our project (in which we use
ORC), and this is the reason I am opening the issue ticket in the first place.
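
To make the request concrete, here is a rough sketch of the behavior being asked for.  It
does not use ORC's real classes: SketchType and SketchKind are stand-ins for Type/TypeImpl
and TypeKind, and the three createSketch*Type helpers are hypothetical names, not part of
the ORC API.  The only behavior modeled is what is described above: char(N) and varchar(N)
in toString(), plain "string" for STRING, and (the requested part) a maximum length that is
retained for all three kinds.

```cpp
#include <cstdint>
#include <memory>
#include <string>

// Stand-ins for illustration only; these are NOT orc::TypeKind / orc::TypeImpl.
enum class SketchKind { STRING, VARCHAR, CHAR };

class SketchType {
 public:
  SketchType(SketchKind kind, uint64_t maxLength)
      : kind_(kind), maxLength_(maxLength) {}

  SketchKind getKind() const { return kind_; }

  // Requested behavior: the length given at construction is kept for every
  // string-like kind, STRING included.
  uint64_t getMaximumLength() const { return maxLength_; }

  // CHAR and VARCHAR print their length; STRING prints as plain "string".
  std::string toString() const {
    switch (kind_) {
      case SketchKind::CHAR:
        return "char(" + std::to_string(maxLength_) + ")";
      case SketchKind::VARCHAR:
        return "varchar(" + std::to_string(maxLength_) + ")";
      case SketchKind::STRING:
        break;
    }
    return "string";
  }

 private:
  SketchKind kind_;
  uint64_t maxLength_;
};

// Hypothetical convenience helpers, one per string-like kind.
std::unique_ptr<SketchType> createSketchCharType(uint64_t maxLength) {
  return std::unique_ptr<SketchType>(
      new SketchType(SketchKind::CHAR, maxLength));
}

std::unique_ptr<SketchType> createSketchVarcharType(uint64_t maxLength) {
  return std::unique_ptr<SketchType>(
      new SketchType(SketchKind::VARCHAR, maxLength));
}

std::unique_ptr<SketchType> createSketchStringType(uint64_t maxLength) {
  return std::unique_ptr<SketchType>(
      new SketchType(SketchKind::STRING, maxLength));
}
```

With helpers like these, createSketchStringType(20) would print as "string" but still
report a maximum length of 20, which is the part that the real STRING type did not do in
my test.  Whether ORC itself should work this way is exactly the open question here.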


I'd be willing to contribute the fix, as it seems easy enough to do.  But I'll leave that
up to Owen or other project folk to decide.



This message was sent by Atlassian JIRA
