pulsar-commits mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [pulsar] hnail edited a comment on issue #4747: [pulsar-sql] Support arrays and maps
Date Thu, 27 Aug 2020 09:03:52 GMT

hnail edited a comment on issue #4747:
URL: https://github.com/apache/pulsar/issues/4747#issuecomment-681822007


   The cause is the same as in [issue #7652](https://github.com/apache/pulsar/issues/7652):
   
   1.  In [PulsarMetadata.getColumns()](https://github.com/apache/pulsar/blob/master/pulsar-sql/presto-pulsar/src/main/java/org/apache/pulsar/sql/presto/PulsarMetadata.java#L468), nested fields are not associated with a Presto parameterized type from the `TypeManager`. A nested field should be a `ROW` type in Presto (for reference, see Hive's struct type support: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-ComplexTypes).
   2.  `SchemaHandler` is hard to make work with [RecordCursor.getObject()](https://github.com/apache/pulsar/blob/master/pulsar-sql/presto-pulsar/src/main/java/org/apache/pulsar/sql/presto/PulsarRecordCursor.java#L557) to support `ROW`, `MAP`, `ARRAY`, etc.
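
To illustrate point 1, here is a minimal sketch of how an Avro field schema could be turned into a Presto type signature, with records becoming `ROW`. This is not the connector's code: the function name `presto_type` and the primitive mapping table are assumptions made for illustration, and the `Parent` record in the usage note is a guess at the full schema.

```python
import json

# Illustrative primitive mapping (an assumption, not taken from the connector).
PRIMITIVES = {
    "string": "varchar",
    "int": "integer",
    "long": "bigint",
    "double": "double",
    "boolean": "boolean",
}


def presto_type(avro):
    """Return a Presto type signature string for a parsed Avro schema."""
    if isinstance(avro, str):
        if avro not in PRIMITIVES:
            raise ValueError(f"unsupported Avro type: {avro!r}")
        return PRIMITIVES[avro]
    if isinstance(avro, list):
        # A union such as ["null", X] is treated as a nullable X.
        branches = [b for b in avro if b != "null"]
        if len(branches) != 1:
            raise ValueError("only nullable unions of one type are handled")
        return presto_type(branches[0])
    kind = avro["type"]
    if kind == "array":
        return f"array({presto_type(avro['items'])})"
    if kind == "map":
        # Avro map keys are always strings.
        return f"map(varchar, {presto_type(avro['values'])})"
    if kind == "record":
        # A nested record becomes a ROW with one named field per Avro field.
        fields = ", ".join(
            f"{f['name']} {presto_type(f['type'])}" for f in avro["fields"]
        )
        return f"ROW({fields})"
    # Wrapped form such as {"type": "string"}.
    return presto_type(kind)
```

For example, `presto_type(json.loads('["null",{"type":"map","values":"string"}]'))` gives `map(varchar, varchar)`, and a record with string fields `father` and `mother` gives `ROW(father varchar, mother varchar)`.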
   
   So I have done a large refactoring in my local branch. The main changes are:
   
   - `PulsarMetadata` is now associated with the Presto `TypeManager`
   - Deprecate `SchemaHandler` and migrate to `presto-record-decoder`, with a small extension
   - Decouple the main pulsar-presto module (`RecordSet`, `ConnectorMetadata`, etc.) from `org.apache.avro.Schema` and couple it with `org.apache.pulsar.common.schema.SchemaInfo` instead, so that it is friendly to other schema types (`PB`, `thrift`, etc.)
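
The idea behind the last bullet can be sketched as a registry keyed by schema type, so that the metadata code only sees a generic schema descriptor and never an Avro class directly. This is a rough illustration only: `SchemaInfoSketch`, `register`, and `columns_for` are hypothetical names and do not mirror the real `SchemaInfo` API or the code in my branch.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class SchemaInfoSketch:
    """Hypothetical stand-in for a generic schema descriptor."""
    name: str
    type: str      # e.g. "AVRO", "JSON", "PROTOBUF"
    schema: bytes  # raw schema definition, format depends on `type`


ColumnDeriver = Callable[[SchemaInfoSketch], List[Tuple[str, str]]]
_DERIVERS: Dict[str, ColumnDeriver] = {}


def register(schema_type: str):
    """Decorator that registers a column deriver for one schema type."""
    def wrap(fn: ColumnDeriver) -> ColumnDeriver:
        _DERIVERS[schema_type] = fn
        return fn
    return wrap


@register("AVRO")
def avro_columns(info: SchemaInfoSketch) -> List[Tuple[str, str]]:
    # Only this function knows that the bytes hold an Avro JSON definition.
    definition = json.loads(info.schema.decode("utf-8"))
    return [
        (f["name"], json.dumps(f["type"], separators=(",", ":")))
        for f in definition["fields"]
    ]


def columns_for(info: SchemaInfoSketch) -> List[Tuple[str, str]]:
    """Dispatch on the schema type; callers never touch Avro directly."""
    try:
        deriver = _DERIVERS[info.type]
    except KeyError:
        raise ValueError(
            f"no column deriver registered for schema type {info.type!r}"
        ) from None
    return deriver(info)
```

Supporting another schema type would then mean registering one more deriver, without changing the code that calls `columns_for`.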
   
   I have finished this code and tested it in my local environment. @sijie, is anyone else working on the same thing?
   
   
   ```
    presto> show create table pulsar."test-tenant/test-namespace".avroata;
   
    CREATE TABLE pulsar."test-tenant/test-namespace".avroata (
       name varchar COMMENT '["null","string"]',
       age integer COMMENT '"int"',
       childrens array(varchar) COMMENT '["null",{"type":"array","items":"string","java-class":"java.util.List"}]',
       teachers map(varchar, varchar) COMMENT '["null",{"type":"map","values":"string"}]',
       parent ROW(father varchar, mother varchar) COMMENT '["null",{"type":"record","name":"Parent","namespace":"com.hnail.pulsar.AvroGen"
       __partition__ integer COMMENT 'The partition number which the message belongs to',
       __event_time__ timestamp(3) COMMENT 'Application defined timestamp in milliseconds of when the event occurred',
       __publish_time__ timestamp(3) COMMENT 'The timestamp in milliseconds of when event as published',
       __message_id__ varchar COMMENT 'The message ID of the message used to generate this row',
       __sequence_id__ bigint COMMENT 'The sequence ID of the message used to generate this row',
       __producer_name__ varchar COMMENT 'The name of the producer that publish the message used to generate this row',
       __key__ varchar COMMENT 'The partition key for the topic',
       __properties__ varchar COMMENT 'User defined properties'
    )
   (1 row)
   
   Query 20200826_083759_00000_neuwa, FINISHED, 1 node
   Splits: 1 total, 1 done (100.00%)
   9.18 [0 rows, 0B] [0 rows/s, 0B/s]
   
   presto> select * from pulsar."test-tenant/test-namespace".avroata limit 3;
      name   | age |     childrens     |                   teachers                   |              parent              | __partition__ |
   ----------+-----+-------------------+----------------------------------------------+----------------------------------+---------------+
    Student1 |  23 | [zhangsan, lisi]  | {yuwen=yuwen_value, shuxue=shuxue_value}     | {father=father1, mother=mother1} |             2 |
    Student2 |  55 | [wangwu, fengliu] | {shuxue2=shuxue2_value, yuwen2=yuwen2_value} | {father=father2, mother=mother2} |             2 |
    Student1 |  23 | [zhangsan, lisi]  | {yuwen=yuwen_value, shuxue=shuxue_value}     | {father=father1, mother=mother1} |             0 |
   (3 rows)
   
   presto> select childrens[1],teachers['yuwen'] from pulsar."test-tenant/test-namespace".avroata limit 1;
     _col0   |    _col1
   ----------+-------------
    zhangsan | yuwen_value
   (1 row)
   
   Query 20200826_114004_00004_kz734, FINISHED, 1 node
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
