gobblin-user mailing list archives

From Prateek Gupta <prateek.gup...@myntra.com>
Subject Re: SchemaParseException When Writing to ORC File
Date Wed, 22 Nov 2017 04:54:47 GMT
Hi Team,

Yes, the issue is exactly as mentioned by Zhixiong.

Since we're using the Schema Registry, we don't have access to the 'avsc'
file. Also, using 'avro.schema.literal' will not serve the purpose in the
case of *schema evolution*.

Thanks,
Prateek Gupta

On Wed, Nov 22, 2017 at 12:04 AM, Zhixiong Chen <zhchen@linkedin.com> wrote:

> Hi Tamas,
>
>
> I'm not quite sure what issue Prateek has. Is it that setting
> `avro.schema.url` to a schema registry URL breaks `HiveSerDeConverter`
> because the registry doesn't return the schema but a schema wrapper?
>
>
> Based on my investigation, `avro.schema.url` is usually a path to an
> avsc file. I guess calling it a URL creates confusion; it doesn't seem
> to be meant for a schema registry URL. Alternatively, `avro.schema.literal`
> can be used: set it to the content of an avsc file, i.e. the actual
> schema.
>
>
> Speaking of schema registry support, I do think it's useful. We have
> something similar in the existing codebase; check out
> `org.apache.gobblin.metrics.kafka.KafkaAvroEventReporter`. However, the
> idea is not implemented for general use. Some related constructs are
> placed in the "wrong" modules, so we may start with new constructs that
> deprecate those.
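General schema-registry support, as Zhixiong describes it, could be prototyped as a resolver that turns a registry schema ID into a bare Avro schema and caches the result, so each ID is fetched only once. What follows is a minimal Python sketch, not Gobblin's API. The class name `CachingSchemaResolver` and the injected `fetch` callable are hypothetical. The sketch assumes responses of the form `{"schema": "<schema JSON>"}`, which is how the Confluent Schema Registry's `GET /schemas/ids/{id}` endpoint answers.

```python
import json
from typing import Callable, Dict


class CachingSchemaResolver:
    """Hypothetical helper: resolve a registry schema ID to bare Avro schema JSON."""

    def __init__(self, fetch: Callable[[int], str]) -> None:
        # `fetch` returns the raw registry response body for a schema ID
        # (e.g. via an HTTP GET); it is injected so the sketch needs no network.
        self._fetch = fetch
        self._cache: Dict[int, str] = {}

    def schema_for(self, schema_id: int) -> str:
        if schema_id not in self._cache:
            envelope = json.loads(self._fetch(schema_id))
            # The registry wraps the schema as a JSON string under "schema".
            self._cache[schema_id] = envelope["schema"]
        return self._cache[schema_id]


# Usage with a stand-in fetcher that records how often it is called.
calls = []


def fake_fetch(schema_id: int) -> str:
    calls.append(schema_id)
    return json.dumps({"schema": json.dumps({"type": "string"})})


resolver = CachingSchemaResolver(fake_fetch)
resolver.schema_for(4)
resolver.schema_for(4)
print(len(calls))  # 1: the second lookup is served from the cache
```

Caching by ID works here because a registry schema ID always refers to the same schema. New schema versions receive new IDs, which also fits the schema-evolution concern raised in this thread.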
>
>
> Zhixiong
>
>
> ------------------------------
> *From:* Tamas Nemeth <tamas.nemeth@prezi.com>
> *Sent:* Tuesday, November 21, 2017 6:02:41 AM
> *To:* user@gobblin.incubator.apache.org
> *Cc:* Engg_data_ingestion
> *Subject:* Re: SchemaParseException When Writing to ORC File
>
> Hey Zhixiong,
>
> The issue here is that Prateek tried to set the schema URL to the Confluent
> Schema Registry endpoint address, which sends back a message where the
> actual schema is wrapped under the schema property:
> {"subject":"localhost.demo.Demo-value","version":1,"id":4,"schema":"{\"type\":\"record\",\"name\":\"Value\",\"namespace\":\"localhost.demo.Demo\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"Name\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"Age\",\"type\":[\"null\",{\"type\":\"int\",\"connect.type\":\"int16\"}],\"default\":null},{\"name\":\"Department\",\"type\":[\"null\",\"string\"],\"default\":null}],\"connect.name\":\"localhost.demo.Demo.Value\"}"}
>
> Do you think it would make sense to support schema registries (or at least
> the Confluent Schema Registry, as it is quite popular nowadays) wherever
> the Avro schema URL can be set? I think it would. What do you think?
> How do you handle this in your environment? Does your schema registry
> have an endpoint that replies with only the actual schema?
>
> Thanks,
> Tamas
>
> On Tue, Nov 21, 2017 at 1:28 AM Zhixiong Chen <zhchen@linkedin.com> wrote:
>
> Hi Prateek,
>
>
> As Tamas pointed out, the direct response is not the schema itself; it
> contains a schema field that holds the schema as a JSON string. Is that
> the schema you're looking for?
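Getting at the embedded schema takes two parses. First decode the response envelope, then decode the string stored under `schema`, which is itself the Avro schema JSON. What follows is a minimal Python sketch that assumes the response shape quoted in this thread. `unwrap_registry_response` is a hypothetical helper name. The string it returns is what could be saved as an avsc file or supplied as the actual schema.

```python
import json


def unwrap_registry_response(body: str) -> str:
    """Return the bare Avro schema JSON embedded in a registry response body."""
    envelope = json.loads(body)
    schema_text = envelope["schema"]  # the schema is stored as a JSON *string*
    schema = json.loads(schema_text)
    # An Avro schema written as a JSON object must have a "type" key; the
    # envelope has none, which is why Avro reports "No type" for the whole body.
    if isinstance(schema, dict) and "type" not in schema:
        raise ValueError("embedded schema has no 'type' field")
    return schema_text


# Trimmed version of the response quoted in this thread (one field only).
inner = {"type": "record", "name": "Value", "namespace": "localhost.demo.Demo",
         "fields": [{"name": "id", "type": "int"}]}
response = json.dumps({"subject": "localhost.demo.Demo-value", "version": 1,
                       "id": 4, "schema": json.dumps(inner)})

print(json.loads(unwrap_registry_response(response))["name"])  # Value
```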
>
>
> Or are you actually saying that the entire response is used for writing
> an Avro file?
>
>
> Zhixiong
> ------------------------------
> *From:* Prateek Gupta <prateek.gupta3@myntra.com>
> *Sent:* Thursday, November 16, 2017 11:49:53 PM
> *To:* user@gobblin.incubator.apache.org
> *Cc:* Engg_data_ingestion
> *Subject:* Re: SchemaParseException When Writing to ORC File
>
> Hi Tamas,
>
> Thanks for the response.
>
> The same schema is also used for writing Avro files.
> Since the schema is registered with the Schema Registry, the Avro message
> does not contain the schema, only a global identifier.
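The global identifier Prateek mentions comes from the Confluent wire format. Each Kafka message value starts with a magic byte `0`, followed by the schema ID as a 4-byte big-endian integer, followed by the Avro binary body. That ID is what the registry's `GET /schemas/ids/{id}` endpoint resolves. What follows is a minimal Python sketch that assumes this framing. `split_confluent_message` is a hypothetical helper, and the trailing byte in the example is a placeholder, not a real record.

```python
import struct

MAGIC_BYTE = 0  # Confluent wire-format marker


def split_confluent_message(payload: bytes) -> tuple:
    """Split a Confluent-framed Kafka value into (schema_id, avro_body)."""
    if len(payload) < 5:
        raise ValueError("payload too short for Confluent framing")
    # ">bI": big-endian, 1-byte magic followed by a 4-byte unsigned schema ID.
    magic, schema_id = struct.unpack(">bI", payload[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unknown magic byte: {magic}")
    return schema_id, payload[5:]


# A framed message referring to schema ID 4 (as in the registry response
# quoted in this thread), followed by a placeholder byte for the Avro body.
framed = bytes([0]) + (4).to_bytes(4, "big") + b"\x08"
print(split_confluent_message(framed))  # (4, b'\x08')
```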
>
> Please find below the endpoint used:
>
> *http://localhost:8081/subjects/localhost.demo.Demo-value/versions/1*
>
> *Regards,*
> *Prateek Gupta*
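The endpoint Prateek uses follows the Schema Registry pattern `/subjects/{subject}/versions/{version}`. The version may also be `latest`, which may matter for the schema-evolution concern, since a pinned version number or a literal schema does not pick up new versions. Newer Schema Registry releases also document a `.../versions/{version}/schema` variant that returns only the unwrapped schema. That is the kind of endpoint Tamas asks about in this thread. It may not exist in the release used here, so check your registry's API reference. What follows is a minimal Python sketch. `subject_version_url` is a hypothetical helper.

```python
from typing import Union
from urllib.parse import quote


def subject_version_url(base_url: str, subject: str,
                        version: Union[int, str] = "latest",
                        schema_only: bool = False) -> str:
    """Build a Schema Registry URL for one version of a subject."""
    # Percent-encode the subject so unusual names stay in a single path segment.
    path = f"/subjects/{quote(subject, safe='')}/versions/{version}"
    if schema_only:
        # Assumption: the registry release serves the unwrapped schema at .../schema.
        path += "/schema"
    return base_url.rstrip("/") + path


print(subject_version_url("http://localhost:8081", "localhost.demo.Demo-value", 1))
# http://localhost:8081/subjects/localhost.demo.Demo-value/versions/1
```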
>
> On Fri, Nov 17, 2017 at 12:53 PM, Tamas Nemeth <tamas.nemeth@prezi.com>
> wrote:
>
> Hey Prateek,
>
> I think the problem here is that the schema you get from the Schema
> Registry is not just the Avro schema. If you check the schema in your
> message, the actual schema is in the schema property.
>
> Does the Confluent Schema Registry have an endpoint where you can get back
> only the schema?
>
> Thanks,
> Tamas
>
>
>
> On 2017. Nov 17., Fri at 7:11, Prateek Gupta <prateek.gupta3@myntra.com>
> wrote:
>
> Hi,
>
> Please help resolve the issue described below.
>
> Regards,
> Prateek Gupta
>
> On Wed, Nov 15, 2017 at 2:44 PM, Prateek Gupta <prateek.gupta3@myntra.com>
> wrote:
>
> Hi,
>
> As per the documentation, Writing to an ORC File
> <https://gobblin.readthedocs.io/en/latest/case-studies/Writing-ORC-Data/#writing-to-an-orc-file>,
> "In order to configure the HiveSerDeConverter *avro.schema.url* must be
> set when using this deserializer so that the Hive SerDe knows what Avro
> Schema to use when converting the record."
>
> If the URL is set to a Confluent Schema Registry address (a service for
> storing and retrieving Avro schemas), it fails with the exception below.
>
> *org.apache.avro.SchemaParseException: No type: *{"subject":"localhost.demo.Demo-value","version":1,"id":4,"schema":"{\"type\":\"record\",\"name\":\"Value\",\"namespace\":\"localhost.demo.Demo\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"Name\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"Age\",\"type\":[\"null\",{\"type\":\"int\",\"connect.type\":\"int16\"}],\"default\":null},{\"name\":\"Department\",\"type\":[\"null\",\"string\"],\"default\":null}],\"connect.name\":\"localhost.demo.Demo.Value\"}"}
>
> Please help us resolve this.
>
> Regards,
> Prateek Gupta
>
>
>
>
