gobblin-user mailing list archives

From Zhixiong Chen <zhc...@linkedin.com>
Subject Re: SchemaParseException When Writing to ORC File
Date Tue, 21 Nov 2017 00:27:58 GMT
Hi Prateek,

Following up on Tamas's suggestion: the registry's direct response is not the schema itself, but it
contains a "schema" field that holds the schema as a JSON string. Is that the schema you're looking for?

Or are you saying that the entire response is used for writing an Avro file?


From: Prateek Gupta <prateek.gupta3@myntra.com>
Sent: Thursday, November 16, 2017 11:49:53 PM
To: user@gobblin.incubator.apache.org
Cc: Engg_data_ingestion
Subject: Re: SchemaParseException When Writing to ORC File

Hi Tamas,

Thanks for the response.

The same schema is also used for writing an Avro file.
Since the schema is registered with Schema Registry, the Avro message does not carry the schema,
only a global identifier.

PFB, the endpoint used.


Prateek Gupta

On Fri, Nov 17, 2017 at 12:53 PM, Tamas Nemeth <tamas.nemeth@prezi.com> wrote:
Hey Prateek,

I think the problem here is that what you get from the Schema Registry is not just
the Avro schema. If you check the response in your message, the actual schema is in the schema
field.

Does Confluent schema registry have an endpoint where you can get back the schema only?
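
On the endpoint question, a hedged sketch: Confluent's REST API reference lists a `/schema` suffix on the subject-version path that returns the schema on its own, without the surrounding fields. Whether a given registry release serves that path, and whether avro.schema.url accepts it, would need to be checked against the deployment. The host and port below (`http://localhost:8081`) and the helper name `schema_only_url` are assumptions for illustration; the subject name is taken from the exception in this thread.

```python
from urllib.parse import quote


def schema_only_url(base_url, subject, version="latest"):
    """Build the URL of the schema-only view of one subject version."""
    # quote(..., safe="") percent-encodes reserved characters in the subject name.
    return "{}/subjects/{}/versions/{}/schema".format(
        base_url.rstrip("/"), quote(subject, safe=""), version
    )


# Assumed registry address; replace with the real one.
print(schema_only_url("http://localhost:8081", "localhost.demo.Demo-value", 1))
```

If the registry in use does not serve this path, the schema still has to be taken out of the "schema" field of the regular response.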


On 2017. Nov 17., Fri at 7:11, Prateek Gupta <prateek.gupta3@myntra.com> wrote:

Please aid in resolving the aforementioned issue.

Prateek Gupta

On Wed, Nov 15, 2017 at 2:44 PM, Prateek Gupta <prateek.gupta3@myntra.com> wrote:

As per the documentation, Writing to an ORC File<https://gobblin.readthedocs.io/en/latest/case-studies/Writing-ORC-Data/#writing-to-an-orc-file>,
"In order to configure the HiveSerDeConverter avro.schema.url must be set when using this
deserializer so that the Hive SerDe knows what Avro Schema to use when converting the record."

If the URL is set to a Confluent Schema Registry (used for storing and retrieving Avro schemas)
address, it fails with the exception below.

org.apache.avro.SchemaParseException: No type: {"subject":"localhost.demo.Demo-value","version":1,"id":4,"schema":"{\"type\":\"record\",\"name\":\"Value\",\"namespace\":\"localhost.demo.Demo\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"Name\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"Age\",\"type\":[\"null\",{\"type\":\"int\",\"connect.type\":\"int16\"}],\"default\":null},{\"name\":\"Department\",\"type\":[\"null\",\"string\"],\"default\":null}],\"connect.name\":\"localhost.demo.Demo.Value\"}"}
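
The "No type" message is consistent with the parser being handed the whole registry response rather than the schema inside it: the outer JSON object has the keys subject, version, id and schema, but no "type". A minimal sketch of the unwrapping, using only Python's standard json module and an abbreviated copy of the response above (the helper name `extract_avro_schema` is hypothetical, and only the first field of the record is reproduced):

```python
import json

# Abbreviated copy of the registry response from the exception above.
# The Avro schema is stored as a JSON string inside the "schema" field.
registry_response = json.dumps({
    "subject": "localhost.demo.Demo-value",
    "version": 1,
    "id": 4,
    "schema": json.dumps({
        "type": "record",
        "name": "Value",
        "namespace": "localhost.demo.Demo",
        "fields": [{"name": "id", "type": "int"}],
    }),
})


def extract_avro_schema(response_text):
    """Return the Avro schema (as a dict) embedded in a registry response."""
    envelope = json.loads(response_text)
    # The schema is itself JSON text, so it needs a second decoding step.
    return json.loads(envelope["schema"])


envelope = json.loads(registry_response)
schema = extract_avro_schema(registry_response)
print("type" in envelope)  # False: the envelope is not an Avro schema
print(schema["type"])      # record
```

Only the inner object is something an Avro parser can accept, so avro.schema.url would need to point at a resource that serves that inner schema directly.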

Please provide assistance in resolution for the same.

Prateek Gupta
