avro-user mailing list archives

From Sean Busbey <bus...@cloudera.com>
Subject Re: Is it possible to use $ characters in field names?
Date Wed, 25 Oct 2017 16:05:12 GMT
+users@nifi.apache.org[1]

Could you keep the data in Avro and then use NiFi's PutMongoRecord
processor[2] with an AvroReader to insert it?


[1]: https://lists.apache.org/list.html?users@nifi.apache.org
[2]: https://s.apache.org/MmPG

On Wed, Oct 25, 2017 at 7:51 AM, Mike Thomsen <mikerthomsen@gmail.com>
wrote:

> No, it doesn't look like it's going to work. It accepts $date into the
> record using the alias, but it doesn't generate $date as the field name
> when writing the object back to JSON.
>
> On Wed, Oct 25, 2017 at 8:19 AM, Nandor Kollar <nkollar@cloudera.com>
> wrote:
>
>> Oh yes, you're right: you're running into the limitation on field names
>> <https://avro.apache.org/docs/1.8.0/spec.html#names>. Apart from solving
>> this via a map, you might consider using Avro aliases
>> <https://avro.apache.org/docs/1.8.2/spec.html#Aliases>, since it looks like
>> aliases don't have this limitation. Can you use them?
>>
>> Nandor
>>
>> On Wed, Oct 25, 2017 at 1:40 PM, Mike Thomsen <mikerthomsen@gmail.com>
>> wrote:
>>
>>> Hi Nandor,
>>>
>>> It's not the numeric portion that is the problem for me, but the $date
>>> field name. Mongo apparently requires the structure I provided in the
>>> example, and whenever I use $date as the field name the Java Avro API
>>> throws an exception about an invalid character in the field definition.
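The exception comes from the Avro naming rule (spec "Names" section): a name must start with `[A-Za-z_]` and may then contain only `[A-Za-z0-9_]`. A minimal Python sketch of that rule, using only the standard library, shows why `$date` is rejected. The helper `is_valid_avro_name` is hypothetical, not part of any Avro API, and real implementations may differ in details such as how they treat non-ASCII letters.

```python
import re

# Naming rule from the Avro 1.8 specification ("Names" section):
# first character in [A-Za-z_], remaining characters in [A-Za-z0-9_].
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_avro_name(name: str) -> bool:
    """Hypothetical helper: True if `name` follows the spec's naming rule."""
    return AVRO_NAME.fullmatch(name) is not None


for candidate in ["timestamp", "_date", "$date", "2date"]:
    print(candidate, is_valid_avro_name(candidate))
```

`timestamp` and `_date` pass, while `$date` fails on its first character, which matches the error described above.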
>>>
>>> The logical type thing is good to know for future reference.
>>>
>>> I admit that this is likely a really uncommon edge case for Avro. The
>>> workaround I found for defining a schema that is at least compatible with
>>> the Mongo Extended JSON requirements was to do this (one field example):
>>>
>>> {
>>>     "namespace": "test",
>>>     "name": "PutTestRecord",
>>>     "type": "record",
>>>     "fields": [{
>>>         "name": "timestampField",
>>>         "type": {
>>>             "type": "map",
>>>             "values": "long"
>>>         }
>>>     }]
>>> }
>>>
>>> It doesn't give you the full validation that would be ideal if we could
>>> define a field with the name "$date," but it's an 80% solution that works
>>> with NiFi and other tools that have to generate Extended JSON for Mongo.
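The map workaround sidesteps the naming rule because Avro map keys are arbitrary strings rather than names. A rough Python sketch of checking a datum against `{"type": "map", "values": "long"}` (standard library only; `fits_map_of_long` is a hypothetical helper, not an Avro API) also shows the validation gap mentioned above: any key is accepted, not just `$date`.

```python
import json

# Datum shaped for the "timestampField" map<long> field from the schema above.
record = {"timestampField": {"$date": 1508939472000}}


def fits_map_of_long(value) -> bool:
    """Hypothetical check: does `value` look like an Avro map whose values are longs?"""
    if not isinstance(value, dict):
        return False
    for key, item in value.items():
        if not isinstance(key, str):
            return False
        # An Avro long is a signed 64-bit integer; bool is excluded explicitly.
        if isinstance(item, bool) or not isinstance(item, int):
            return False
        if not -(2**63) <= item < 2**63:
            return False
    return True


print(fits_map_of_long(record["timestampField"]))  # True
print(fits_map_of_long({"$dat": 1}))               # True: keys are not constrained
print(fits_map_of_long({"$date": "2017-10-25"}))   # False: value is not a long
print(json.dumps(record))
```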
>>>
>>> Thanks,
>>>
>>> Mike
>>>
>>> On Wed, Oct 25, 2017 at 4:48 AM, Nandor Kollar <nkollar@cloudera.com>
>>> wrote:
>>>
>>>> Hi Mike,
>>>>
>>>> This JSON doesn't seem to be a valid Avro schema
>>>> <https://avro.apache.org/docs/1.8.1/spec.html#schemas>. If you'd like
>>>> to use timestamps in your schema, you should use the timestamp logical
>>>> types
>>>> <https://avro.apache.org/docs/1.8.1/spec.html#Timestamp+%28millisecond+precision%29>,
>>>> which annotate Avro longs. In this case the schema of this field should
>>>> look like this:
>>>>
>>>> {
>>>>    "name": "timestamp",
>>>>    "type": {
>>>>        "type": "long",
>>>>        "logicalType": "timestamp-millis"
>>>>    }
>>>> }
>>>>
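Per the Avro specification, `timestamp-millis` annotates a `long` holding the number of milliseconds since the Unix epoch, 1 January 1970 00:00:00.000 UTC; the annotation changes how readers interpret the long, not how it is encoded. A small standard-library Python sketch of that conversion follows. The helper names are hypothetical, and Avro language bindings may provide their own conversions.

```python
from datetime import datetime, timedelta, timezone

# Reference point for Avro timestamp-millis values.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_timestamp_millis(dt: datetime) -> int:
    """Encode a timezone-aware datetime as milliseconds since the Unix epoch."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_timestamp_millis(millis: int) -> datetime:
    """Decode milliseconds since the Unix epoch into a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


ts = datetime(2017, 10, 25, 13, 51, 12, tzinfo=timezone.utc)
millis = to_timestamp_millis(ts)
print(millis)                                     # 1508939472000
print(from_timestamp_millis(millis).isoformat())  # 2017-10-25T13:51:12+00:00
```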
>>>> If you'd like to create Avro files with this schema, the Avro
>>>> documentation has a brief tutorial
>>>> <https://avro.apache.org/docs/1.8.1/gettingstartedjava.html#Compiling+the+schema>
>>>> on how to create and write Avro files with this schema in Java.
>>>>
>>>> Regards,
>>>> Nandor
>>>>
>>>> On Tue, Oct 24, 2017 at 8:18 PM, Mike Thomsen <mikerthomsen@gmail.com>
>>>> wrote:
>>>>
>>>>> I am trying to build an avro schema for a NiFi flow that is going to
>>>>> insert data into Mongo, and Mongo Extended JSON requires the use of $
>>>>> characters in cases like this (to represent a date):
>>>>>
>>>>> {
>>>>>     "timestamp": {
>>>>>         "$date": TIMESTAMP_LONG_HERE
>>>>>     }
>>>>> }
>>>>>
>>>>> I tried building a schema with that, and it failed saying there was an
>>>>> invalid character in the schema. I just wanted to check whether there is
>>>>> a workaround for this or if I'll have to choose another option.
>>>>>
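One possible approach, not one proposed in this thread, is to keep the Avro field a plain `long` (for example with the `timestamp-millis` logical type) and add the `$date` wrapper only when the JSON for Mongo is produced, outside Avro's naming rules. A minimal Python sketch, assuming the record has already been decoded into a dict; `wrap_dates` is a hypothetical helper:

```python
import json


def wrap_dates(record: dict, date_fields: set) -> dict:
    """Hypothetical helper: wrap the named millisecond fields as {"$date": millis}."""
    return {
        key: {"$date": value} if key in date_fields else value
        for key, value in record.items()
    }


flat = {"timestamp": 1508939472000, "name": "test"}
print(json.dumps(wrap_dates(flat, {"timestamp"})))
# {"timestamp": {"$date": 1508939472000}, "name": "test"}
```

The Avro schema then only needs valid names, and the `$date` key exists solely in the generated Extended JSON.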
>>>>> Thanks,
>>>>>
>>>>> Mike
>>>>>
>>>>
>>>>
>>>
>>
>


-- 
busbey
