asterixdb-users mailing list archives

From Ian Maxon <ima...@uci.edu>
Subject Re: Primary key in a nested document structure
Date Mon, 14 Sep 2015 22:46:27 GMT
Oh, I see, thank you for the clarification. If this doesn't fit the
batch-load use case, then it might be worthwhile to try writing an
adapter.
The single instance is actually super helpful.

Looking at the object, I think I see the general format. If this is
intended to be an instance of GlobL4Type, then it doesn't actually fit
the schema above: it contains three objects rather than three lists,
more like

create type GlobL4Type as open {
    dimensions: type_dimensions,
    variables: type_var,
    global_attributes: type_globattr
}
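A quick way to see this mismatch in a given file is to check whether each top-level field parses as an object or as a list. Below is a minimal Python sketch, assuming the document is plain JSON once loaded; `describe_shape` and the sample record are hypothetical, not real data.

```python
import json

# Top-level fields declared in GlobL4Type in this thread.
TOP_LEVEL_FIELDS = ("dimensions", "variables", "global_attributes")


def describe_shape(record):
    """Report whether each top-level field is an object, a list, or missing."""
    shapes = {}
    for name in TOP_LEVEL_FIELDS:
        value = record.get(name)
        if isinstance(value, dict):
            shapes[name] = "object"
        elif isinstance(value, list):
            shapes[name] = "list"
        elif value is None:
            shapes[name] = "missing"  # absent or explicit JSON null
        else:
            shapes[name] = type(value).__name__
    return shapes


# Hypothetical toy record with the same top-level layout, not the real data.
sample = json.loads(
    '{"dimensions": {"time": 1, "lat": 2, "lon": 3},'
    ' "variables": {"time": {}}, "global_attributes": {"title": "t"}}'
)
print(describe_shape(sample))
# {'dimensions': 'object', 'variables': 'object', 'global_attributes': 'object'}
```

For the original declaration (with [type_dimensions], [type_var] and [type_globattr]) to match, each of these fields would have to come back as "list".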

I also don't see an instance of the "id" field in the global_attributes
nested field. I do see the UUID, though, so here is an example of
creating a dataset with that field as its primary key, along with a more
relaxed type declaration (this will only work on 0.8.7):

create type type_globattr as open {
    title: string,
    uuid: uuid
}

create type emptyType as open {}

create type GlobL4Type as open {
    dimensions: emptyType,
    variables: emptyType,
    global_attributes: type_globattr
}
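The "relaxed" part works because of how open types behave: each declared field must be present with its declared type, while undeclared fields are simply carried along (a closed type would reject them). Below is a hypothetical Python model of that rule for the declarations above. It is not AsterixDB code, and it treats uuid as a JSON string, which is an assumption about how the data arrives.

```python
# Hypothetical model of the relaxed declarations: a nested dict stands for an
# open record type, a Python type stands for a scalar field. uuid is modelled
# as str (assumption: UUIDs arrive as JSON strings).
RELAXED_GLOBL4 = {
    "dimensions": {},  # emptyType: open, no declared fields
    "variables": {},   # emptyType
    "global_attributes": {"title": str, "uuid": str},
}


def conforms_open(value, declared):
    """Return True if value satisfies an open record type.

    Every declared field must be present with the declared shape;
    extra, undeclared fields are allowed, which is what makes the type open.
    """
    if not isinstance(value, dict):
        return False  # a list (or scalar) never matches a record type
    for field, expected in declared.items():
        if field not in value:
            return False  # declared, non-optional field is missing
        child = value[field]
        if isinstance(expected, dict):
            if not conforms_open(child, expected):
                return False
        elif not isinstance(child, expected):
            return False
    return True
```

A record with extra fields anywhere still conforms; a record whose dimensions field is a list, or whose global_attributes lacks uuid, does not.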



create dataset TestL4Dataset(GlobL4Type) primary key global_attributes.uuid;
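Because the key is a nested field, every record has to actually contain global_attributes.uuid as a scalar reached through objects only (a path that runs through a list, as in the original [type_globattr] declaration, has no single value), and the values must be unique, since AsterixDB rejects records whose primary key is missing or duplicated. Below is a hedged pre-load check in Python; `get_path` and `check_primary_keys` are hypothetical helpers, not part of AsterixDB.

```python
def get_path(record, path):
    """Follow a dotted path such as 'global_attributes.uuid' through nested objects.

    Returns None if any step is missing or is not an object (e.g. a list).
    """
    value = record
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def check_primary_keys(records, path="global_attributes.uuid"):
    """Return (indexes of records with no usable key, set of duplicated keys)."""
    seen = set()
    missing = []
    duplicates = set()
    for i, record in enumerate(records):
        key = get_path(record, path)
        if key is None or isinstance(key, (dict, list)):
            missing.append(i)  # absent, null, or not a scalar
        elif key in seen:
            duplicates.add(key)
        else:
            seen.add(key)
    return missing, duplicates


# Hypothetical toy records, not the real data.
records = [
    {"global_attributes": {"title": "a", "uuid": "u1"}},
    {"global_attributes": {"title": "b", "uuid": "u1"}},    # duplicate key
    {"global_attributes": [{"title": "c", "uuid": "u2"}]},  # key inside a list
    {"global_attributes": {"title": "d"}},                  # no uuid at all
]
print(check_primary_keys(records))  # ([2, 3], {'u1'})
```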

I actually tried creating a dataset like this, but I think the JSON
isn't quite valid. Everything is fine until line 11, where there's an
attribute like this:
...
"_FillValue":-32768s
...
I'm pretty sure this isn't valid JSON; at least, it doesn't pass
through JSONLint. Once that's fixed, though, it should hopefully load
with that schema, since the schema doesn't define very much at all.
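The trailing "s" looks like netCDF CDL notation for a short (16-bit) integer, which is how ncdump prints such attributes. That whatever converter produced this file carried the CDL literal into the JSON is an assumption. If so, the suffixes can be stripped before loading. Below is a rough Python sketch; the regex and `strip_numeric_suffixes` are hypothetical, and the sketch handles only the one-letter suffixes b, s, f and d.

```python
import json
import re

# Heuristic (assumption): a number right after ':', '[' or ',' (optionally
# spaced) that carries a one-letter CDL-style suffix (b/s/f/d, either case)
# and is followed by ',', ']' or '}'. Numbers inside quoted strings are
# usually left alone, but a string containing something like ':5s,' would be
# rewritten too.
_SUFFIXED_NUMBER = re.compile(
    r"(?<=[:\[,])(\s*-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)[bBsSfFdD](?=\s*[,\]}])"
)


def strip_numeric_suffixes(text):
    """Remove type suffixes such as the 's' in -32768s so the text parses as JSON."""
    return _SUFFIXED_NUMBER.sub(r"\1", text)


# Hypothetical sample; only _FillValue comes from the thread.
raw = '{"_FillValue":-32768s, "valid_range":[0s, 100s], "units":"5s, 6s"}'
fixed = strip_numeric_suffixes(raw)
data = json.loads(fixed)
print(data["_FillValue"])  # -32768
```

Running the fixed text through a JSON validator such as JSONLint before loading is still a good idea, since the regex is only a heuristic.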

-Ian



On Mon, Sep 14, 2015 at 2:29 PM, Malarout, Namrata (398M-Affiliate)
<Namrata.Malarout@jpl.nasa.gov> wrote:
> I'd like to point out that I haven't designed the schema to include all
> the fields in the file. We have yet to decide which ones we would like to
> specify in the schema. The ability to create an open type is useful in
> this case. I was doing a trial run to see if the ingestion would go
> smoothly.
>
> On 9/14/15, 2:01 PM, "Ian Maxon" <imaxon@uci.edu> wrote:
>
>>Hi Namrata,
>>First, I would say that the one feature it seems like you will need is
>>indexing on nested datatypes, which is only supported in the upcoming
>>release. That's coming very soon, maybe in the next week or so.
>>Therefore, you can either hold tight for the final release, or try it
>>now as version 0.8.7-SNAPSHOT
>>(https://asterixdb.incubator.apache.org/download.html). The nested
>>indexing is relatively final so I wouldn't expect any major changes
>>between that version and the release for this use case.
>>
>>Second, I'm a little confused by the formatting and layout of the
>>data. In AsterixDB, datasets are usually collections of JSON/ADM
>>objects. Is what we have here a collection of GlobL4Type objects? If
>>you could take a subset and give one instance of the rows/objects
>>here, it'd be very helpful.
>>
>>Thanks!
>>- Ian
>>
>>On Mon, Sep 14, 2015 at 1:43 PM, Malarout, Namrata (398M-Affiliate)
>><Namrata.Malarout@jpl.nasa.gov> wrote:
>>> Hi all,
>>> The data I am working with has a nested structure. This is what my
>>> schema looks like:
>>>
>>> drop dataverse TestL4 if exists;
>>> create dataverse TestL4;
>>> use dataverse TestL4;
>>>
>>> create type type_dimensions as closed {
>>>     time: int32,
>>>     lat: int32,
>>>     lon: int32
>>> }
>>>
>>> create type attributes_tll as open {
>>>     long_name: string,
>>>     standard_name: string,
>>>     units: string,
>>>     valid_min: float,
>>>     valid_max: float,
>>>     axis: string,
>>>     comment: string
>>> }
>>>
>>> create type type_tll as open {
>>>     typee: string,
>>>     dimensions: {{string}},
>>>     attributes: [attributes_tll]
>>> }
>>>
>>> create type type_globattr as open {
>>>     title: string,
>>>     id: string,
>>>     uuid: string
>>> }
>>>
>>> create type type_var as open {
>>>     time: type_tll,
>>>     lat: type_tll,
>>>     lon: type_tll
>>> }
>>>
>>> create type GlobL4Type as open {
>>>     dimensions: [type_dimensions],
>>>     variables: [type_var],
>>>     global_attributes: [type_globattr]
>>> }
>>>
>>> Type GlobL4Type is the structure of the document, so I want to create a
>>> dataset based on it. I would like to use 'id', present in type_globattr,
>>> as the primary key for every document. How can I do that?
>>> Thanks in advance for the help.
>>>
>>> Regards,
>>> Namrata Malarout
>
