asterixdb-dev mailing list archives

From Ian Maxon <ima...@uci.edu>
Subject Re: [jira] [Commented] (ASTERIXDB-1201) RTree built on the optional field refuses to load the NULL value when executing the bulk load
Date Tue, 01 Dec 2015 23:58:38 GMT
I don't think we've settled what we do with NULL values in an RTree in
general, though. After briefly discussing this with Mike, Till, Yingyi,
and Abdullah this morning, the takeaway (as I understood it) was an open
question: is the RTree a regular secondary index, with an entry for
every record, or something like a partial index that only covers
records where the attribute isn't null? I ran through Jianfeng's test
code briefly in the debugger to see what's going on.

From the bulk load case, at least, I think the latter is true. To check
this, I pretty-printed every tuple that got fed into the RTree and BTree
bulk loaders.
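A minimal sketch of that debugging approach, assuming a simplified, hypothetical `TupleBulkLoader` interface (this is not the actual Hyracks bulk-loader API): a decorator prints each tuple before forwarding it to the wrapped loader. `CollectingBulkLoader`, `PrintingBulkLoader`, and the `tidN: fK {value}` output format are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical, simplified bulk-loader interface (not the Hyracks API).
interface TupleBulkLoader {
    void add(List<?> tuple);
    void end();
}

// Stand-in for a real BTree or RTree bulk loader: it just keeps the tuples.
final class CollectingBulkLoader implements TupleBulkLoader {
    final List<List<?>> tuples = new ArrayList<>();
    public void add(List<?> tuple) { tuples.add(tuple); }
    public void end() { }
}

// Decorator that renders every tuple before forwarding it to the wrapped loader.
final class PrintingBulkLoader implements TupleBulkLoader {
    private final TupleBulkLoader delegate;
    final List<String> lines = new ArrayList<>();

    PrintingBulkLoader(TupleBulkLoader delegate) { this.delegate = delegate; }

    public void add(List<?> tuple) {
        StringJoiner fields = new StringJoiner(" ", "tid" + lines.size() + ": ", "");
        for (int i = 0; i < tuple.size(); i++) {
            // String.valueOf renders a missing (null) field as "null".
            fields.add("f" + i + " {" + String.valueOf(tuple.get(i)) + "}");
        }
        String line = fields.toString();
        lines.add(line);
        System.out.println(line);
        delegate.add(tuple); // the wrapped loader still sees every tuple unchanged
    }

    public void end() { delegate.end(); }
}

class BulkLoadDebugDemo {
    public static void main(String[] args) {
        PrintingBulkLoader loader = new PrintingBulkLoader(new CollectingBulkLoader());
        loader.add(Arrays.asList(1L, null)); // prints "tid0: f0 {1} f1 {null}"
        loader.add(Arrays.asList(2L, 3L));   // prints "tid1: f0 {2} f1 {3}"
        loader.end();
    }
}
```

Wrapping rather than editing the loaders keeps the inspection code separate, so the same decorator can sit in front of both the BTree and the RTree loader.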

For the secondary BTree, we can see that an entry is inserted for every
record, with a null key when the attribute is absent:

TC: 1
 tid0:(1, 19)[f0:(0, 9) {AInt64: {3}}f1:(9, 10) {null}

TC: 2
 tid0:(1, 19)[f0:(0, 9) {AInt64: {1}}f1:(9, 10) {null}
 tid1:(19, 45)[f0:(0, 9) {AInt64: {2}}f1:(9, 18) {AInt64: {3}}

But for the RTree this is all we insert:

tid0:(1, 66)[f0:(0, 9) {AInt64: {3}}f1:(9, 18) {ADouble: {4.0}}f2:(18, 27) {ADouble: {5.0}}f3:(27, 36) {ADouble: {4.0}}f4:(36, 45) {ADouble: {5.0}}
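The two printouts are consistent with the following model. It is sketched in Java with hypothetical classes (`Rec`, `SecondaryEntries`) that are not AsterixDB code. The BTree gets one entry per record, with the primary key first as in the printout and a null key when `fb` is missing. The RTree skips records without `fc` and stores the point as a degenerate MBR. That would explain why `point("4.0,5.0")` shows up as the four doubles 4.0, 5.0, 4.0, 5.0. The (minX, minY, maxX, maxY) field order is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical record mirroring t_record from the repro script:
// fa is the primary key; fb and fc are optional (null when absent).
final class Rec {
    final long fa;
    final Long fb;
    final double[] fc; // {x, y}, or null when the point is missing
    Rec(long fa, Long fb, double[] fc) { this.fa = fa; this.fb = fb; this.fc = fc; }
}

final class SecondaryEntries {
    // BTree-style: one entry per record; a missing key is kept as null.
    static List<List<Object>> btreeEntries(List<Rec> recs) {
        List<List<Object>> out = new ArrayList<>();
        for (Rec r : recs) out.add(Arrays.<Object>asList(r.fa, r.fb));
        return out;
    }

    // RTree-style, partial-index behavior: records without fc are skipped,
    // and a point becomes a degenerate MBR (minX, minY, maxX, maxY) with min == max.
    static List<List<Object>> rtreeEntries(List<Rec> recs) {
        List<List<Object>> out = new ArrayList<>();
        for (Rec r : recs) {
            if (r.fc == null) continue;
            double x = r.fc[0], y = r.fc[1];
            out.add(Arrays.<Object>asList(r.fa, x, y, x, y));
        }
        return out;
    }
}

class IndexEntriesDemo {
    public static void main(String[] args) {
        // The same three records as in the repro script.
        List<Rec> recs = Arrays.asList(
                new Rec(1, null, null),
                new Rec(2, 3L, null),
                new Rec(3, null, new double[] {4.0, 5.0}));
        System.out.println(SecondaryEntries.btreeEntries(recs)); // [[1, null], [2, 3], [3, null]]
        System.out.println(SecondaryEntries.rtreeEntries(recs)); // [[3, 4.0, 5.0, 4.0, 5.0]]
    }
}
```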

This makes me wonder what was up with the original RTree bulkload code
to make this happen, because that looks like a perfectly fine tuple to
me...
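One way the older code could have hit the reported exception is shown below. This is a hypothetical sketch, not the actual RTree code. When a page is finished, its MBR is computed by reading every key field of every entry as a double. The `adjustMBR` and value-provider frames in the stack trace quoted below suggest a step like this. If an entry with NULL key fields reaches that step, the read fails. Entries filtered out beforehand, as in the RTree printout above, never reach it. `CoordinateReader` and `MbrUtil` are invented names, and the exception message only mimics the one in the report.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a per-type value provider: it turns a stored key
// field into a double and, like the one in the report, has no case for NULL.
final class CoordinateReader {
    static double read(Object field) {
        if (field instanceof Double) return (Double) field;
        if (field == null) {
            throw new UnsupportedOperationException("Value provider for type NULL is not implemented");
        }
        throw new IllegalArgumentException("unsupported type: " + field.getClass());
    }
}

final class MbrUtil {
    // Union of 2-D MBRs, each given as {minX, minY, maxX, maxY} fields,
    // roughly the work an MBR-adjustment step does when a page is closed.
    static double[] union(List<List<Object>> mbrs) {
        double[] acc = {Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY,
                        Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY};
        for (List<Object> m : mbrs) {
            acc[0] = Math.min(acc[0], CoordinateReader.read(m.get(0)));
            acc[1] = Math.min(acc[1], CoordinateReader.read(m.get(1)));
            acc[2] = Math.max(acc[2], CoordinateReader.read(m.get(2)));
            acc[3] = Math.max(acc[3], CoordinateReader.read(m.get(3)));
        }
        return acc;
    }
}

class MbrDemo {
    public static void main(String[] args) {
        List<Object> point = Arrays.<Object>asList(4.0, 5.0, 4.0, 5.0);
        System.out.println(Arrays.toString(MbrUtil.union(Arrays.asList(point)))); // [4.0, 5.0, 4.0, 5.0]

        // Hypothetical entry whose key fields are all NULL.
        List<Object> nullKey = Arrays.<Object>asList(null, null, null, null);
        try {
            MbrUtil.union(Arrays.asList(point, nullKey));
        } catch (UnsupportedOperationException e) {
            System.out.println("bulk load fails: " + e.getMessage());
        }
    }
}
```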

- Ian

On Tue, Dec 1, 2015 at 2:40 PM, Ildar Absalyamov
<ildar.absalyamov@gmail.com> wrote:
> As far as I can see, the patch I was working on has not been merged into master yet.
> So unless Jianfeng was working off the release-0.8.8 branch, it should not be the cause.
>
>> On Dec 1, 2015, at 10:01, Chen Li <chenli@gmail.com> wrote:
>>
>> Maybe Ildar?
>>
>> On Mon, Nov 30, 2015 at 4:05 PM, Jianfeng Jia (JIRA) <jira@apache.org>
>> wrote:
>>
>>>
>>>    [
>>> https://issues.apache.org/jira/browse/ASTERIXDB-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032758#comment-15032758
>>> ]
>>>
>>> Jianfeng Jia commented on ASTERIXDB-1201:
>>> -----------------------------------------
>>>
>>> Hi devs,
>>>
>>> I submitted issue 1201, which happens on master
>>> (48706305724f6e2580b5a6716a709cebce2b40c0), but it’s not reproducible on
>>> the latest master.
>>> Basically, it builds an RTree index on a nullable field, and the older
>>> version complained about the NULL values.
>>> I’m wondering if anyone fixed this problem intentionally. If so, what’s the
>>> meaning of NULL for an RTree index?
>>>
>>>
>>>
>>>
>>>
>>> Best,
>>>
>>> Jianfeng Jia
>>> PhD Candidate of Computer Science
>>> University of California, Irvine
>>>
>>>
>>>
>>>> RTree built on the optional field refuses to load the NULL value when executing the bulk load
>>>> ---------------------------------------------------------------------------------------------
>>>>
>>>>                Key: ASTERIXDB-1201
>>>>                URL: https://issues.apache.org/jira/browse/ASTERIXDB-1201
>>>>            Project: Apache AsterixDB
>>>>         Issue Type: Bug
>>>>         Components: Storage
>>>>           Reporter: Jianfeng Jia
>>>>           Assignee: Ian Maxon
>>>>
>>>> When I build an RTree index on an optional field, the bulk load throws a
>>>> "Value provider for type NULL is not implemented" exception.
>>>> Here is the reproducible script:
>>>> {code}
>>>> drop dataverse test if exists;
>>>> create dataverse test;
>>>> use dataverse test;
>>>> create type t_record as closed {
>>>> fa : int64,
>>>> fb: int64?,
>>>> fc : point?
>>>> }
>>>> create dataset ds_set (t_record) primary key fa;
>>>> create index bidx on ds_set(fb) type btree;
>>>> create index cidx on ds_set(fc) type rtree;
>>>> insert into dataset ds_set ( [{"fa":1}, {"fa":2, "fb":3}, {"fa":3, "fc":point("4.0,5.0")}]);
>>>> load dataset ds_set
>>>> using localfs
>>>> (("path"="172.17.0.2:///data/twitter/test.adm"),("format"="adm"));
>>>> {code}
>>>> The "insert" and "load" statements are run separately.
>>>> The test.adm uses the same three records:
>>>> {code}
>>>> {"fa":1}
>>>> {"fa":2, "fb":3}
>>>> {"fa":3, "fc":point("4.0,5.0")}
>>>> {code}
>>>> The insert statement works fine. The error happens only in the "load"
>>>> statement:
>>>> {code}
>>>> Caused by: org.apache.hyracks.algebricks.common.exceptions.NotImplementedException: Value provider for type NULL is not implemented
>>>>   at org.apache.asterix.dataflow.data.nontagged.valueproviders.AqlPrimitiveValueProviderFactory$1.getValue(AqlPrimitiveValueProviderFactory.java:64)
>>>>   at org.apache.hyracks.storage.am.rtree.frames.RTreeNSMFrame.adjustMBRImpl(RTreeNSMFrame.java:132)
>>>>   at org.apache.hyracks.storage.am.rtree.frames.RTreeNSMFrame.adjustMBR(RTreeNSMFrame.java:153)
>>>>   at org.apache.hyracks.storage.am.rtree.impls.RTree$RTreeBulkLoader.propagateBulk(RTree.java:954)
>>>>   at org.apache.hyracks.storage.am.rtree.impls.RTree$RTreeBulkLoader.end(RTree.java:937)
>>>>   at org.apache.hyracks.storage.am.lsm.rtree.impls.LSMRTree$LSMRTreeBulkLoader.end(LSMRTree.java:584)
>>>>   at org.apache.hyracks.storage.am.common.dataflow.IndexBulkLoadOperatorNodePushable.close(IndexBulkLoadOperatorNodePushable.java:107)
>>>>   ... 7 more
>>>> {code}
>>>> The BTree index works fine if I remove the RTree index.
>>>
>>>
>>>
>>> --
>>> This message was sent by Atlassian JIRA
>>> (v6.3.4#6332)
>>>
>
> Best regards,
> Ildar
>
