asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ian Maxon <>
Subject Re: Yet another storage cleanup and a bit of history
Date Thu, 11 May 2017 20:27:04 GMT
I'm fine with changing the ability to override as it's recorded as
part of the commit. I don't see any reason it can't be tied to
committer privs.

On Thu, May 11, 2017 at 1:13 PM, Till Westmann <> wrote:
> I think that the point is, that the gerrit/jenkins admin rights are not
> necessarily aligned with the committer rights. And so storing and changing
> binaries in git is something everybody can do, while overriding in gerrit is
> not (as least I think that’s the way it is).
> So I think that we should either change the test or the ability to override.
> Cheers,
> Till
> On 11 May 2017, at 12:30, Ian Maxon wrote:
>> I'm not sure I follow, you'd still need permission to change the
>> dataset files. Unless you mean to commit them to git? However I don't
>> think thats an option either because even at the trivial size they are
>> now (2 pages) that's huge git-wise.
>> On Thu, May 11, 2017 at 8:41 AM, abdullah alamoudi <>
>> wrote:
>>> It will not require special access to jenkins...
>>>> On May 10, 2017, at 8:06 PM, Ian Maxon <> wrote:
>>>> I don't really see how that would require any less intervention than
>>>> the current job. As it stands, once it's decided that breaking the
>>>> storage for a patch is fine, all we need to do is override the -1
>>>> Verified and merge it. What it checks against is HEAD and HEAD~1 of
>>>> the patch, not the release version.
>>>> On Wed, May 10, 2017 at 7:59 PM, abdullah alamoudi <
>>>> <>> wrote:
>>>>> A follow up on this. So this change has been through a round of review
>>>>> and will be merged soon. Currently, it fails the storage check test.
>>>>> storage check test works as follow:
>>>>> It builds some artifacts using the previous asterixdb version and then
>>>>> tries to read them using the build with the change. The test itself is
>>>>> as it catches those changes that modifies the storage files but not very
>>>>> convenient as one has to have access to jenkins to override them.
>>>>> What I propose is to disable this test and have binary data files with
>>>>> a test that reads them. If a change changes any storage related classes,
>>>>> then it will still fail those tests until the files are updated. At which
>>>>> point, the reviewer should catch that and if it is a legitimate change,
>>>>> it should be allowed in.
>>>>> Eventually, we should have backward compatibility storage and/or a
>>>>> migration facility and then maybe we can put those back on jenkins.
>>>>> Thoughts? Proposals?
>>>>>> On May 8, 2017, at 10:59 PM, abdullah alamoudi <>
>>>>>> wrote:
>>>>>> Devs,
>>>>>> For some time, out storage code has been suffering from incremental
>>>>>> design changes through additional use cases that come along research
>>>>>> projects.
>>>>>> We have done some cleanup but the code base still suffered from lots
>>>>>> of duplicate code and unneeded work (for both developers and CPUs).
>>>>>> One thing we used to do is whenever we need to access an index, we
>>>>>> have to create its "IDataflowHelperFactory". This object will contain
>>>>>> things that defines the index. Interestingly, it didn't contain all
that is
>>>>>> needed.
>>>>>> For some obscure reason, typeTraits, comparatorFactories, and
>>>>>> bloomFilterKeyFields were places in TreeIndexCreateOperatorDescriptor.
>>>>>> Different indexes had different dataflow helpers and so sometimes,
>>>>>> multiple class definitions are needed for operators per index type.
and this
>>>>>> leads to further bloat of the system's code.
>>>>>> If one thinks about it, the index related objects that are needed
>>>>>> during index construction are only needed when the index gets created.
>>>>>> Further needs to access the index for any reason should only gets
the index
>>>>>> key (the path in this case).
>>>>>> In fact, if one follows the execution of the code, they will see
>>>>>> that is exactly what is needed and that whenever we try to
>>>>>> search/insert/delete/upsert/bulkload, we recompute many artifacts
that end
>>>>>> up being useless.
>>>>>> So, I proposed a change that fixes this part. Index related objects
>>>>>> are only provided at index creation time and for index access, only
>>>>>> index path is required. This is done by removing the create method
from the
>>>>>> IIndexDataflowHelper interface and moving it to IIndexBuilder.
>>>>>> With this, all implementations of IIndexDataflowHelper are now in
>>>>>> single class that basically uses the path to fetch the index on a
>>>>>> Controller.
>>>>>> This change gets rid of more than 3000 lines of code and makes things
>>>>>> much cleaner. Classes that should go to hyracks are moved to hyracks.
>>>>>> Asterix related information such as dataset id and partition number
are kept
>>>>>> in asterix through the introduction of DatasetLocalResource. To show
>>>>>> effect of this change, you can look at an example in
>>>>>> <>
>>>>>> <
>>>>>> <>>
>>>>>> Look at the amount of work that was unneeded to access an inverted
>>>>>> index. The only thing that was actually needed is the index
>>>>>> FileSplitProvider. Take another look at
>>>>>> <>
>>>>>> <
>>>>>> <>>
>>>>>> and see how much unneeded code gets removed. This is not enough on
>>>>>> asterixdb side but it creates a good foundation that would allow
>>>>>> and new code to get cleaner.
>>>>>> Please, take a look at the change
>>>>>> <>
>>>>>> <
>>>>>> <>> if interested
and let me
>>>>>> know if you have any comments. Note that it fails storage check test
>>>>>> that is expected because it changes the persisted resource info classes.
>>>>>> Cheers,
>>>>>> Abdullah.

View raw message