asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Till Westmann" <ti...@apache.org>
Subject Re: Yet another storage cleanup and a bit of history
Date Thu, 11 May 2017 20:13:09 GMT
I think that the point is, that the gerrit/jenkins admin rights are not
necessarily aligned with the committer rights. And so storing and 
changing
binaries in git is something everybody can do, while overriding in 
gerrit is
not (as least I think that’s the way it is).
So I think that we should either change the test or the ability to 
override.

Cheers,
Till

On 11 May 2017, at 12:30, Ian Maxon wrote:

> I'm not sure I follow, you'd still need permission to change the
> dataset files. Unless you mean to commit them to git? However I don't
> think thats an option either because even at the trivial size they are
> now (2 pages) that's huge git-wise.
>
> On Thu, May 11, 2017 at 8:41 AM, abdullah alamoudi 
> <bamousaa@gmail.com> wrote:
>> It will not require special access to jenkins...
>>
>>
>>> On May 10, 2017, at 8:06 PM, Ian Maxon <imaxon@uci.edu> wrote:
>>>
>>> I don't really see how that would require any less intervention than
>>> the current job. As it stands, once it's decided that breaking the
>>> storage for a patch is fine, all we need to do is override the -1
>>> Verified and merge it. What it checks against is HEAD and HEAD~1 of
>>> the patch, not the release version.
>>>
>>> On Wed, May 10, 2017 at 7:59 PM, abdullah alamoudi 
>>> <bamousaa@gmail.com <mailto:bamousaa@gmail.com>> wrote:
>>>> A follow up on this. So this change has been through a round of 
>>>> review and will be merged soon. Currently, it fails the storage 
>>>> check test. The storage check test works as follow:
>>>>
>>>> It builds some artifacts using the previous asterixdb version and 
>>>> then tries to read them using the build with the change. The test 
>>>> itself is good as it catches those changes that modifies the 
>>>> storage files but not very convenient as one has to have access to 
>>>> jenkins to override them.
>>>> What I propose is to disable this test and have binary data files 
>>>> with a test that reads them. If a change changes any storage 
>>>> related classes, then it will still fail those tests until the 
>>>> files are updated. At which point, the reviewer should catch that 
>>>> and if it is a legitimate change, then it should be allowed in.
>>>>
>>>> Eventually, we should have backward compatibility storage and/or a 
>>>> migration facility and then maybe we can put those back on jenkins.
>>>>
>>>> Thoughts? Proposals?
>>>>
>>>>
>>>>> On May 8, 2017, at 10:59 PM, abdullah alamoudi 
>>>>> <bamousaa@gmail.com> wrote:
>>>>>
>>>>> Devs,
>>>>> For some time, out storage code has been suffering from 
>>>>> incremental design changes through additional use cases that come 
>>>>> along research projects.
>>>>> We have done some cleanup but the code base still suffered from 
>>>>> lots of duplicate code and unneeded work (for both developers and 
>>>>> CPUs).
>>>>>
>>>>> One thing we used to do is whenever we need to access an index, we 
>>>>> have to create its "IDataflowHelperFactory". This object will 
>>>>> contain most things that defines the index. Interestingly, it 
>>>>> didn't contain all that is needed.
>>>>> For some obscure reason, typeTraits, comparatorFactories, and 
>>>>> bloomFilterKeyFields were places in 
>>>>> TreeIndexCreateOperatorDescriptor.
>>>>> Different indexes had different dataflow helpers and so sometimes, 
>>>>> multiple class definitions are needed for operators per index 
>>>>> type. and this leads to further bloat of the system's code.
>>>>>
>>>>> If one thinks about it, the index related objects that are needed 
>>>>> during index construction are only needed when the index gets 
>>>>> created. Further needs to access the index for any reason should 
>>>>> only gets the index key (the path in this case).
>>>>> In fact, if one follows the execution of the code, they will see 
>>>>> that that is exactly what is needed and that whenever we try to 
>>>>> search/insert/delete/upsert/bulkload, we recompute many artifacts 
>>>>> that end up being useless.
>>>>>
>>>>> So, I proposed a change that fixes this part. Index related 
>>>>> objects are only provided at index creation time and for index 
>>>>> access, only the index path is required. This is done by removing 
>>>>> the create method from the IIndexDataflowHelper interface and 
>>>>> moving it to IIndexBuilder.
>>>>> With this, all implementations of IIndexDataflowHelper are now in 
>>>>> a single class that basically uses the path to fetch the index on 
>>>>> a Node Controller.
>>>>>
>>>>> This change gets rid of more than 3000 lines of code and makes 
>>>>> things much cleaner. Classes that should go to hyracks are moved 
>>>>> to hyracks. Asterix related information such as dataset id and 
>>>>> partition number are kept in asterix through the introduction of 
>>>>> DatasetLocalResource. To show the effect of this change, you can 
>>>>> look at an example in 
>>>>> https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/algebra/operators/physical/InvertedIndexPOperator.java

>>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/algebra/operators/physical/InvertedIndexPOperator.java>

>>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/algebra/operators/physical/InvertedIndexPOperator.java

>>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-algebra/src/main/java/org/apache/asterix/algebra/operators/physical/InvertedIndexPOperator.java>>
>>>>>
>>>>> Look at the amount of work that was unneeded to access an inverted 
>>>>> index. The only thing that was actually needed is the index 
>>>>> FileSplitProvider. Take another look at 
>>>>> https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/declared/MetadataProvider.java

>>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/declared/MetadataProvider.java>

>>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/declared/MetadataProvider.java

>>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/3/asterixdb/asterix-metadata/src/main/java/org/apache/asterix/metadata/declared/MetadataProvider.java>>

>>>>> and see how much unneeded code gets removed. This is not enough on 
>>>>> the asterixdb side but it creates a good foundation that would 
>>>>> allow existing and new code to get cleaner.
>>>>>
>>>>> Please, take a look at the change 
>>>>> https://asterix-gerrit.ics.uci.edu/#/c/1728/ 
>>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/> 
>>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/ 
>>>>> <https://asterix-gerrit.ics.uci.edu/#/c/1728/>> if interested
and 
>>>>> let me know if you have any comments. Note that it fails storage 
>>>>> check test and that is expected because it changes the persisted 
>>>>> resource info classes.
>>>>>
>>>>> Cheers,
>>>>> Abdullah.
>>

Mime
View raw message