asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Murtadha Hubail (Code Review)" <do-not-re...@asterix-gerrit.ics.uci.edu>
Subject Change in hyracks[master]: Add support for persistent local resources to IndexLifecycle...
Date Fri, 28 Aug 2015 21:53:15 GMT
Murtadha Hubail has posted comments on this change.

Change subject: Add support for persistent local resources to IndexLifecycleManager
......................................................................


Patch Set 2:

To answer your first point, here is some background behind the reason of this change.

a) A resource location (Name) is passed to IndexDataFlowHelper as an absolute path from a
FileSplit.
b) The current design of PersistentLocalResourceRepository assumes that the mapping information
between (Resource Name -> LocalResource) is already in memory, and all resources info is
loaded in memory during startup. 
c) We also need to keep a different mapping information between (ID -> LocalResource) since
the API of IIndexLifecycleManager (DatasetLifecycleManager in Asterix) is only using the resource
id, so if it needs to access any other info regarding this resource (e.g. Dataset ID), it
needs to use the (ID -> LocalResource) map. If we have a huge number of resources, all
of that info will be kept in memory all the time until a resource is destroyed.

I fixed this on the Asterix side of this change and made PersistentLocalResourceRepository
load resources on demand and there is a cache between (Resource Name -> LocalResource)
with a certain max size and got rid of the two mappings. But now DatasetLifecycleManager won’t
be able to access the resource using the ID only. That’s why I introduced this change here
to pass the name instead of the id. Using the Name, it will ask PersistentLocalResourceRepository
for the resource using its Name and it will be fetched from the cache or disk.

Another option I could’ve done is to pass the whole LocalResource instead of the name. This
way DatasetLifecycleManager wouldn’t need to fetch it again at all. But then I found that
we have a logic that is based on checking whether that resource actually exists or not.

Coming back to the rest of your points:
2) I could change the interface to use Object, but it will again break the existing code of
other clients.

3) If we all agree that String is the better identifier, I’m going to add deprecated to
the resource ID methods.

4) Clients do not need to make this check since they should know what type of ILocalResourceRepository
they are using. I added this check here because IndexDataflowHelper is a shared class between
our clients. New clients could create a new class which implements IIndexDataflowHelper that
doesn’t need to make this check. Hopefully we could implement point 3 and git rid of this
check as well. I also wanted to make this check based on instanceof instead of a boolean but
couldn’t due to projects dependancies.

Hope this answers your points.

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/343
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2e8e974fc2f746959639ce94351f8e419a7f9093
Gerrit-PatchSet: 2
Gerrit-Project: hyracks
Gerrit-Branch: master
Gerrit-Owner: Murtadha Hubail <hubailmor@gmail.com>
Gerrit-Reviewer: Ian Maxon <imaxon@apache.org>
Gerrit-Reviewer: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
Gerrit-Reviewer: Murtadha Hubail <hubailmor@gmail.com>
Gerrit-Reviewer: Till Westmann <tillw@apache.org>
Gerrit-Reviewer: Yingyi Bu <yingyib@google.com>
Gerrit-Reviewer: Young-Seok Kim <kisskys@gmail.com>
Gerrit-Reviewer: abdullah alamoudi <bamousaa@gmail.com>
Gerrit-HasComments: No

Mime
View raw message