spark-reviews mailing list archives

From RongGu <...@git.apache.org>
Subject [GitHub] spark pull request: SPARK-1305: Support persisting RDD's directly ...
Date Fri, 28 Mar 2014 18:32:38 GMT
Github user RongGu commented on the pull request:

    https://github.com/apache/spark/pull/158#issuecomment-38953146
  
    Hey Rong,
    
    I just didn't know this was necessary in Tachyon. But if we keep it, yes
    let's just keep the number of directories at 64. The main issue was just
    the code complexity. It seemed a little ugly to have all this duplicated
    code from the DiskStore - I feel it might be possible to consolidate it
    more.
    
    But at this point I think it's okay to just put that as a TODO.
    
    - Patrick
    
    
    On Fri, Mar 28, 2014 at 11:29 AM, 顾荣 <gurongwalker@gmail.com> wrote:
    
    > Hi Patrick.
    >
    > Thank you for the comments! The GitHub website is currently not accessible
    > from my location, so I am sending this email to discuss my latest update
    > with you.
    >
    > In fact, in my old version, the TachyonFilePathResolver interface and
    > getBlockLocation() were used by TachyonStore.getSize(blockId: BlockId)
    > to get the size of a block. That information is then used for the storage
    > usage metrics in the UI or similar. I modeled this on the DiskStore's
    > PathResolver interface. However, as you suggested, to make the code more
    > concise, I now get the size directly from the tachyonFile. This way, we
    > have removed a lot of unnecessary code.
    >
    > As for the subdirectories issue, I suggest keeping them, because for some
    > large datasets the number of blocks on one executor can easily reach
    > thousands or even millions. I am afraid that at that point we would have
    > to add this piece of code again. Also, I agree with haoyuan that we should
    > keep the number small for now. Thanks.
    >
    > Regards.
    > Rong Gu
    >
    >
    >
    > 2014-03-29 2:05 GMT+08:00 Patrick Wendell <notifications@github.com>:
    >
    >> @haoyuan <https://github.com/haoyuan> hey HY - the issue is mostly
    >> around keeping the complexity of the code minimal and avoiding a bunch of
    >> code duplication. The code around deleting these subdirectories is not
    >> trivial and right now the functions are just copy/pasted for Tachyon. I'll
    >> look at it a bit more...
    >>
    >> —
    >> Reply to this email directly or view it on GitHub<https://github.com/apache/spark/pull/158#issuecomment-38950091>
    >> .
    >>
    >
    >
    >
    > --
    > ------------------
    > Rong Gu
    > Department of Computer Science and Technology
    > State Key Laboratory for Novel Software Technology
    > Nanjing University
    > Phone: +86 15850682791
    > Email: gurongwalker@gmail.com
    >
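
For readers following the subdirectory discussion above, here is a minimal sketch of how block files might be spread across a fixed number of subdirectories so that no single directory ends up holding thousands or millions of entries. It is not the code from the pull request: the function name block_path, the root paths, and the use of CRC32 as the hash are all assumptions made for illustration. Only the figure of 64 subdirectories comes from the thread.

```python
import zlib
from pathlib import PurePosixPath

# 64 is the subdirectory count agreed on in the thread; everything else
# in this sketch (names, hash choice, paths) is hypothetical.
SUB_DIRS_PER_ROOT = 64


def block_path(root_dirs, block_name, sub_dirs_per_root=SUB_DIRS_PER_ROOT):
    """Map a block name to root/<subdir>/<block_name> deterministically."""
    # CRC32 is used here only because it is stable across runs and
    # non-negative in Python 3; any stable hash would do for the sketch.
    h = zlib.crc32(block_name.encode("utf-8"))
    # First pick a root directory, then a subdirectory within that root.
    root_index = h % len(root_dirs)
    sub_index = (h // len(root_dirs)) % sub_dirs_per_root
    # Two hex digits are enough to name up to 256 subdirectories.
    return PurePosixPath(root_dirs[root_index]) / f"{sub_index:02x}" / block_name


if __name__ == "__main__":
    roots = ["/tmp/store-a", "/tmp/store-b"]  # hypothetical root directories
    print(block_path(roots, "rdd_0_1"))
```

With this layout the number of directories stays bounded at (number of roots) x 64 no matter how many blocks an executor holds, which is the trade-off the thread weighs against the extra cleanup code for deleting the subdirectories.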


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
