drill-dev mailing list archives

From sachouche <...@git.apache.org>
Subject [GitHub] drill issue #1060: DRILL-5846: Improve parquet performance for Flat Data Typ...
Date Fri, 12 Jan 2018 14:39:14 GMT
Github user sachouche commented on the issue:

    https://github.com/apache/drill/pull/1060
  
    @paul-rogers, regarding the design aspects you brought up:
    ** Corrections to the Proposed Design **
    Your analysis seems to assume that the Vector drives the loading operation. This is not
the case: a) the reader invokes the bulk save API, and b) the Vector save method then runs a loop
requesting the rest of the data. The reader can stop the loading phase at any time; currently
it uses the batch size to make this decision. The plan is to enhance this logic with feedback
from the Vector save method about memory usage (e.g., save the current row set on the condition
that memory usage does not exceed X bytes). I believe I raised this design decision early on,
and the agreement was to implement the memory batch restrictions in a future Drill release.
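
    To illustrate the control flow described above (the reader invokes the bulk save, the vector
loops pulling data, and loading stops either at the batch size or at a memory cap), here is a
minimal, self-contained Java sketch. All names (`BulkInput`, `SketchVector.bulkSave`,
`SketchReader.loadBatch`, `maxBytes`) are hypothetical and are not Drill's actual classes or API;
the `maxBytes` check models the planned memory feedback, which is not part of the current
implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only: none of these types are Drill APIs.
public class BulkLoadSketch {

    /** Hypothetical source of values that the vector pulls from; owned by the reader. */
    interface BulkInput {
        boolean hasNext(); // true while the reader allows more values in this batch
        long next();       // next fixed-width value
    }

    /** Hypothetical stand-in for a fixed-width value vector. */
    static final class SketchVector {
        private static final int VALUE_WIDTH_BYTES = Long.BYTES;
        private final List<Long> values = new ArrayList<>();

        /**
         * Loops pulling values until the reader's input is exhausted or saving one
         * more value would exceed maxBytes (the planned memory feedback, assumed).
         * Returns the number of values saved by this call.
         */
        int bulkSave(BulkInput input, long maxBytes) {
            int saved = 0;
            while (input.hasNext()) {
                long projected = (long) (values.size() + 1) * VALUE_WIDTH_BYTES;
                if (projected > maxBytes) {
                    break; // memory cap reached: hand control back to the reader
                }
                values.add(input.next());
                saved++;
            }
            return saved;
        }

        int size() { return values.size(); }

        long memoryUsedBytes() { return (long) values.size() * VALUE_WIDTH_BYTES; }
    }

    /** Hypothetical reader: it invokes the bulk save and decides the batch bound. */
    static final class SketchReader {
        private final long[] pageData;
        private int position = 0;

        SketchReader(long[] pageData) { this.pageData = pageData; }

        /** Loads at most batchSize rows into the vector, subject to maxBytes. */
        int loadBatch(SketchVector vector, int batchSize, long maxBytes) {
            final int limit = Math.min(pageData.length, position + batchSize);
            BulkInput input = new BulkInput() {
                public boolean hasNext() { return position < limit; }
                public long next() { return pageData[position++]; }
            };
            return vector.bulkSave(input, maxBytes);
        }
    }

    public static void main(String[] args) {
        long[] data = new long[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        SketchReader reader = new SketchReader(data);
        SketchVector vector = new SketchVector();
        // The batch size allows 64 rows, but the memory cap allows only 40 * 8 bytes.
        int saved = reader.loadBatch(vector, 64, 40L * Long.BYTES);
        System.out.println("saved=" + saved + " bytes=" + vector.memoryUsedBytes());
    }
}
```

    In this sketch the reader keeps control of the stopping criteria: the batch size is enforced
through the `hasNext()` bound it hands to the vector, while the vector enforces the memory cap
the reader passed in and returns early. With 100 input values, a batch size of 64 and a cap of
320 bytes, `main` prints `saved=40 bytes=320`.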
    
    ** Agile Development **
    Over the years I have come to appreciate the value of agile development. One needs to prioritize
design activities; it is perfectly valid to start with a design (which is not guaranteed
to be perfect) and then refine it over time, as long as the underlying implementation doesn't
spill over into other modules (encapsulation). This approach has the merit of showcasing incremental
improvements and allowing the developer to gain new insight as their understanding deepens.
       

