quickstep-dev mailing list archives

From jianqiao <...@git.apache.org>
Subject [GitHub] incubator-quickstep pull request #19: Improve text scan operator
Date Thu, 09 Jun 2016 16:04:29 GMT
Github user jianqiao commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/19#discussion_r66470302
  
    --- Diff: relational_operators/TextScanOperator.cpp ---
    @@ -274,439 +116,293 @@ TextScanWorkOrder::TextScanWorkOrder(const std::size_t query_id,
     
     void TextScanWorkOrder::execute() {
       const CatalogRelationSchema &relation = output_destination_->getRelation();
    +  std::vector<Tuple> tuples;
     
    -  string current_row_string;
    -  if (is_file_) {
    -    FILE *file = std::fopen(filename_.c_str(), "r");
    -    if (file == nullptr) {
    -      throw TextScanReadError(filename_);
    -    }
    +  constexpr std::size_t kSmallBufferSize = 0x4000;
    --- End diff --
    
    This is the buffer size for processing the last row of the text segment.
    
    For each text segment, we proceed in two steps: (1) scan from the first newline (`\n`)
character in the segment to the last newline character in the segment; and then (2) scan
from the _last_ newline character in _this_ text segment to the _first_ newline character
in the _next_ text segment (corner cases are also handled).
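
    As a rough illustration of step (1), the range of complete rows inside one segment
could be located as below. This is only a sketch: `CompleteRowsRange` is a hypothetical
helper, not code from this pull request, and it assumes the whole segment is already in
memory.

```cpp
#include <cstddef>
#include <string_view>
#include <utility>

// Hypothetical helper (not from the pull request): returns the byte range
// [begin, end) of the complete rows inside one text segment, i.e. everything
// after the first '\n' up to and including the last '\n'.
// Returns {npos, npos} if the segment contains no newline at all.
inline std::pair<std::size_t, std::size_t> CompleteRowsRange(
    std::string_view segment) {
  const std::size_t first = segment.find('\n');
  if (first == std::string_view::npos) {
    return {std::string_view::npos, std::string_view::npos};
  }
  const std::size_t last = segment.rfind('\n');
  // If the segment has exactly one newline, first == last and the range is
  // empty: the segment holds no complete row of its own.
  return {first + 1, last + 1};
}
```

    The bytes before `begin` belong to the tail row of the previous segment, and the bytes
from `end` onward start the tail row handled in step (2).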
    
    For (2), how much data from the _next_ segment should we load from disk? Since
it is just one row, in most cases we do not want to load much. So the load buffer starts
at 1024 bytes, and we keep appending the buffer's contents to a `std::string` until a `\n`
is found. If this "tail row" is really large, the buffer grows up to 0x4000 bytes (16 KiB).


