quickstep-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From zuyu <...@git.apache.org>
Subject [GitHub] incubator-quickstep pull request #19: Improve text scan operator
Date Thu, 09 Jun 2016 15:20:11 GMT
Github user zuyu commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/19#discussion_r66461609
  
    --- Diff: relational_operators/TextScanOperator.cpp ---
    @@ -155,116 +63,50 @@ bool TextScanOperator::getAllWorkOrders(
       InsertDestination *output_destination =
           query_context->getInsertDestination(output_destination_index_);
     
    -  if (parallelize_load_) {
    -    // Parallel implementation: Split work orders are generated for each file
    -    // being bulk-loaded. (More than one file can be loaded, because we support
    -    // glob() semantics in file name.) These work orders read the input file,
    -    // and split them in the blobs that can be parsed independently.
    -    if (blocking_dependencies_met_) {
    -      if (!work_generated_) {
    -        // First, generate text-split work orders.
    -        for (const auto &file : files) {
    -          container->addNormalWorkOrder(
    -              new TextSplitWorkOrder(query_id_,
    -                                     file,
    -                                     process_escape_sequences_,
    -                                     storage_manager,
    -                                     op_index_,
    -                                     scheduler_client_id,
    -                                     bus),
    -              op_index_);
    -          ++num_split_work_orders_;
    -        }
    -        work_generated_ = true;
    -        return false;
    -      } else {
    -        // Check if there are blobs to parse.
    -        while (!text_blob_queue_.empty()) {
    -          const TextBlob blob_work = text_blob_queue_.popOne();
    -          container->addNormalWorkOrder(
    -              new TextScanWorkOrder(query_id_,
    -                                    blob_work.blob_id,
    -                                    blob_work.size,
    -                                    field_terminator_,
    -                                    process_escape_sequences_,
    -                                    output_destination,
    -                                    storage_manager),
    -              op_index_);
    -        }
    -        // Done if all split work orders are completed, and no blobs are left to
    -        // process.
    -        return num_done_split_work_orders_.load(std::memory_order_acquire) == num_split_work_orders_
&&
    -               text_blob_queue_.empty();
    -      }
    -    }
    -    return false;
    -  } else {
    -    // Serial implementation.
    -    if (blocking_dependencies_met_ && !work_generated_) {
    -      for (const auto &file : files) {
    +  // Text segment size set to 256KB.
    +  constexpr std::size_t kTextSegmentSize = 0x40000u;
    --- End diff --
    
    Cool. I would suggest to use a gflags to define this value, so that we could set any value
w/o recompilation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Mime
View raw message