iotdb-reviews mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [iotdb] jt2594838 commented on a change in pull request #2859: Optimize cluster query
Date Wed, 17 Mar 2021 09:07:36 GMT

jt2594838 commented on a change in pull request #2859:
URL: https://github.com/apache/iotdb/pull/2859#discussion_r595823012



##########
File path: server/src/main/java/org/apache/iotdb/db/query/reader/chunk/DiskChunkReaderByTimestamp.java
##########
@@ -34,46 +34,49 @@
 
   private ChunkReaderByTimestamp chunkReaderByTimestamp;
   private BatchData data;
+  private long currentTime = Long.MIN_VALUE;
 
   public DiskChunkReaderByTimestamp(ChunkReaderByTimestamp chunkReaderByTimestamp) {
     this.chunkReaderByTimestamp = chunkReaderByTimestamp;
   }
 
   @Override
-  public Object getValueInTimestamp(long timestamp) throws IOException {
+  public Object[] getValuesInTimestamps(long[] timestamps, int length) throws IOException {
+    Object[] result = new Object[length];
 
-    if (!hasNext()) {
-      return null;
-    }
-
-    while (data != null) {
-      Object value = data.getValueInTimestamp(timestamp);
-      if (value != null) {
-        return value;
+    for (int i = 0; i < length; i++) {
+      if (timestamps[i] < currentTime) {
+        throw new IOException("time must be increasing when use ReaderByTimestamp");
       }
-      if (data.hasCurrent()) {
-        return null;
-      } else {
-        chunkReaderByTimestamp.setCurrentTimestamp(timestamp);
-        if (chunkReaderByTimestamp.hasNextSatisfiedPage()) {
-          data = chunkReaderByTimestamp.nextPageData();
-        } else {
-          return null;
+      currentTime = timestamps[i];
+      while (hasNext()) {
+        data = next();
+        if (data.getMaxTimestamp() > currentTime) {
+          result[i] = null;
+          break;
+        }

Review comment:
       I think it should be `data.getMinTimestamp()` here.
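   For illustration, a sketch of how the loop would read with the suggested condition (only the comparison changes; the rest of the loop is assumed to look the value up in the current batch, which is truncated in the diff above):
```java
while (hasNext()) {
  data = next();
  // If even the earliest timestamp in this batch is beyond the queried time,
  // no later batch can contain currentTime either, so the value is null here.
  if (data.getMinTimestamp() > currentTime) {
    result[i] = null;
    break;
  }
  // ... otherwise search for the value of currentTime inside this batch
}
```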

##########
File path: server/src/main/java/org/apache/iotdb/db/query/aggregation/impl/LastValueDescAggrResult.java
##########
@@ -68,19 +68,29 @@ public void updateResultUsingTimestamps(
     if (hasFinalResult()) {
       return;
     }
-    long time = Long.MIN_VALUE;
-    Object lastVal = null;
     for (int i = 0; i < length; i++) {
-      Object value = dataReader.getValueInTimestamp(timestamps[i]);
-      if (value != null) {
-        time = timestamps[i];
-        lastVal = value;
-        break;
+      long[] oneTimestamp = new long[1];
+      oneTimestamp[0] = timestamps[i];
+      Object[] values = dataReader.getValuesInTimestamps(oneTimestamp, 1);
+      if (values[0] != null) {
+        timestamp = timestamps[i];
+        setValue(values[0]);
+        return;

Review comment:
       The same problem as previously mentioned applies here as well.

##########
File path: tsfile/src/main/java/org/apache/iotdb/tsfile/read/query/timegenerator/TimeGenerator.java
##########
@@ -50,18 +53,35 @@ public boolean hasNext() throws IOException {
   }
 
   public long next() throws IOException {
+    if (!hasOrNode) {
+      if (leafValuesCache == null) {
+        leafValuesCache = new HashMap<>();
+      }
+      leafNodeCache.forEach(
+          (path, nodes) ->
+              leafValuesCache
+                  .computeIfAbsent(path, k -> new ArrayList<>())
+                  .add(nodes.get(0).currentValue()));
+    }
     return operatorNode.next();
   }
 
-  public Object getValue(Path path, long time) {
-    for (LeafNode leafNode : leafCache.get(path)) {
-      if (!leafNode.currentTimeIs(time)) {
-        continue;
-      }
-      return leafNode.currentValue();
+  /** ATTENTION: this method should only be used when there is no `OR` node */
+  public Object[] getValues(Path path) throws IOException {
+    if (leafValuesCache.get(path) == null) {
+      throw new IOException(
+          "getValues() method should not be invoked by non-existent path in where clause");
     }
+    return leafValuesCache.remove(path).toArray();
+  }

Review comment:
       You may also check `hasOrNode` here for robustness.
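   For instance, the guard could look like this (a sketch only; `hasOrNode`, `leafValuesCache`, and the existing exception message come from the quoted diff, while the new error message is illustrative):
```java
/** ATTENTION: this method should only be used when there is no `OR` node */
public Object[] getValues(Path path) throws IOException {
  // leafValuesCache is only populated in next() when hasOrNode is false,
  // so refuse the call outright if an OR node exists.
  if (hasOrNode) {
    throw new IOException("getValues() should not be invoked when the expression contains an OR node");
  }
  if (leafValuesCache == null || leafValuesCache.get(path) == null) {
    throw new IOException(
        "getValues() method should not be invoked by non-existent path in where clause");
  }
  return leafValuesCache.remove(path).toArray();
}
```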

##########
File path: server/src/main/java/org/apache/iotdb/db/query/dataset/RawQueryDataSetWithValueFilter.java
##########
@@ -64,102 +65,144 @@ public RawQueryDataSetWithValueFilter(
 
   @Override
   public boolean hasNextWithoutConstraint() throws IOException {
-    if (hasCachedRow) {
+    if (!cachedRowRecords.isEmpty()) {
       return true;
     }
-    return cacheRowRecord();
+    return cacheRowRecords();
   }
 
+  /** @return the first record of cached rows or null if there is no more data */
   @Override
   public RowRecord nextWithoutConstraint() throws IOException {
-    if (!hasCachedRow && !cacheRowRecord()) {
+    if (cachedRowRecords.isEmpty() && !cacheRowRecords()) {
       return null;
     }
-    hasCachedRow = false;
-    return cachedRowRecord;
+
+    return cachedRowRecords.remove(0);

Review comment:
       `remove(0)` on an `ArrayList` is not very efficient, and `LinkedList` itself is not fast either. I would recommend using a heap and inserting the `RowRecord`s into it during `cacheRowRecords`, in reversed order.
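   A minimal sketch of such a heap-based buffer (field and method names follow the quoted diff; `RowRecord.getTimestamp()` is assumed to be available):
```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.Queue;

// min-heap ordered by timestamp, replacing the ArrayList buffer
private final Queue<RowRecord> cachedRowRecords =
    new PriorityQueue<>(Comparator.comparingLong(RowRecord::getTimestamp));

@Override
public RowRecord nextWithoutConstraint() throws IOException {
  if (cachedRowRecords.isEmpty() && !cacheRowRecords()) {
    return null;
  }
  // poll() is O(log n), whereas ArrayList.remove(0) shifts every remaining element
  return cachedRowRecords.poll();
}
```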

##########
File path: server/src/main/java/org/apache/iotdb/db/query/dataset/RawQueryDataSetWithValueFilter.java
##########
@@ -64,102 +65,144 @@ public RawQueryDataSetWithValueFilter(
 
   @Override
   public boolean hasNextWithoutConstraint() throws IOException {
-    if (hasCachedRow) {
+    if (!cachedRowRecords.isEmpty()) {
       return true;
     }
-    return cacheRowRecord();
+    return cacheRowRecords();
   }
 
+  /** @return the first record of cached rows or null if there is no more data */
   @Override
   public RowRecord nextWithoutConstraint() throws IOException {
-    if (!hasCachedRow && !cacheRowRecord()) {
+    if (cachedRowRecords.isEmpty() && !cacheRowRecords()) {
       return null;
     }
-    hasCachedRow = false;
-    return cachedRowRecord;
+
+    return cachedRowRecords.remove(0);
   }
 
   /**
-   * Cache row record
+   * Cache row records
    *
    * @return if there has next row record.
    */
-  private boolean cacheRowRecord() throws IOException {
-    while (timeGenerator.hasNext()) {
-      boolean hasField = false;
-      long timestamp = timeGenerator.next();
-      RowRecord rowRecord = new RowRecord(timestamp);
-
-      for (int i = 0; i < seriesReaderByTimestampList.size(); i++) {
-        Object value;
-        // get value from readers in time generator
-        if (cached.get(i)) {
-          value = timeGenerator.getValue(paths.get(i), timestamp);
-        } else {
-          // get value from series reader without filter
-          IReaderByTimestamp reader = seriesReaderByTimestampList.get(i);
-          value = reader.getValueInTimestamp(timestamp);
-        }
-        if (value == null) {
-          rowRecord.addField(null);
+  private boolean cacheRowRecords() throws IOException {
+    int cachedTimeCnt = 0;
+    long[] cachedTimeArray = new long[fetchSize];
+    // TODO: LIMIT constraint
+    // 1. fill time array from time Generator
+    while (timeGenerator.hasNext() && cachedTimeCnt < fetchSize) {
+      cachedTimeArray[cachedTimeCnt++] = timeGenerator.next();
+    }
+    if (cachedTimeCnt == 0) {
+      return false;
+    }
+    RowRecord[] rowRecords = new RowRecord[cachedTimeCnt];
+    for (int i = 0; i < cachedTimeCnt; i++) {
+      rowRecords[i] = new RowRecord(cachedTimeArray[i]);
+    }
+
+    boolean[] hasField = new boolean[cachedTimeCnt];
+    // 2. fetch results of each time series using time array
+    for (int i = 0; i < seriesReaderByTimestampList.size(); i++) {
+      Object[] results;
+      // get value from readers in time generator
+      if (cached.get(i)) {
+        results = timeGenerator.getValues(paths.get(i));
+      } else {
+        results =
+            seriesReaderByTimestampList
+                .get(i)
+                .getValuesInTimestamps(cachedTimeArray, cachedTimeCnt);
+      }
+
+      // 3. use values in results to fill row record
+      for (int j = 0; j < cachedTimeCnt; j++) {
+        if (results[j] == null) {
+          rowRecords[j].addField(null);
         } else {
-          hasField = true;
-          rowRecord.addField(value, dataTypes.get(i));
+          hasField[j] = true;
+          rowRecords[j].addField(results[j], dataTypes.get(i));
         }
       }
-      if (hasField) {
-        hasCachedRow = true;
-        cachedRowRecord = rowRecord;
-        break;
+    }
+    // 4. remove rowRecord if all values in one timestamp are null
+    for (int i = 0; i < cachedTimeCnt; i++) {
+      if (hasField[i]) {
+        cachedRowRecords.add(rowRecords[i]);
       }
     }
-    return hasCachedRow;
+
+    // 5. check whether there is next row record
+    return !cachedRowRecords.isEmpty();

Review comment:
       Consider the following case:
   SELECT s1 FROM root.d1 WHERE s2 > 5
   and the dataset is:
   time  s1    s2
   1     null  10
   2     null  10
   3     5     10
   with a fetch size of 2.
   Since the fetch size is 2, timestamps 1 and 2 will be filled into `cachedTimeArray`. You then try to construct `RowRecords` for these two timestamps, only to find that both records have no fields, so neither of them is added to `cachedRowRecords`. `cachedRowRecords.isEmpty()` then evaluates to true, `hasNextWithoutConstraint` returns false to the upper level, and therefore the last record, whose timestamp is 3, is missed.
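   One possible shape of a fix (a sketch, not the PR's code): keep fetching batches until at least one non-empty row is cached or the time generator is exhausted. `cacheOneBatch` is a hypothetical helper wrapping steps 1-4 above for a single `fetchSize` batch:
```java
private boolean cacheRowRecords() throws IOException {
  // Loop instead of returning after the first batch: a batch whose rows are all
  // null must not end the query while the time generator still has timestamps.
  while (cachedRowRecords.isEmpty() && timeGenerator.hasNext()) {
    cacheOneBatch(); // hypothetical helper: steps 1-4 above for one fetchSize batch
  }
  return !cachedRowRecords.isEmpty();
}
```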

##########
File path: server/src/main/java/org/apache/iotdb/db/query/aggregation/impl/FirstValueAggrResult.java
##########
@@ -94,11 +94,26 @@ public void updateResultUsingTimestamps(
     if (hasFinalResult()) {
       return;
     }
+    for (int i = 0; i < length; i++) {
+      long[] oneTimestamp = new long[1];
+      oneTimestamp[0] = timestamps[i];
+      Object[] values = dataReader.getValuesInTimestamps(oneTimestamp, 1);
+      if (values[0] != null) {
+        setValue(values[0]);
+        timestamp = timestamps[i];
+        break;
+      }
+    }

Review comment:
       This has a potential performance problem when there are many null values at the front. For example, if `timestamps.length == 1000` but the first 999 values are all null, you will have to call `dataReader.getValuesInTimestamps` a thousand times, which may incur some overhead.
   My suggestion would be to add an interface `getFirstValueInTimestamps` to `dataReader` which returns the first non-null value among the given timestamps.
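   A sketch of what that interface addition and its use could look like (illustrative only; the method is the reviewer's proposal, not existing API, and `Pair` stands for any simple holder of a timestamp and a value):
```java
public interface IReaderByTimestamp {
  Object[] getValuesInTimestamps(long[] timestamps, int length) throws IOException;

  /**
   * @return the first (timestamp, value) pair within timestamps[0..length) whose
   *     value is non-null, or null if every timestamp maps to a null value.
   */
  Pair<Long, Object> getFirstValueInTimestamps(long[] timestamps, int length) throws IOException;
}

// usage in FirstValueAggrResult.updateResultUsingTimestamps (sketch):
Pair<Long, Object> first = dataReader.getFirstValueInTimestamps(timestamps, length);
if (first != null) {
  timestamp = first.left;
  setValue(first.right);
}
```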



