hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Peter Vary <pv...@cloudera.com>
Subject Re: Review Request 57693: HIVE-16146 If possible find a better way to filter the TestBeeLineDriver outpu
Date Fri, 17 Mar 2017 12:38:48 GMT


> On March 17, 2017, 12:37 a.m., Zoltan Haindrich wrote:
> > itests/util/src/main/java/org/apache/hive/beeline/qfile/QFile.java
> > Lines 66-73 (original)
> > <https://reviews.apache.org/r/57693/diff/1/?file=1666199#file1666199line66>
> >
> >     where did these methods gone?

They are not used, and I did not foresee any usage of them, since the QFile runs the diff
itself, but I can be convinced :)


> On March 17, 2017, 12:37 a.m., Zoltan Haindrich wrote:
> > itests/util/src/main/java/org/apache/hive/beeline/qfile/QFile.java
> > Lines 78 (patched)
> > <https://reviews.apache.org/r/57693/diff/1/?file=1666199#file1666199line87>
> >
> >     I've not yet clearly understand the goal here...but this "take every second
element" seems kinda odd..
> >     
> >     would the following query's output correct?
> >     echo -e "select 'a\x00b';\nselect 1;" > asd.q
> >     
> >     anyway...I think it would be better to separate the different beeline outputs
somehow instead of trying to workaround it externally...

Hi Zoltan,

"take every second element" is *even* not *odd* :) :)

Back to the topic:
I think, the final goal for the BeeLine tests would be the following:
- Run them against a real cluster
- Run them parallel to decrease the runtime
- Use the same .q and .q.out files as the CLI tests.

Do we agree in the above points? Do you have any extra points we should consider? If you have
any thoughts, I would be happy to add!

The current .q.out files contain data from 2 sources:
- Server side - Data from the Pre/PostExecuteHooks
- Client side - Query result data

Since the data is from 2 sources, it could be only collected at the client side effectively.
I considered the following options:
1. Generate only the required logs. For example add new OperationLog.LoggingLevel, and modify
LogDivertAppender, so only the relevant logs are printed on server side
2. Generate the full log, and cut it down to match the expected output

I opted for the 2nd one, since the q.out result is very slim and if we have any problem then
we will have to dig into the cluster hive.log to find any indication what caused the failure.
This can be a real pain when multiple tests are running in parallel. To avoid this problem,
in the 2nd option we have a very detailed log for every qfile test runs specifically for the
test (.q.out.raw), and we cut it down to the expected detail level (.q.out).

Since these are only for tests, I think we can live with compromise where there are restrictions
on the .q and .q.out files which can be used with the driver. If we choose this solution we
can enhance it by:
- Checking the .q and the .q.out file not to contain the bounding character
- When filtering we can check that the number of bounding characters are even as expected
- Use a little more complicated bounding sequence to lower the likelihood of collision

What do you think?
Even after these enhancements, would you vote for using minimal logging, or we could accept
the restrictions above?

Thanks for the review,
Appreciate that you take time to look through these changes!

Peter


> On March 17, 2017, 12:37 a.m., Zoltan Haindrich wrote:
> > itests/util/src/main/java/org/apache/hive/beeline/qfile/QFile.java
> > Lines 79 (patched)
> > <https://reviews.apache.org/r/57693/diff/1/?file=1666199#file1666199line88>
> >
> >     i went around a few times...and I still don't see the point of inserting all
these BOUNDING_CHARs...this line effectively discards all the extra information BOUNDING_CHAR
may have added earlier (but it kills every second bucket; which content might be something...)
> >     
> >     I'm not sure...but I think that you might be able to install some streamdistributors
or some other usefull gadget in SessionState.LogHelper - and possibly achieve something similar;
without having to rely on a special separator...

See above. I think I understand your concern, and it is the same issue as the comment above.


- Peter


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/57693/#review169228
-----------------------------------------------------------


On March 16, 2017, 5:15 p.m., Peter Vary wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/57693/
> -----------------------------------------------------------
> 
> (Updated March 16, 2017, 5:15 p.m.)
> 
> 
> Review request for hive, Zoltan Haindrich, Marta Kuczora, Miklos Csanady, Naveen Gangam,
Vihang Karajgaonkar, and Barna Zsombor Klara.
> 
> 
> Bugs: HIVE-16146
>     https://issues.apache.org/jira/browse/HIVE-16146
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The goal was to generate '\0' markers around the raw log items we want to keep in the
golden files.
> 
> To archive this I had to do a small functional change and some small refactor:
> - Removed the immutability of the format map in BeeLine, so the test could add the QFile
specific OutputFromat as a possible format
> - The PostExecutePrinter and the PreExecutePrinter got a common ancestor, which was due
anyway because PostExecutePrinter reused static methods from PreExecutePrinter. This way I
was able to create QFile specific printers which are generating the desired markers.
> - Moved the QFile test to the org.apache.hive.beeline package, so the test classes can
use the package private classes and methods
> 
> For one reason or other BeeLine added an extra space character at the end of the lines
for multiline commands - I have removed this space - Will see if this effects any unit test
or not.
> 
> With the above mentioned *OutputFromat*, *QFilePreExecutePrinter*, *QFilePostExecutePrinter*
we can mark the lines which are needed in the q.out file, and during the filtering we can
remove the unneeded  parts - I prefer to keep the log level high in the raw files, so in case
of a test failure we can have better understanding of what has happened.
> 
> In the test files:
> - Updated the beforeExecute, and afterExecute methods to set the new outputformat and
the new hooks
> - Removed the query specific filters, since they are non exitstent in the CLI tests
> - Simplified the static filterset - currently only contains the filters which are really
neccessary for the actual tests - might grow to the same size than in the QTestUtils - but
if we do not want to run all of the test we would like to keep this list as small as possible
> - Removed unnecessary configurations from QFileBuilder which will be not needed in case
we want to mimic the CLI results
> 
> 
> Diffs
> -----
> 
>   beeline/src/java/org/apache/hive/beeline/BeeLine.java 3c8fccc 
>   beeline/src/java/org/apache/hive/beeline/Commands.java 99ee82c 
>   itests/src/test/resources/testconfiguration.properties 0c590c8 
>   itests/util/src/main/java/org/apache/hadoop/hive/cli/control/CoreBeeLineDriver.java
acc02eb 
>   itests/util/src/main/java/org/apache/hive/beeline/QFileOutputFormat.java PRE-CREATION

>   itests/util/src/main/java/org/apache/hive/beeline/QFilePostExecutePrinter.java PRE-CREATION

>   itests/util/src/main/java/org/apache/hive/beeline/QFilePreExecutePrinter.java PRE-CREATION

>   itests/util/src/main/java/org/apache/hive/beeline/qfile/QFile.java 49d6d24 
>   itests/util/src/main/java/org/apache/hive/beeline/qfile/QFileBeeLineClient.java b6eac89

>   itests/util/src/main/java/org/apache/hive/beeline/qfile/package-info.java fcd50ec 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/PostExecutePrinter.java b4fc125 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/PreExecutePrinter.java 232c62d 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/PrinterHook.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 0732207 
>   ql/src/test/results/clientpositive/beeline/drop_with_concurrency.q.out d22c9ec 
>   ql/src/test/results/clientpositive/beeline/escape_comments.q.out 5f9df93 
>   ql/src/test/results/clientpositive/beeline/select_dummy_source.q.out PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/57693/diff/1/
> 
> 
> Testing
> -------
> 
> Added a new simple query file from CLI driver, and checked that the generated output
is the same
> 
> 
> Thanks,
> 
> Peter Vary
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message