Return-Path: X-Original-To: apmail-kafka-dev-archive@www.apache.org Delivered-To: apmail-kafka-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 67BEF1762B for ; Tue, 11 Nov 2014 03:17:44 +0000 (UTC) Received: (qmail 25046 invoked by uid 500); 11 Nov 2014 03:17:44 -0000 Delivered-To: apmail-kafka-dev-archive@kafka.apache.org Received: (qmail 24993 invoked by uid 500); 11 Nov 2014 03:17:44 -0000 Mailing-List: contact dev-help@kafka.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@kafka.apache.org Delivered-To: mailing list dev@kafka.apache.org Received: (qmail 24977 invoked by uid 99); 11 Nov 2014 03:17:43 -0000 Received: from reviews-vm.apache.org (HELO reviews.apache.org) (140.211.11.40) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 11 Nov 2014 03:17:43 +0000 Received: from reviews.apache.org (localhost [127.0.0.1]) by reviews.apache.org (Postfix) with ESMTP id 0BA851DFE09; Tue, 11 Nov 2014 03:17:47 +0000 (UTC) Content-Type: multipart/alternative; boundary="===============6706444170132856283==" MIME-Version: 1.0 Subject: Re: Review Request 27534: Patch for KAFKA-1746 From: "Mayuresh Gharat" To: "Mayuresh Gharat" , "kafka" , "Ewen Cheslack-Postava" Date: Tue, 11 Nov 2014 03:17:47 -0000 Message-ID: <20141111031747.19819.69933@reviews.apache.org> X-ReviewBoard-URL: https://reviews.apache.org Auto-Submitted: auto-generated Sender: "Mayuresh Gharat" X-ReviewGroup: kafka X-ReviewRequest-URL: https://reviews.apache.org/r/27534/ X-Sender: "Mayuresh Gharat" References: <20141108030111.12275.31700@reviews.apache.org> In-Reply-To: <20141108030111.12275.31700@reviews.apache.org> Reply-To: "Mayuresh Gharat" X-ReviewRequest-Repository: kafka --===============6706444170132856283== MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit > On Nov. 8, 2014, 3:01 a.m., Mayuresh Gharat wrote: > > system_test/utils/kafka_system_test_utils.py, line 2403 > > > > > > This causes many system tests (approx 98) to fail > > Ewen Cheslack-Postava wrote: > Can you give list of a few that now fail for you? Depending on the tests, they *should* start failing -- unless the failure is caused by something other than DumpLogSegments returning a negative return code, the new failures indicate that the tests weren't properly checking DumpLogSegments output. I originally noticed this because I saw DumpLogSegments not catching an exception and exiting improperly, but the subsequent checks of stdout didn't properly check for that (and the exit code is a *much* more reliable test). The system tests seem to never check subprocess exit statuses, so there are likely many other similar bugs. > > If it's something else (e.g. some issue with the subprocess wait() call), I can try to track it down if I know which test to look at. > > Mayuresh Gharat wrote: > These are some of the test that are failing. I am pasting the console output for them : > > _test_case_name : testcase_0008 > _test_class_name : ReplicaBasicTest > arg : bounce_broker : false > arg : broker_type : leader > arg : message_producing_free_time_sec : 15 > arg : num_iteration : 1 > arg : num_messages_to_produce_per_producer_call : 50 > arg : num_partition : 3 > arg : replica_factor : 3 > arg : sleep_seconds_between_producer_calls : 1 > validation_status : > No. of messages from consumer on [test_1] at simple_consumer_test_1-0_r1.log : 800 > No. of messages from consumer on [test_1] at simple_consumer_test_1-0_r2.log : 800 > No. of messages from consumer on [test_1] at simple_consumer_test_1-0_r3.log : 800 > No. of messages from consumer on [test_1] at simple_consumer_test_1-1_r1.log : 960 > No. of messages from consumer on [test_1] at simple_consumer_test_1-1_r2.log : 960 > No. of messages from consumer on [test_1] at simple_consumer_test_1-1_r3.log : 960 > No. of messages from consumer on [test_1] at simple_consumer_test_1-2_r1.log : 1440 > No. of messages from consumer on [test_1] at simple_consumer_test_1-2_r2.log : 1440 > No. of messages from consumer on [test_1] at simple_consumer_test_1-2_r3.log : 1440 > Unique messages from consumer on [test_1] : 3200 > Unique messages from producer on [test_1] : 3200 > Validate for data matched on topic [test_1] : PASSED > Validate for data matched on topic [test_1] across replicas : PASSED > Validate for merged log segment checksum in cluster [source] : PASSED > Validate index log in cluster [source] : FAILED > > ======================================================== > > _test_case_name : testcase_0009 > _test_class_name : ReplicaBasicTest > arg : bounce_broker : false > arg : broker_type : leader > arg : message_producing_free_time_sec : 15 > arg : num_iteration : 1 > arg : num_messages_to_produce_per_producer_call : 50 > arg : num_partition : 3 > arg : replica_factor : 3 > arg : sleep_seconds_between_producer_calls : 1 > validation_status : > No. of messages from consumer on [test_1] at simple_consumer_test_1-0_r1.log : 775 > No. of messages from consumer on [test_1] at simple_consumer_test_1-0_r2.log : 775 > No. of messages from consumer on [test_1] at simple_consumer_test_1-0_r3.log : 775 > No. of messages from consumer on [test_1] at simple_consumer_test_1-1_r1.log : 930 > No. of messages from consumer on [test_1] at simple_consumer_test_1-1_r2.log : 930 > No. of messages from consumer on [test_1] at simple_consumer_test_1-1_r3.log : 930 > No. of messages from consumer on [test_1] at simple_consumer_test_1-2_r1.log : 1395 > No. of messages from consumer on [test_1] at simple_consumer_test_1-2_r2.log : 1395 > No. of messages from consumer on [test_1] at simple_consumer_test_1-2_r3.log : 1395 > Unique messages from consumer on [test_1] : 3100 > Unique messages from producer on [test_1] : 3100 > Validate for data matched on topic [test_1] : PASSED > Validate for data matched on topic [test_1] across replicas : PASSED > Validate for merged log segment checksum in cluster [source] : PASSED > Validate index log in cluster [source] : FAILED > > ======================================================== > > _test_case_name : testcase_0010 > _test_class_name : ReplicaBasicTest > arg : bounce_broker : false > arg : broker_type : leader > arg : message_producing_free_time_sec : 15 > arg : num_iteration : 1 > arg : num_messages_to_produce_per_producer_call : 50 > arg : num_partition : 3 > arg : replica_factor : 3 > arg : sleep_seconds_between_producer_calls : 1 > validation_status : > No. of messages from consumer on [test_1] at simple_consumer_test_1-0_r1.log : 800 > No. of messages from consumer on [test_1] at simple_consumer_test_1-0_r2.log : 800 > No. of messages from consumer on [test_1] at simple_consumer_test_1-0_r3.log : 800 > No. of messages from consumer on [test_1] at simple_consumer_test_1-1_r1.log : 960 > No. of messages from consumer on [test_1] at simple_consumer_test_1-1_r2.log : 960 > No. of messages from consumer on [test_1] at simple_consumer_test_1-1_r3.log : 960 > No. of messages from consumer on [test_1] at simple_consumer_test_1-2_r1.log : 1440 > No. of messages from consumer on [test_1] at simple_consumer_test_1-2_r2.log : 1440 > No. of messages from consumer on [test_1] at simple_consumer_test_1-2_r3.log : 1440 > Unique messages from consumer on [test_1] : 3200 > Unique messages from producer on [test_1] : 3200 > Validate for data matched on topic [test_1] : PASSED > Validate for data matched on topic [test_1] across replicas : PASSED > Validate for merged log segment checksum in cluster [source] : PASSED > Validate index log in cluster [source] : FAILED > > ======================================================== > > _test_case_name : testcase_0011 > _test_class_name : ReplicaBasicTest > arg : auto_create_topic : true > arg : bounce_broker : false > arg : broker_type : leader > arg : message_producing_free_time_sec : 15 > arg : num_iteration : 1 > arg : num_messages_to_produce_per_producer_call : 50 > arg : num_partition : 3 > arg : replica_factor : 3 > arg : sleep_seconds_between_producer_calls : 1 > validation_status : > No. of messages from consumer on [test_1] at simple_consumer_test_1-0_r1.log : 775 > No. of messages from consumer on [test_1] at simple_consumer_test_1-0_r2.log : 775 > No. of messages from consumer on [test_1] at simple_consumer_test_1-0_r3.log : 775 > No. of messages from consumer on [test_1] at simple_consumer_test_1-1_r1.log : 930 > No. of messages from consumer on [test_1] at simple_consumer_test_1-1_r2.log : 930 > No. of messages from consumer on [test_1] at simple_consumer_test_1-1_r3.log : 930 > No. of messages from consumer on [test_1] at simple_consumer_test_1-2_r1.log : 1395 > No. of messages from consumer on [test_1] at simple_consumer_test_1-2_r2.log : 1395 > No. of messages from consumer on [test_1] at simple_consumer_test_1-2_r3.log : 1395 > Unique messages from consumer on [test_1] : 3100 > Unique messages from producer on [test_1] : 3100 > Validate for data matched on topic [test_1] : PASSED > Validate for data matched on topic [test_1] across replicas : PASSED > Validate for merged log segment checksum in cluster [source] : PASSED > Validate index log in cluster [source] : FAILED > > ======================================================== > > _test_case_name : testcase_0021 > _test_class_name : ReplicaBasicTest > arg : bounce_broker : false > arg : broker_type : leader > arg : message_producing_free_time_sec : 15 > arg : num_iteration : 1 > arg : num_partition : 3 > arg : replica_factor : 3 > arg : sleep_seconds_between_producer_calls : 1 > validation_status : > No. of messages from consumer on [test_1] at simple_consumer_test_1-0_r1.log : 675 > No. of messages from consumer on [test_1] at simple_consumer_test_1-0_r2.log : 675 > No. of messages from consumer on [test_1] at simple_consumer_test_1-0_r3.log : 675 > No. of messages from consumer on [test_1] at simple_consumer_test_1-1_r1.log : 810 > No. of messages from consumer on [test_1] at simple_consumer_test_1-1_r2.log : 810 > No. of messages from consumer on [test_1] at simple_consumer_test_1-1_r3.log : 810 > No. of messages from consumer on [test_1] at simple_consumer_test_1-2_r1.log : 1215 > No. of messages from consumer on [test_1] at simple_consumer_test_1-2_r2.log : 1215 > No. of messages from consumer on [test_1] at simple_consumer_test_1-2_r3.log : 1215 > No. of messages from consumer on [test_2] at simple_consumer_test_2-0_r1.log : 675 > No. of messages from consumer on [test_2] at simple_consumer_test_2-0_r2.log : 675 > No. of messages from consumer on [test_2] at simple_consumer_test_2-0_r3.log : 675 > No. of messages from consumer on [test_2] at simple_consumer_test_2-1_r1.log : 810 > No. of messages from consumer on [test_2] at simple_consumer_test_2-1_r2.log : 810 > No. of messages from consumer on [test_2] at simple_consumer_test_2-1_r3.log : 810 > No. of messages from consumer on [test_2] at simple_consumer_test_2-2_r1.log : 1215 > No. of messages from consumer on [test_2] at simple_consumer_test_2-2_r2.log : 1215 > No. of messages from consumer on [test_2] at simple_consumer_test_2-2_r3.log : 1215 > Unique messages from consumer on [test_1] : 2700 > Unique messages from consumer on [test_2] : 2700 > Unique messages from producer on [test_1] : 2700 > Unique messages from producer on [test_2] : 2700 > Validate for data matched on topic [test_1] : PASSED > Validate for data matched on topic [test_1] across replicas : PASSED > Validate for data matched on topic [test_2] : PASSED > Validate for data matched on topic [test_2] across replicas : PASSED > Validate for merged log segment checksum in cluster [source] : PASSED > Validate index log in cluster [source] : FAILED > > Ewen Cheslack-Postava wrote: > This looks like it's just a byproduct of this patch correcting the handling of errors when running DumpLogSegments. You can see from this debug log that DumpLogSegments is throwing an exception because it's being run on the remote host but uses a path for a log file that's been synced locally (the /Users/ewencp path is on my Mac, but you can see it's SSHing' to a Vagrant VM to run the commnand): > > 2014-11-10 12:09:57,665 - DEBUG - executing command [ssh vagrant@worker1 -o 'HostName 127.0.0.1' -o 'Port 2222' -o 'UserKnownHostsFile /dev/null' -o 'StrictHostKeyChecking no' -o 'PasswordAuthentication no' -o 'IdentityFile /Users/ewencp/.vagrant.d/insecure_private_key' -o 'IdentitiesOnly yes' -o 'LogLevel FATAL' '/opt/kafka/bin/kafka-run-class.sh kafka.tools.DumpLogSegments --file /Users/ewencp/kafka.git/system_test/replication_testsuite/testcase_0008/logs/broker-3/kafka_server_3_logs/test_1-2/00000000000000001294.index --verify-index-only 2>&1'] (system_test_utils) > 2014-11-10 12:09:58,673 - DEBUG - Dumping /Users/ewencp/kafka.git/system_test/replication_testsuite/testcase_0008/logs/broker-3/kafka_server_3_logs/test_1-2/00000000000000001294.index (kafka_system_test_utils) > 2014-11-10 12:09:58,673 - DEBUG - Exception in thread "main" java.io.FileNotFoundException: /Users/ewencp/kafka.git/system_test/replication_testsuite/testcase_0008/logs/broker-3/kafka_server_3_logs/test_1-2/00000000000000001294.log (No such file or directory) (kafka_system_test_utils) > 2014-11-10 12:09:58,673 - DEBUG - at java.io.FileInputStream.open(Native Method) (kafka_system_test_utils) > 2014-11-10 12:09:58,673 - DEBUG - at java.io.FileInputStream.(FileInputStream.java:146) (kafka_system_test_utils) > 2014-11-10 12:09:58,673 - DEBUG - at kafka.utils.Utils$.openChannel(Utils.scala:162) (kafka_system_test_utils) > 2014-11-10 12:09:58,673 - DEBUG - at kafka.log.FileMessageSet.(FileMessageSet.scala:74) (kafka_system_test_utils) > 2014-11-10 12:09:58,673 - DEBUG - at kafka.tools.DumpLogSegments$.kafka$tools$DumpLogSegments$$dumpIndex(DumpLogSegments.scala:108) (kafka_system_test_utils) > 2014-11-10 12:09:58,673 - DEBUG - at kafka.tools.DumpLogSegments$$anonfun$main$1.apply(DumpLogSegments.scala:80) (kafka_system_test_utils) > 2014-11-10 12:09:58,674 - DEBUG - at kafka.tools.DumpLogSegments$$anonfun$main$1.apply(DumpLogSegments.scala:73) (kafka_system_test_utils) > 2014-11-10 12:09:58,674 - DEBUG - at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) (kafka_system_test_utils) > 2014-11-10 12:09:58,674 - DEBUG - at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:105) (kafka_system_test_utils) > 2014-11-10 12:09:58,674 - DEBUG - at kafka.tools.DumpLogSegments$.main(DumpLogSegments.scala:73) (kafka_system_test_utils) > 2014-11-10 12:09:58,674 - DEBUG - at kafka.tools.DumpLogSegments.main(DumpLogSegments.scala) (kafka_system_test_utils) > > I've opened a separate issue KAFKA-1763 for the fix and I'll submit a patch soon. Cool. Thanks. - Mayuresh ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27534/#review60460 ----------------------------------------------------------- On Nov. 3, 2014, 7:46 p.m., Ewen Cheslack-Postava wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/27534/ > ----------------------------------------------------------- > > (Updated Nov. 3, 2014, 7:46 p.m.) > > > Review request for kafka. > > > Bugs: KAFKA-1746 > https://issues.apache.org/jira/browse/KAFKA-1746 > > > Repository: kafka > > > Description > ------- > > KAFKA-1746 Make system tests return a useful exit code. > > > KAFKA-1746 Check the exit code when running DumpLogSegments to verify data. > > > Diffs > ----- > > system_test/mirror_maker_testsuite/mirror_maker_test.py c0117c64cbb7687ca8fbcec6b5c188eb880300ef > system_test/offset_management_testsuite/offset_management_test.py 12b5cd25140e1eb407dd57eef63d9783257688b2 > system_test/replication_testsuite/replica_basic_test.py 660006cc253bbae3e7cd9f02601f1c1937dd1714 > system_test/system_test_runner.py ee7aa252333553e8eb0bc046edf968ec99dddb70 > system_test/utils/kafka_system_test_utils.py 1093b660ebd0cb5ab6d3731d26f151e1bf717f8a > > Diff: https://reviews.apache.org/r/27534/diff/ > > > Testing > ------- > > > Thanks, > > Ewen Cheslack-Postava > > --===============6706444170132856283==--