Mailing-List: contact issues-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Fri, 28 Jul 2017 16:40:00 +0000 (UTC)
From: "Sergio Peña (JIRA)" <jira@apache.org>
To: issues@hive.apache.org
Message-ID: <JIRA.13079310.1497301212000.38734.1501260000178@Atlassian.JIRA>
In-Reply-To: <JIRA.13079310.1497301212000@Atlassian.JIRA>
References: <JIRA.13079310.1497301212000@Atlassian.JIRA> <JIRA.13079310.1497301212489@jira-lw-us.apache.org>
Subject: [jira] [Commented] (HIVE-16886) HMS log notifications may have
 duplicated event IDs if multiple HMS are running concurrently
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
archived-at: Fri, 28 Jul 2017 16:40:05 -0000


    [ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105271#comment-16105271 ]

Sergio Peña commented on HIVE-16886:
------------------------------------

[~anishek] [~thejas] While running some tests with duplicated event IDs in HMS HA mode, I see that the NL_ID is never duplicated and is always consecutive and in order. Do you know why we're not using this ID instead? It seems more consistent and better to use.

[~akolb] FYI

{noformat}
[hive1]> select NL_ID, EVENT_ID, EVENT_TIME, EVENT_TYPE, DB_NAME from NOTIFICATION_LOG where NL_ID >= 5431 and NL_ID <= 5440;
+-------+----------+------------+-----------------+----------------------------------------+
| NL_ID | EVENT_ID | EVENT_TIME | EVENT_TYPE      | DB_NAME                                |
+-------+----------+------------+-----------------+----------------------------------------+
|  5431 |     5094 | 1501109698 | CREATE_DATABASE | metastore_test_db_HIVE_HIVEMETASTORE_2 |
|  5432 |     5097 | 1501109698 | CREATE_TABLE    | metastore_test_db_HIVE_HIVEMETASTORE_2 |
|  5433 |     5098 | 1501109699 | ADD_PARTITION   | metastore_test_db_HIVE_HIVEMETASTORE_2 |
|  5434 |     5101 | 1501109791 | DROP_TABLE      | metastore_test_db_HIVE_HIVEMETASTORE_2 |
|  5435 |     5104 | 1501109792 | DROP_DATABASE   | metastore_test_db_HIVE_HIVEMETASTORE_2 |
|  5436 |     5096 | 1501109698 | CREATE_DATABASE | metastore_test_db_HIVE_HIVEMETASTORE_1 |
|  5437 |     5097 | 1501109698 | CREATE_TABLE    | metastore_test_db_HIVE_HIVEMETASTORE_1 |
|  5438 |     5100 | 1501109699 | ADD_PARTITION   | metastore_test_db_HIVE_HIVEMETASTORE_1 |
|  5439 |     5102 | 1501109791 | DROP_TABLE      | metastore_test_db_HIVE_HIVEMETASTORE_1 |
|  5440 |     5105 | 1501109792 | DROP_DATABASE   | metastore_test_db_HIVE_HIVEMETASTORE_1 |
+-------+----------+------------+-----------------+----------------------------------------+
{noformat}
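
The observation about the two ID columns can be checked mechanically. The following is a minimal sketch, not Hive code: the class and method names are hypothetical, and the (NL_ID, EVENT_ID) pairs are copied by hand from the query output above. It reports which EVENT_ID values repeat and whether NL_ID increases by exactly one per row.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for inspecting the pasted NOTIFICATION_LOG rows. */
class NotificationLogCheck {
    // {NL_ID, EVENT_ID} pairs copied from the query output in this comment.
    static final long[][] ROWS = {
        {5431, 5094}, {5432, 5097}, {5433, 5098}, {5434, 5101}, {5435, 5104},
        {5436, 5096}, {5437, 5097}, {5438, 5100}, {5439, 5102}, {5440, 5105}
    };

    /** Returns every EVENT_ID that appears in more than one row. */
    static Set<Long> duplicatedEventIds(long[][] rows) {
        Set<Long> seen = new HashSet<>();
        Set<Long> duplicates = new TreeSet<>();
        for (long[] row : rows) {
            if (!seen.add(row[1])) {
                duplicates.add(row[1]);
            }
        }
        return duplicates;
    }

    /** True when each NL_ID is exactly one greater than the previous one. */
    static boolean nlIdsConsecutive(long[][] rows) {
        for (int i = 1; i < rows.length; i++) {
            if (rows[i][0] != rows[i - 1][0] + 1) {
                return false;
            }
        }
        return true;
    }

    /** True when EVENT_ID never decreases as NL_ID increases. */
    static boolean eventIdsFollowNlOrder(long[][] rows) {
        for (int i = 1; i < rows.length; i++) {
            if (rows[i][1] < rows[i - 1][1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("duplicated EVENT_IDs: " + duplicatedEventIds(ROWS)); // [5097]
        System.out.println("NL_ID consecutive:    " + nlIdsConsecutive(ROWS));   // true
        System.out.println("EVENT_ID in NL order: " + eventIdsFollowNlOrder(ROWS)); // false
    }
}
```

For these ten rows, EVENT_ID 5097 appears twice and EVENT_ID drops from 5104 to 5096 between NL_ID 5435 and 5436, while NL_ID has no gaps or repeats.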

> HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
> --------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16886
>                 URL: https://issues.apache.org/jira/browse/HIVE-16886
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Metastore
>            Reporter: Sergio Peña
>
> When running multiple Hive Metastore servers with DB notifications enabled, I could see that notifications can be persisted with a duplicated event ID.
> This does not happen when running multiple threads in a single HMS node due to the locking acquired on the DbNotificationsLog class, but multiple HMS could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID fetched from the datastore is used for the new notification, incremented in the server itself, then persisted or updated back to the datastore. If 2 servers read the same ID, then both servers write a new notification with the same ID.
> The event ID is neither unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
>   public void testConcurrentAddNotifications() throws ExecutionException, InterruptedException {
>     final int NUM_THREADS = 2;
>     CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
>     CountDownLatch countOut = new CountDownLatch(1);
>     HiveConf conf = new HiveConf();
>     conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, MockPartitionExpressionProxy.class.getName());
>     ExecutorService executorService = Executors.newFixedThreadPool(NUM_THREADS);
>     FutureTask<Void> tasks[] = new FutureTask[NUM_THREADS];
>     for (int i=0; i<NUM_THREADS; i++) {
>       final int n = i;
>       tasks[i] = new FutureTask<Void>(new Callable<Void>() {
>         @Override
>         public Void call() throws Exception {
>           ObjectStore store = new ObjectStore();
>           store.setConf(conf);
>           NotificationEvent dbEvent =
>               new NotificationEvent(0, 0, EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>           System.out.println("ADDING NOTIFICATION");
>           countIn.countDown();
>           countOut.await();
>           store.addNotificationEvent(dbEvent);
>           System.out.println("FINISH NOTIFICATION");
>           return null;
>         }
>       });
>       executorService.execute(tasks[i]);
>     }
>     countIn.await();
>     countOut.countDown();
>     for (int i = 0; i < NUM_THREADS; ++i) {
>       tasks[i].get();
>     }
>     NotificationEventResponse eventResponse = objectStore.getNextNotification(new NotificationEventRequest());
>     Assert.assertEquals(2, eventResponse.getEventsSize());
>     Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
>     // This fails because the next notification has an event ID = 1
>     Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
>   }
> {noformat}
> The last assertion fails because the second notification has event ID 1 instead of the expected 2.
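
The read-increment-write sequence described in the issue can be illustrated with a small deterministic simulation. This is only a sketch under stated assumptions: `EventIdRaceSketch` and its methods are hypothetical names, the "datastore" is a pair of in-memory fields rather than the metastore database, and the `AtomicLong` stands in for any allocation step that the datastore performs atomically. It is not the Hive implementation, nor a proposed patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-memory stand-in for the shared datastore; not Hive code. */
class EventIdRaceSketch {
    // Stand-in for the persisted "next event ID" value.
    private long storedNextId = 1;
    // Stand-in for the notification log: the event ID written for each row.
    private final List<Long> loggedIds = new ArrayList<>();

    /** Step 1 of the unsafe sequence: a server reads the next ID. */
    long readNextId() {
        return storedNextId;
    }

    /** Steps 2 and 3: the server uses the ID it read, then writes back ID + 1. */
    void writeEvent(long idReadEarlier) {
        loggedIds.add(idReadEarlier);
        storedNextId = idReadEarlier + 1;
    }

    /** Two servers both read before either writes, so both use the same ID. */
    static List<Long> unsafeInterleaving() {
        EventIdRaceSketch store = new EventIdRaceSketch();
        long seenByServerA = store.readNextId();
        long seenByServerB = store.readNextId();
        store.writeEvent(seenByServerA);
        store.writeEvent(seenByServerB);
        return store.loggedIds;
    }

    /** The same two writers when read and increment happen as one atomic step. */
    static List<Long> atomicAllocation() {
        AtomicLong nextId = new AtomicLong(1);
        List<Long> ids = new ArrayList<>();
        ids.add(nextId.getAndIncrement()); // server A
        ids.add(nextId.getAndIncrement()); // server B
        return ids;
    }

    public static void main(String[] args) {
        System.out.println("unsafe: " + unsafeInterleaving()); // [1, 1]
        System.out.println("atomic: " + atomicAllocation());   // [1, 2]
    }
}
```

An in-JVM counter only helps threads inside one process. With several HMS processes, the allocation would have to be made atomic in the datastore itself, for example with a row lock or a database sequence.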

