atlas-dev mailing list archives

From "Hemanth Yamijala (JIRA)" <>
Subject [jira] [Updated] (ATLAS-690) Read timed out exceptions when tables are imported into Atlas.
Date Mon, 02 May 2016 12:39:12 GMT


Hemanth Yamijala updated ATLAS-690:
    Attachment: ATLAS-690-3.patch

This patch fixes two things, mainly:

* For user-defined type attributes that are unique, this adds a composite index that combines
the attribute with state. This fixes the regression where lookups on unique attributes, which
now also filter on state, became slow because no index covered both.
* When adding or updating an entity, we create a property that aggregates the values of all
its attributes and the attribute values of its first-level references, so that a full-text
index property can be added. As mentioned in the last comment, a recent fix added a back
reference from each hive column to its hive table. This caused every hive column to load the
hive table vertex along with all its columns again, creating the quadratic effect on times
that we observed. To fix this, I have attempted to cache the GUID of an entity against
the reference created by the first load. This cache applies only to that request and
does not persist across requests (to avoid any memory issues).
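
To illustrate the second fix, here is a minimal sketch of a request-scoped cache. It is not
the code in the patch: the class name, the method name and the string stand-in for a loaded
vertex are all hypothetical, and it assumes the cache is a plain map keyed by entity GUID that
is created for one request and then discarded. It only counts how often the table would be
loaded while processing the columns of one table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from the Atlas code base.
public class RequestScopedCacheSketch {

    /**
     * Simulates processing every column of one table, where each column holds
     * a back reference to the table. Returns how many times the table was loaded.
     */
    static int countTableLoads(int columns, boolean useCache) {
        int[] loads = {0};
        // Stand-in for the expensive load of a table vertex and all its columns.
        Function<String, String> loader = guid -> {
            loads[0]++;
            return "vertex:" + guid;
        };

        // Assumption: the cache is created per request and never shared,
        // so it cannot grow beyond the entities touched by that request.
        Map<String, String> cache = new HashMap<>();

        for (int c = 0; c < columns; c++) {
            if (useCache) {
                cache.computeIfAbsent("table-guid", loader); // loads on first use only
            } else {
                loader.apply("table-guid");                  // loads once per column
            }
        }
        return loads[0];
    }

    public static void main(String[] args) {
        System.out.println("without cache: " + countTableLoads(50, false));
        System.out.println("with cache:    " + countTableLoads(50, true));
    }
}
```

Without the cache the table is loaded once per column, and each of those loads pulls in all
the columns again, which is where the quadratic cost comes from. With the cache the table is
loaded once per request.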

With these two fixes, a 1000-table load in my local setup takes almost the same time as
before the regressions.

However, this patch is not yet ready for submission. In particular, a couple of tests are
failing. Also, I request [~ssainath] to run with this patch in her environment to make sure
we see the improvements at scale. I also need to enhance the tests for the code I've written.
I will update with a new patch once these are done. Still, I would appreciate it if someone
could look at the fix and provide early feedback.

> Read timed out exceptions when tables are imported into Atlas.
> --------------------------------------------------------------
>                 Key: ATLAS-690
>                 URL:
>             Project: Atlas
>          Issue Type: Bug
>         Environment: Atlas with External Kafka/  HBase / Solr
> atlas.notification.hook.numthreads=5
> ATLAS_HOOK created with 5 partitions
>            Reporter: Sharmadha Sainath
>            Assignee: Hemanth Yamijala
>            Priority: Blocker
>         Attachments: ATLAS-690-3.patch
> When 1000 tables are imported into Atlas using the Hive hook, Read timed out exceptions occur.
> This happened with the latest Atlas build, commit id 922a83c9a10e857d54855463225e9a5c375bc2b9:

>    • Hive ingestion was completed in 1 minute 50 secs.
>    • Atlas ingestion took more than an hour.
> With the last 1000-table run that was done in Atlas, with commit id
> b9575f29df3cc014f1b076abf52d88249bf4d0ef:
>    • Hive ingestion was completed in 3 minutes.
>    • Atlas ingestion was completed in 5 minutes.
> The Exception stack trace :
> Error handling message org.apache.atlas.notification.hook.HookNotification$EntityUpdateRequest@7474dd2d
> com.sun.jersey.api.client.ClientHandlerException: Read timed out
> at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(
> at com.sun.jersey.api.client.Client.handle(
> at com.sun.jersey.api.client.WebResource.handle(
> at com.sun.jersey.api.client.WebResource.access$200(
> at com.sun.jersey.api.client.WebResource$Builder.method(
> at org.apache.atlas.AtlasClient.callAPIWithResource(
> at org.apache.atlas.AtlasClient.callAPIWithRetries(
> at org.apache.atlas.AtlasClient.callAPI(
> at org.apache.atlas.AtlasClient.updateEntities(
> at org.apache.atlas.notification.NotificationHookConsumer$
> at java.util.concurrent.Executors$
> at
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> at java.util.concurrent.ThreadPoolExecutor$
> at
> Caused by: Read timed out
> at Method)
> at
> at
> at
> at
> at
> at
> at
> at
> at
> at com.sun.jersey.client.urlconnection.URLConnectionClientHandler._invoke(
> at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(

This message was sent by Atlassian JIRA
