reef-dev mailing list archives

From "Dwaipayan Mukhopadhyay (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (REEF-2017) Org.Apache.REEF.IO.FileSystem.AzureBlob produces Error 503 (server unavailable) when reading data from Azure Blob into >=80 evaluators
Date Wed, 23 May 2018 22:37:00 GMT

     [ https://issues.apache.org/jira/browse/REEF-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dwaipayan Mukhopadhyay reassigned REEF-2017:
--------------------------------------------

    Assignee: Dwaipayan Mukhopadhyay

> Org.Apache.REEF.IO.FileSystem.AzureBlob produces Error 503 (server unavailable) when reading data from Azure Blob into >=80 evaluators
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: REEF-2017
>                 URL: https://issues.apache.org/jira/browse/REEF-2017
>             Project: REEF
>          Issue Type: Bug
>          Components: REEF.NET IO
>    Affects Versions: 0.17
>            Reporter: Najeeb Kazmi
>            Assignee: Dwaipayan Mukhopadhyay
>            Priority: Blocker
>             Fix For: 0.17
>
>
> Azure Storage throws Microsoft.WindowsAzure.Storage.StorageException (Error 503, Server Unavailable) when I run a job that downloads data partitions from Azure Storage to 80 or more evaluators. This does not happen when using 64 evaluators. Full stack trace below.
>  
>  
> Org.Apache.REEF.IMRU.OnREEF.Driver.IMRUDriver`4[[Microsoft.MachineLearning.Distributed.Core.Trainers.KMeans.InputOutput.KMeansInputOutput, Microsoft.MachineLearning.Distributed.Core, Version=0.3.0.0, Culture=neutral, PublicKeyToken=null],[Microsoft.MachineLearning.Distributed.Core.Trainers.KMeans.InputOutput.KMeansInputOutput, Microsoft.MachineLearning.Distributed.Core, Version=0.3.0.0, Culture=neutral, PublicKeyToken=null],[Microsoft.MachineLearning.Runtime.IPredictor, Microsoft.MachineLearning.Core, Version=3.9.290.3615, Culture=neutral, PublicKeyToken=d353f9ba84f0e281],[Microsoft.MachineLearning.Distributed.Core.Common.IPipeline, Microsoft.MachineLearning.Distributed.Core, Version=0.3.0.0, Culture=neutral, PublicKeyToken=null]] Warning: 0 : 2018-05-11T00:59:28.4674513+00:00 0031 : WARNING: Received IFailedEvaluator bf0bcb92-5773-448d-bffa-6c478b619beb from endpoint unknown_endpoint with systemState WaitingForEvaluator in retry# 0 with Exception: Org.Apache.REEF.Driver.Evaluator.EvaluatorException: One or more errors occurred. ---> System.AggregateException: One or more errors occurred. ---> Microsoft.WindowsAzure.Storage.StorageException: The remote server returned an error: (503) Server Unavailable. ---> System.Net.WebException: The remote server returned an error: (503) Server Unavailable.
>  at Microsoft.WindowsAzure.Storage.Shared.Protocol.HttpResponseParsers.ProcessExpectedStatusCodeNoException[T](HttpStatusCode expectedStatusCode, HttpStatusCode actualStatusCode, T retVal, StorageCommandBase`1 cmd, Exception ex)
>  at Microsoft.WindowsAzure.Storage.Blob.CloudBlob.<>c__DisplayClass1e.<GetBlobImpl>b__1b(RESTCommand`1 cmd, HttpWebResponse resp, Exception ex, OperationContext ctx)
>  at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.EndGetResponse[T](IAsyncResult getResponseResult)
>  --- End of inner exception stack trace ---
>  at Microsoft.WindowsAzure.Storage.Core.Util.StorageAsyncResult`1.End()
>  at Microsoft.WindowsAzure.Storage.Core.Util.AsyncExtensions.<>c__DisplayClass4.<CreateCallbackVoid>b__3(IAsyncResult ar)
>  --- End of inner exception stack trace ---
>  at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
>  at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
>  at Org.Apache.REEF.IO.FileSystem.AzureBlob.AzureCloudBlockBlob.DownloadToFile(String path, FileMode mode)
>  at Org.Apache.REEF.IO.PartitionedData.FileSystem.FileSystemInputPartition`1.Download()
>  at Org.Apache.REEF.IO.PartitionedData.FileSystem.FileSystemInputPartition`1.Cache()
>  at Org.Apache.REEF.IMRU.OnREEF.Driver.DataLoadingContext`1.OnNext(IContextStart value)
>  at Org.Apache.REEF.Common.Runtime.Evaluator.Context.ContextLifeCycle.Start()
>  at Org.Apache.REEF.Common.Runtime.Evaluator.Context.ContextRuntime..ctor(IInjector serviceInjector, IConfiguration contextConfiguration, Optional`1 parentContext)
>  --- End of inner exception stack trace ---.
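
Editor's note: the trace above shows the 503 surfacing from DownloadToFile while many evaluators download partitions at once. The issue does not state a root cause or a fix. A common way to handle transient 503 responses in general is to retry with exponential backoff and jitter, so that many clients do not retry in lockstep. The sketch below illustrates only that general pattern. It is not REEF code and not the Azure Storage client API. Every name in it (ServerUnavailable, download_with_backoff, the download callable, and all default values) is a hypothetical placeholder.

```python
import random
import time


class ServerUnavailable(Exception):
    """Hypothetical stand-in for a transient HTTP 503 from a storage service."""


def download_with_backoff(download, max_attempts=5, base_delay=0.5,
                          max_delay=30.0, sleep=time.sleep, rng=random.random):
    """Call download() until it succeeds or max_attempts is exhausted.

    download: zero-argument callable that raises ServerUnavailable on a 503.
    sleep and rng are injectable so the behaviour can be tested without waiting.
    """
    for attempt in range(max_attempts):
        try:
            return download()
        except ServerUnavailable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the failure
            # Upper bound doubles on each attempt, capped at max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Jitter: wait a random fraction of the bound so that many
            # concurrent clients spread their retries out in time.
            sleep(cap * rng())
```

With the defaults, a caller waits at most 0.5 s, 1 s, 2 s and 4 s between the five attempts. Whether retrying is appropriate here, and with which limits, depends on why the service returns 503 in this scenario, which the report leaves open.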



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
