reef-dev mailing list archives

From "Najeeb Kazmi (JIRA)" <j...@apache.org>
Subject [jira] [Created] (REEF-2017) Org.Apache.REEF.IO.FileSystem.AzureBlob produces Error 503 (server unavailable) when reading data from Azure Blob into >=80 evaluators
Date Sat, 12 May 2018 00:33:00 GMT
Najeeb Kazmi created REEF-2017:
----------------------------------

             Summary: Org.Apache.REEF.IO.FileSystem.AzureBlob produces Error 503 (server unavailable)
when reading data from Azure Blob into >=80 evaluators
                 Key: REEF-2017
                 URL: https://issues.apache.org/jira/browse/REEF-2017
             Project: REEF
          Issue Type: Bug
          Components: REEF.NET IO
    Affects Versions: 0.17
            Reporter: Najeeb Kazmi
             Fix For: 0.17


I am running into an issue where a download from Azure Storage fails with
Microsoft.WindowsAzure.Storage.StorageException (Error 503, Server Unavailable) when I run
a job that downloads data partitions from Azure Storage to 80 or more evaluators. This does
not happen with 64 evaluators. Full stack trace below.

Org.Apache.REEF.IMRU.OnREEF.Driver.IMRUDriver`4[[Microsoft.MachineLearning.Distributed.Core.Trainers.KMeans.InputOutput.KMeansInputOutput,
Microsoft.MachineLearning.Distributed.Core, Version=0.3.0.0, Culture=neutral, PublicKeyToken=null],[Microsoft.MachineLearning.Distributed.Core.Trainers.KMeans.InputOutput.KMeansInputOutput,
Microsoft.MachineLearning.Distributed.Core, Version=0.3.0.0, Culture=neutral, PublicKeyToken=null],[Microsoft.MachineLearning.Runtime.IPredictor,
Microsoft.MachineLearning.Core, Version=3.9.290.3615, Culture=neutral, PublicKeyToken=d353f9ba84f0e281],[Microsoft.MachineLearning.Distributed.Core.Common.IPipeline,
Microsoft.MachineLearning.Distributed.Core, Version=0.3.0.0, Culture=neutral, PublicKeyToken=null]]
Warning: 0 : 2018-05-11T00:59:28.4674513+00:00 0031 : WARNING: Received IFailedEvaluator bf0bcb92-5773-448d-bffa-6c478b619beb
from endpoint unknown_endpoint with systemState WaitingForEvaluator in retry# 0 with Exception:
Org.Apache.REEF.Driver.Evaluator.EvaluatorException: One or more errors occurred. --->
System.AggregateException: One or more errors occurred. ---> Microsoft.WindowsAzure.Storage.StorageException:
The remote server returned an error: (503) Server Unavailable. ---> System.Net.WebException:
The remote server returned an error: (503) Server Unavailable.
 at Microsoft.WindowsAzure.Storage.Shared.Protocol.HttpResponseParsers.ProcessExpectedStatusCodeNoException[T](HttpStatusCode
expectedStatusCode, HttpStatusCode actualStatusCode, T retVal, StorageCommandBase`1 cmd, Exception
ex)
 at Microsoft.WindowsAzure.Storage.Blob.CloudBlob.<>c__DisplayClass1e.<GetBlobImpl>b__1b(RESTCommand`1
cmd, HttpWebResponse resp, Exception ex, OperationContext ctx)
 at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.EndGetResponse[T](IAsyncResult getResponseResult)
 --- End of inner exception stack trace ---
 at Microsoft.WindowsAzure.Storage.Core.Util.StorageAsyncResult`1.End()
 at Microsoft.WindowsAzure.Storage.Core.Util.AsyncExtensions.<>c__DisplayClass4.<CreateCallbackVoid>b__3(IAsyncResult
ar)
 --- End of inner exception stack trace ---
 at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
 at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
 at Org.Apache.REEF.IO.FileSystem.AzureBlob.AzureCloudBlockBlob.DownloadToFile(String path,
FileMode mode)
 at Org.Apache.REEF.IO.PartitionedData.FileSystem.FileSystemInputPartition`1.Download()
 at Org.Apache.REEF.IO.PartitionedData.FileSystem.FileSystemInputPartition`1.Cache()
 at Org.Apache.REEF.IMRU.OnREEF.Driver.DataLoadingContext`1.OnNext(IContextStart value)
 at Org.Apache.REEF.Common.Runtime.Evaluator.Context.ContextLifeCycle.Start()
 at Org.Apache.REEF.Common.Runtime.Evaluator.Context.ContextRuntime..ctor(IInjector serviceInjector,
IConfiguration contextConfiguration, Optional`1 parentContext)
 --- End of inner exception stack trace ---.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
