[Bug] Getting shuffle handle info failed due to driver pressure #2665

@zuston

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

org.apache.uniffle.shaded.io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: CallOptions deadline exceeded after 59.999985740s. Name resolution delay 0.000000000 seconds. [closed=[], open=[[remote_addr=nodexxxxxi.hadoop/10.xx.xx.xx:45921]]]
	at org.apache.uniffle.shaded.io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:268)
	at org.apache.uniffle.shaded.io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:249)
	at org.apache.uniffle.shaded.io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:167)
	at org.apache.uniffle.proto.ShuffleManagerGrpc$ShuffleManagerBlockingStub.getPartitionToShufflerServerWithBlockRetry(ShuffleManagerGrpc.java:644)
	at org.apache.uniffle.client.impl.grpc.ShuffleManagerGrpcClient.getPartitionToShufflerServerWithBlockRetry(ShuffleManagerGrpcClient.java:109)
	at org.apache.uniffle.shuffle.manager.RssShuffleManagerBase.getRemoteShuffleHandleInfoWithBlockRetry(RssShuffleManagerBase.java:955)
	at org.apache.uniffle.shuffle.manager.RssShuffleManagerBase.getShuffleHandleInfo(RssShuffleManagerBase.java:912)
	at org.apache.spark.shuffle.writer.RssShuffleWriter.<init>(RssShuffleWriter.java:307)
	at org.apache.spark.shuffle.RssShuffleManager.getWriter(RssShuffleManager.java:260)
	at org.apache.spark.shuffle.QiyiRssShuffleManager.getWriter(QiyiRssShuffleManager.java:249)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:57)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Affects Version(s)

master

Uniffle Server Log Output

Uniffle Engine Log Output

Uniffle Server Configurations

Uniffle Engine Configurations

Additional context

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
