How We Improved Scheduler Performance for Large-scale Jobs - Part One
Source: flink.apache.org
Author: Zhilong Hong, Zhu Zhu, Daisy Tsang, & Till Rohrmann
When scheduling large-scale jobs in Flink 1.12, a lot of time is required to initialize jobs and deploy tasks. The scheduler also requires a large amount of heap memory in order to store the execution topology and to host temporary deployment descriptors. Consider, for example, a job whose topology contains two vertices connected with an all-to-all edge, each with a parallelism of 10k (which means there are 10k source tasks and 10k sink tasks, and every source task is connected to all sink tasks). In this case, the execution topology contains 10k × 10k = 100 million execution edges, which takes up a large amount of memory.
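The quadratic growth of all-to-all connections can be illustrated with a small back-of-the-envelope sketch (this is not Flink code, just arithmetic on the topology described above):

```python
def all_to_all_edges(source_parallelism: int, sink_parallelism: int) -> int:
    """With an all-to-all edge, every source subtask connects to every
    sink subtask, so the edge count is the product of the parallelisms."""
    return source_parallelism * sink_parallelism

# Two vertices with parallelism 10k each: 10,000 * 10,000 = 100 million edges.
print(all_to_all_edges(10_000, 10_000))  # 100000000
```

Because the count grows with the product of the parallelisms, doubling the parallelism of both vertices quadruples the number of execution edges, which is why memory and initialization time degrade so quickly at large scale.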