Customizing the Kubernetes Scheduler

This article is based on Kubernetes 1.18 and is meant for testing and verification only; in practice, Alibaba Cloud does not recommend writing a separate scheduler of your own.

1. Scheduler

1.1 Overview

In Kubernetes, the default scheduler's main job is to find the most suitable node for a newly created Pod.

"Most suitable" here has two layers of meaning:

1. From all nodes in the cluster, use the scheduling algorithms (the Filter functions) to pick out every node that can run the Pod;

2. From the result of step 1, use the scheduling algorithms (the Score functions) to pick the single most suitable node as the final result.

So in the concrete scheduling flow, the default scheduler first runs a group of algorithms called Predicates to check every node, and then runs a group of algorithms called Priorities to score every node that passed the first step.

The final scheduling result is the node with the highest score. For the scheduler, "successfully scheduling a Pod" really just means filling in the Pod's spec.nodeName field with the name of that node.
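
As a rough sketch of that last point (not the scheduler's actual source, and with error handling trimmed), "filling in spec.nodeName" is done through the Pod's Binding subresource, which is essentially what the DefaultBinder plugin does; this assumes an already-constructed client-go clientset:

package sketch

import (
    "context"

    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// bindPod binds pod to nodeName via the Binding subresource. Once the call
// succeeds, the API server fills in pod.spec.nodeName with nodeName.
func bindPod(ctx context.Context, cs kubernetes.Interface, pod *v1.Pod, nodeName string) error {
    binding := &v1.Binding{
        ObjectMeta: metav1.ObjectMeta{Namespace: pod.Namespace, Name: pod.Name, UID: pod.UID},
        Target:     v1.ObjectReference{Kind: "Node", Name: nodeName},
    }
    return cs.CoreV1().Pods(pod.Namespace).Bind(ctx, binding, metav1.CreateOptions{})
}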

(screenshots omitted)

1.2 Source code walkthrough

In the source-code flow diagram, focus first on the colored text, especially the pink and yellow parts; that is the overall framework and the key part. The black text covers some of the details.

2. Scheduling algorithms

The default scheduling plugins are listed in getDefaultConfig below.

When the scheduling algorithms run, all nodes are evaluated at the same time. For Filter, each node is run through all of the filter plugins, which yields the list of nodes that can run the Pod (a conceptual sketch of this loop follows, and the default plugin list comes right after).
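
As a purely conceptual sketch of that loop (not the real generic_scheduler code, which runs the nodes through a parallel worker pool and uses the framework types), the Filter phase boils down to:

package sketch

// filterFn stands in for a filter plugin's Filter function; the string
// parameters stand in for the real pod and node objects.
type filterFn func(pod, node string) bool

// feasibleNodes keeps only the nodes for which every filter passes.
// One failing filter is enough to reject a node.
func feasibleNodes(pod string, nodes []string, filters []filterFn) []string {
    var feasible []string
    for _, node := range nodes { // the real scheduler evaluates nodes in parallel
        ok := true
        for _, f := range filters {
            if !f(pod, node) {
                ok = false
                break
            }
        }
        if ok {
            feasible = append(feasible, node)
        }
    }
    return feasible
}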

func getDefaultConfig() *schedulerapi.Plugins {
    return &schedulerapi.Plugins{
        QueueSort: &schedulerapi.PluginSet{
            Enabled: []schedulerapi.Plugin{
                {Name: queuesort.Name},
            },
        },
        PreFilter: &schedulerapi.PluginSet{
            Enabled: []schedulerapi.Plugin{
                {Name: noderesources.FitName},
                {Name: nodeports.Name},
                {Name: podtopologyspread.Name},
                {Name: interpodaffinity.Name},
                {Name: volumebinding.Name},
            },
        },
        Filter: &schedulerapi.PluginSet{
            Enabled: []schedulerapi.Plugin{
                {Name: nodeunschedulable.Name},
                {Name: noderesources.FitName},
                {Name: nodename.Name},
                {Name: nodeports.Name},
                {Name: nodeaffinity.Name},
                {Name: volumerestrictions.Name},
                {Name: tainttoleration.Name},
                {Name: nodevolumelimits.EBSName},
                {Name: nodevolumelimits.GCEPDName},
                {Name: nodevolumelimits.CSIName},
                {Name: nodevolumelimits.AzureDiskName},
                {Name: volumebinding.Name},
                {Name: volumezone.Name},
                {Name: podtopologyspread.Name},
                {Name: interpodaffinity.Name},
            },
        },
        PostFilter: &schedulerapi.PluginSet{
            Enabled: []schedulerapi.Plugin{
                {Name: defaultpreemption.Name},
            },
        },
        PreScore: &schedulerapi.PluginSet{
            Enabled: []schedulerapi.Plugin{
                {Name: interpodaffinity.Name},
                {Name: podtopologyspread.Name},
                {Name: tainttoleration.Name},
            },
        },
        Score: &schedulerapi.PluginSet{
            Enabled: []schedulerapi.Plugin{
                {Name: noderesources.BalancedAllocationName, Weight: 1},
                {Name: imagelocality.Name, Weight: 1},
                {Name: interpodaffinity.Name, Weight: 1},
                {Name: noderesources.LeastAllocatedName, Weight: 1},
                {Name: nodeaffinity.Name, Weight: 1},
                {Name: nodepreferavoidpods.Name, Weight: 10000},
                // Weight is doubled because:
                // - This is a score coming from user preference.
                // - It makes its signal comparable to NodeResourcesLeastAllocated.
                {Name: podtopologyspread.Name, Weight: 2},
                {Name: tainttoleration.Name, Weight: 1},
            },
        },
        Reserve: &schedulerapi.PluginSet{
            Enabled: []schedulerapi.Plugin{
                {Name: volumebinding.Name},
            },
        },
        PreBind: &schedulerapi.PluginSet{
            Enabled: []schedulerapi.Plugin{
                {Name: volumebinding.Name},
            },
        },
        Bind: &schedulerapi.PluginSet{
            Enabled: []schedulerapi.Plugin{
                {Name: defaultbinder.Name},
            },
        },
    }
}

Each scheduling stage (extension point) has a corresponding plugin method:

(screenshot omitted)

2.1 Predicates (Filter)

For a given node to count as a feasible node, every FilterPlugin's Filter function must return success; if even one filter reports an error, the node is rejected.

Source:

pkg/scheduler/core/generic_scheduler.go

func PodPassesFiltersOnNode(
    ctx context.Context,
    ph framework.PreemptHandle,
    state *framework.CycleState,
    pod *v1.Pod,
    info *framework.NodeInfo,
) (bool, *framework.Status, error) {
    var status *framework.Status

    podsAdded := false
    // We run filters twice in some cases. If the node has greater or equal priority
    // nominated pods, we run them when those pods are added to PreFilter state and nodeInfo.
    // If all filters succeed in this pass, we run them again when these
    // nominated pods are not added. This second pass is necessary because some
    // filters such as inter-pod affinity may not pass without the nominated pods.
    // If there are no nominated pods for the node or if the first run of the
    // filters fail, we don't run the second pass.
    // We consider only equal or higher priority pods in the first pass, because
    // those are the current "pod" must yield to them and not take a space opened
    // for running them. It is ok if the current "pod" take resources freed for
    // lower priority pods.
    // Requiring that the new pod is schedulable in both circumstances ensures that
    // we are making a conservative decision: filters like resources and inter-pod
    // anti-affinity are more likely to fail when the nominated pods are treated
    // as running, while filters like pod affinity are more likely to fail when
    // the nominated pods are treated as not running. We can't just assume the
    // nominated pods are running because they are not running right now and in fact,
    // they may end up getting scheduled to a different node.
    for i := 0; i < 2; i++ {
        stateToUse := state
        nodeInfoToUse := info
        if i == 0 {
            var err error
            podsAdded, stateToUse, nodeInfoToUse, err = addNominatedPods(ctx, ph, pod, state, info)
            if err != nil {
                return false, nil, err
            }
        } else if !podsAdded || !status.IsSuccess() {
            break
        }
        // statusMap holds the returned Status of every filter plugin
        statusMap := ph.RunFilterPlugins(ctx, stateToUse, pod, nodeInfoToUse)
        status = statusMap.Merge()
        // the merged status is Success only if every plugin returned Success
        if !status.IsSuccess() && !status.IsUnschedulable() {
            return false, status, status.AsError()
        }
    }

    return status.IsSuccess(), status, nil
}

2.2 Priorities (Score)

Every node runs all of the Score plugins. Since a score must stay within the 0-100 range, scores that would fall outside it are rescaled with a NormalizeScore function.

A node's score = the sum, over every ScorePlugin, of that plugin's score × its weight (see the small illustration below).

The node with the highest total score is the node the Pod finally runs on.
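
A small worked illustration of that formula (the plugin scores below are invented for the example, the weights come from the default config above, and the real scheduler works on framework.NodeScoreList rather than this toy struct):

package sketch

// pluginScore is a toy representation of one Score plugin's result for a node.
type pluginScore struct {
    name   string
    weight int64
    score  int64 // already normalized into [0, 100]
}

// nodeTotal sums score*weight over all Score plugins for one node.
func nodeTotal(scores []pluginScore) int64 {
    var total int64
    for _, s := range scores {
        total += s.score * s.weight
    }
    return total
}

// Example: NodeResourcesLeastAllocated (weight 1) scores 60 and
// PodTopologySpread (weight 2) scores 80, so the node's total is 60*1 + 80*2 = 220.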

(screenshot omitted)

3. Extending the scheduler

Extension is done through the Scheduling Framework.

The Scheduling Framework defines a set of extension points. You implement the interface defined by an extension point to build your own plugin, and then register the plugin at that extension point. When the framework reaches that extension point while running the scheduling flow, it calls your registered plugin.


3.1 Extension points

Each extension point is simply an interface method in which you can place your own logic; a minimal skeleton that satisfies two of them is sketched after the screenshot below.

(screenshot omitted)
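
As a minimal skeleton (assuming the same framework/v1alpha1 package that the sample code in 3.2 imports; the plugin itself does nothing useful and exists only to make the interface shapes concrete), implementing an extension point just means providing its method:

package sketch

import (
    "context"

    v1 "k8s.io/api/core/v1"
    framework "k8s.io/kubernetes/pkg/scheduler/framework/v1alpha1"
)

// noopPlugin satisfies the Filter and Score extension points with no-op logic.
type noopPlugin struct{}

var _ framework.FilterPlugin = noopPlugin{}
var _ framework.ScorePlugin = noopPlugin{}

func (noopPlugin) Name() string { return "Noop" }

// Filter is called once per candidate node; returning nil means Success.
func (noopPlugin) Filter(ctx context.Context, state *framework.CycleState, pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
    return nil
}

// Score returns a value that should end up in [0, 100] after normalization.
func (noopPlugin) Score(ctx context.Context, state *framework.CycleState, pod *v1.Pod, nodeName string) (int64, *framework.Status) {
    return 0, nil
}

// ScoreExtensions may return nil when no NormalizeScore hook is needed.
func (noopPlugin) ScoreExtensions() framework.ScoreExtensions { return nil }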

3.2 Custom scheduler code

pkg/plugins/myPlugin.go

This example implements Filter, Score, and PreBind.

Say you don't want any more Pods scheduled onto a host once its CPU usage exceeds 10%: fetch the CPU usage inside Filter and return an error status when it is above that threshold.

Likewise, if you want hosts with lower CPU usage to score higher, fetch the host's CPU usage and compute Score = 100 - CPU usage, so the least-loaded hosts end up with the highest scores. A rough sketch of both ideas follows, and the actual (deliberately trivial) plugin code comes after it.
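
A hedged sketch of both ideas (getNodeCPUUsagePercent is a made-up helper — in practice you would query metrics-server, Prometheus, or a node exporter; none of this is in the actual plugin below, which keeps Filter and Score trivial):

package sketch

import (
    "context"
    "fmt"

    v1 "k8s.io/api/core/v1"
    framework "k8s.io/kubernetes/pkg/scheduler/framework/v1alpha1"
)

// getNodeCPUUsagePercent is a hypothetical helper: fetch the node's current
// CPU usage (0-100) from your metrics system of choice.
func getNodeCPUUsagePercent(nodeName string) (int64, error) {
    return 0, nil // placeholder
}

// cpuFilter rejects any node whose CPU usage is above 10%.
func cpuFilter(ctx context.Context, pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
    usage, err := getNodeCPUUsagePercent(nodeInfo.Node().Name)
    if err != nil {
        return framework.NewStatus(framework.Error, fmt.Sprintf("get cpu usage: %v", err))
    }
    if usage > 10 {
        return framework.NewStatus(framework.Unschedulable, "node CPU usage above 10%")
    }
    return nil // Success
}

// cpuScore gives less-loaded nodes a higher score: Score = 100 - usage.
func cpuScore(ctx context.Context, pod *v1.Pod, nodeName string) (int64, *framework.Status) {
    usage, err := getNodeCPUUsagePercent(nodeName)
    if err != nil {
        return 0, framework.NewStatus(framework.Error, fmt.Sprintf("get cpu usage: %v", err))
    }
    return 100 - usage, nil
}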

package plugins

import (
    "context"
    "fmt"

    v1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/klog"
    pluginhelper "k8s.io/kubernetes/pkg/scheduler/framework/plugins/helper"
    framework "k8s.io/kubernetes/pkg/scheduler/framework/v1alpha1"
)

type Args struct {
    FavoriteColor  string `json:"favorite_color,omitempty"`
    FavoriteNumber int    `json:"favorite_number,omitempty"`
    ThanksTo       string `json:"thanks_to,omitempty"`
}

// MyScheduler is a sample plugin implementing the Filter, Score and PreBind extension points.
type MyScheduler struct {
    handle framework.FrameworkHandle
}

var _ framework.FilterPlugin = &MyScheduler{}
var _ framework.ScorePlugin = &MyScheduler{}
var _ framework.PreBindPlugin = &MyScheduler{}

const (
    // Name is the name of the plugin used in the plugin registry and configurations.
    Name = "SamplePlugin"

    // ErrReason returned when node name doesn't match.
    ErrReason = "node(s) didn't match the requested hostname"
)

// Name returns name of the plugin. It is used in logs, etc.
func (s *MyScheduler) Name() string {
    return Name
}

// Filter invoked at the filter extension point.
func (s *MyScheduler) Filter(ctx context.Context, _ *framework.CycleState, pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
    fmt.Printf("sample-plugin-Filter---pod:%v---nodeInfo:%v\n", pod.Name, nodeInfo.Node().Name)

    return nil
}

//-------------------------ScorePlugin /Users/dz0400819/Desktop/mycode/kubernetes/pkg/scheduler/framework/v1alpha1/interface.go
func (s *MyScheduler) Score(ctx context.Context, state *framework.CycleState, pod *v1.Pod, nodeName string) (int64, *framework.Status) {
    klog.Infof("---------sample-plugin-Score--pod:%s---nodeName:%s\n", pod.Name, nodeName)
    return 10, nil
}

// ScoreExtensions of the Score plugin.
func (s *MyScheduler) ScoreExtensions() framework.ScoreExtensions {
    klog.Info("-------ScoreExtensions---------")
    return s
}

// NormalizeScore invoked after scoring all nodes.
func (s *MyScheduler) NormalizeScore(ctx context.Context, state *framework.CycleState, pod *v1.Pod, scores framework.NodeScoreList) *framework.Status {
    klog.Infof("-------NormalizeScore,scores:%v---------", scores)
    return pluginhelper.DefaultNormalizeScore(framework.MaxNodeScore, false, scores)
}

//-------------------------ScorePlugin---------------------------------

func (s *MyScheduler) PreBind(ctx context.Context, state *framework.CycleState, pod *v1.Pod, nodeName string) *framework.Status {
    if nodeInfo, err := s.handle.SnapshotSharedLister().NodeInfos().Get(nodeName); err != nil {
        return framework.NewStatus(framework.Error, fmt.Sprintf("prebind get node info error: %v", nodeName))
    } else {
        klog.Infof("---------prebind node info: %v--------------", nodeInfo.Node().Name)
        return framework.NewStatus(framework.Success, "")
    }
}

// New initializes a new plugin and returns it.
func New(_ runtime.Object, f framework.FrameworkHandle) (framework.Plugin, error) {
    klog.Infof("-------------------new:")
    return &MyScheduler{
        handle: f,
    }, nil
}

main.go

package main

import (
    "fmt"
    "math/rand"
    "os"
    "time"

    "github.com/cnych/sample-scheduler-framework/pkg/plugins"
    "k8s.io/component-base/logs"
    "k8s.io/kubernetes/cmd/kube-scheduler/app"
)

func main() {
    rand.Seed(time.Now().UTC().UnixNano())

    command := app.NewSchedulerCommand(
        app.WithPlugin(plugins.Name, plugins.New),
    )

    logs.InitLogs()
    defer logs.FlushLogs()

    if err := command.Execute(); err != nil {
        _, _ = fmt.Fprintf(os.Stderr, "%v\n", err)
        os.Exit(1)
    }
}

You also need a configuration file, deploy/config.yaml:

apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /Users/dz0400819/.kube/config
# The leaderElection block below must be present, and resourceName: sample-scheduler must be set;
# without it the scheduler fails to start. The exact reason is hard to pin down, but it comes down to leader election.
leaderElection:
  leaderElect: true
  resourceName: sample-scheduler
  resourceNamespace: kube-system
  leaseDuration: 4s
  renewDeadline: 3s
profiles:
  - schedulerName: sample-scheduler
    plugins:
      filter:
        enabled:
          - name: SamplePlugin
      score:
        enabled:
          - name: SamplePlugin
      preBind:
        enabled:
          - name: SamplePlugin

Note: this is a configuration file, not a CRD.

With this configuration in place, you can step through the source and see that our plugin is appended after the default plugins at each extension point.

Here I have the kube-scheduler component from the k8s source running locally.

(screenshot omitted)

In GoLand, just run go build.

(screenshot omitted)

go.mod

To get this to build at all, every k8s-related module is pointed at a local path with a replace directive. Take a look at go.mod in the k8s source itself: the staging packages there are likewise replaced to point into the source tree's staging directory.

(screenshot omitted)

/Users/dz0400819/Desktop/mycode/kubernetes is my local checkout of the k8s source, switched to the release-1.19 branch.

Without the replace directives the build fails; you can try for yourself whether it really works without them.

The error looks like this:

go: k8s.io/kubernetes@v1.20.2 requires
    k8s.io/api@v0.0.0: reading https://goproxy.io/k8s.io/api/@v/v0.0.0.mod: 404 Not Found
    server response: not found: k8s.io/api@v0.0.0: invalid version: unknown revision v0.0.0

module github.com/cnych/sample-scheduler-framework

go 1.13

replace (
k8s.io/api => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/api
k8s.io/apiextensions-apiserver => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/apiextensions-apiserver
k8s.io/apimachinery => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/apimachinery
k8s.io/apiserver => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/apiserver
k8s.io/cli-runtime => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/cli-runtime
k8s.io/client-go => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/client-go
k8s.io/cloud-provider => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/cloud-provider
k8s.io/cluster-bootstrap => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/cluster-bootstrap
k8s.io/code-generator => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/code-generator
k8s.io/component-base => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/component-base
k8s.io/cri-api => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/cri-api
k8s.io/csi-translation-lib => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/csi-translation-lib
k8s.io/kube-aggregator => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/kube-aggregator
k8s.io/kube-controller-manager => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/kube-controller-manager
k8s.io/kube-proxy => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/kube-proxy
k8s.io/kube-scheduler => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/kube-scheduler
k8s.io/kubectl => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/kubectl
k8s.io/kubelet => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/kubelet
k8s.io/kubernetes => /Users/dz0400819/Desktop/mycode/kubernetes
k8s.io/legacy-cloud-providers => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/legacy-cloud-providers
k8s.io/metrics => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/metrics
k8s.io/sample-apiserver => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/sample-apiserver
)

require (
k8s.io/api v0.0.0
k8s.io/apimachinery v0.0.0
k8s.io/component-base v0.0.0
k8s.io/klog v1.0.0
k8s.io/kubernetes v0.0.0-00010101000000-000000000000
)

Startup

(screenshot omitted)


Your scheduler has started successfully only once you see "successfully acquired lease kube-system/sample-scheduler"!

If you don't see it, increase the log verbosity (the number in --v=3) to get more detailed error output.

I0214 17:38:06.816900   48491 eventhandlers.go:225] add event for scheduled pod monitor/node-exporter-794pv 
I0214 17:38:06.816911 48491 eventhandlers.go:225] add event for scheduled pod ops/haproxy-74f9f967d5-kdpx9
I0214 17:38:06.816920 48491 eventhandlers.go:225] add event for scheduled pod flink/flink-taskmanager-7dd6dd7b54-dtchl
I0214 17:38:06.816927 48491 eventhandlers.go:225] add event for scheduled pod arena-system/mpi-operator-76747d4598-tbv55
I0214 17:38:06.816938 48491 eventhandlers.go:225] add event for scheduled pod kube-system/ack-virtual-node-affinity-admission-controller-d4cb5c54-bx97f
I0214 17:38:06.816950 48491 eventhandlers.go:225] add event for scheduled pod monitor/metrics-65bd5556bc-fkj7b
#### The lines below are about leader election. If resourceName: sample-scheduler is left out of the config above,
#### run with --v=4 and you will see the related errors.
I0214 17:38:06.838996 48491 leaderelection.go:243] attempting to acquire leader lease kube-system/sample-scheduler...
I0214 17:38:12.690399 48491 leaderelection.go:253] successfully acquired lease kube-system/sample-scheduler

Create a Deployment

Use our custom scheduler by setting

schedulerName: sample-scheduler

If you still want the default scheduler, set schedulerName: default-scheduler, or simply leave the field out.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-scheduler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-scheduler
  template:
    metadata:
      labels:
        app: test-scheduler
    spec:
      schedulerName: sample-scheduler
      containers:
        - image: nginx
          imagePullPolicy: IfNotPresent
          name: nginx
          ports:
            - containerPort: 80

After creating the Deployment, you can see that our custom functions were executed, and that a Score log line is printed for every node.

#### Plugin function execution logs
I0214 17:38:12.694611 48491 myPlugin.go:49] ---------sample-plugin-Score--pod:test-scheduler-5659b79dc9-kdrkl---nodeName:cn-hangzhou.i-bp12cajtkwm3sz18zb56
I0214 17:38:12.694612 48491 myPlugin.go:49] ---------sample-plugin-Score--pod:test-scheduler-5659b79dc9-kdrkl---nodeName:cn-hangzhou.i-bp12cajtkwm3sz18zb53
I0214 17:38:12.694611 48491 myPlugin.go:49] ---------sample-plugin-Score--pod:test-scheduler-5659b79dc9-kdrkl---nodeName:cn-hangzhou.i-bp12cajtkwm3sz18zb51
I0214 17:38:12.694624 48491 myPlugin.go:49] ---------sample-plugin-Score--pod:test-scheduler-5659b79dc9-kdrkl---nodeName:cn-hangzhou.i-bp12cajtkwm3sz18zb55
I0214 17:38:12.694899 48491 myPlugin.go:55] -------ScoreExtensions---------
I0214 17:38:12.694936 48491 myPlugin.go:55] -------ScoreExtensions---------
I0214 17:38:12.694954 48491 myPlugin.go:61] -------NormalizeScore,scores:%v---------[{cn-hangzhou.i-bp12cajtkwm3sz18zb56 10} {cn-hangzhou.i-bp12cajtkwm3sz18zb53 10} {cn-hangzhou.i-bp12cajtkwm3sz18zb55 10} {cn-hangzhou.i-bp12cajtkwm3sz18zb51 10}]
I0214 17:38:12.695880 48491 myPlugin.go:71] ---------prebind node info: cn-hangzhou.i-bp12cajtkwm3sz18zb53--------------
I0214 17:38:12.695938 48491 default_binder.go:51] Attempting to bind default/test-scheduler-5659b79dc9-kdrkl to cn-hangzhou.i-bp12cajtkwm3sz18zb53
I0214 17:38:12.733102 48491 eventhandlers.go:225] add event for scheduled pod default/test-scheduler-5659b79dc9-kdrkl
I0214 17:38:12.733101 48491 eventhandlers.go:205] delete event for unscheduled pod default/test-scheduler-5659b79dc9-kdrkl
I0214 17:38:12.733955 48491 scheduler.go:597] "Successfully bound pod to node" pod="default/test-scheduler-5659b79dc9-kdrkl" node="cn-hangzhou.i-bp12cajtkwm3sz18zb53" evaluatedNodes=7 feasibleNodes=4

3.3 Adjusting the Filter plugin order

By default runAllFilters=false, so not every filter is executed: as soon as one filter fails for a node, RunFilterPlugins returns and skips the remaining filters.

Source:

// RunFilterPlugins runs the set of configured Filter plugins for pod on
// the given node. If any of these plugins doesn't return "Success", the
// given node is not suitable for running pod.
// Meanwhile, the failure message and status are set for the given node.
func (f *frameworkImpl) RunFilterPlugins(
    ctx context.Context,
    state *framework.CycleState,
    pod *v1.Pod,
    nodeInfo *framework.NodeInfo,
) framework.PluginToStatus {
    statuses := make(framework.PluginToStatus)
    for _, pl := range f.filterPlugins {
        pluginStatus := f.runFilterPlugin(ctx, pl, state, pod, nodeInfo)
        if !pluginStatus.IsSuccess() {
            if !pluginStatus.IsUnschedulable() {
                // Filter plugins are not supposed to return any status other than
                // Success or Unschedulable.
                errStatus := framework.NewStatus(framework.Error, fmt.Sprintf("running %q filter plugin for pod %q: %v", pl.Name(), pod.Name, pluginStatus.Message()))
                return map[string]*framework.Status{pl.Name(): errStatus}
            }
            statuses[pl.Name()] = pluginStatus
            // By default runAllFilters=false: one failing filter is enough, so return early
            if !f.runAllFilters {
                // Exit early if we don't need to run all filters.
                return statuses
            }
        }
    }

    return statuses
}

If you want to adjust the plugin order, modify the configuration file as follows.

Here all default filter plugins are disabled and only two Filter plugins are enabled:


apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /Users/dz0400819/.kube/config
leaderElection:
  leaderElect: true
  resourceName: sample-scheduler
  resourceNamespace: kube-system
  leaseDuration: 4s
  renewDeadline: 3s
profiles:
  - schedulerName: sample-scheduler
    plugins:
      filter:
        disabled:
          - name: '*'
        enabled:
          - name: SamplePlugin
          - name: NodeResourcesFit
      score:
        enabled:
          - name: SamplePlugin
      preBind:
        enabled:
          - name: SamplePlugin

Verify by debugging the source

You can see there are now only two FilterPlugins, and their order matches the order in our configuration.


(screenshot omitted)

The other plugins are unaffected; take a look at the ScorePlugins for comparison.

(screenshot omitted)

4. Production deployment

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: sample-scheduler-clusterrole
rules:
  - apiGroups:
      - ""
    resources:
      - endpoints
      - events
    verbs:
      - create
      - get
      - update
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - delete
      - get
      - list
      - watch
      - update
  - apiGroups:
      - ""
    resources:
      - bindings
      - pods/binding
    verbs:
      - create
  - apiGroups:
      - ""
    resources:
      - pods/status
    verbs:
      - patch
      - update
  - apiGroups:
      - ""
    resources:
      - replicationcontrollers
      - services
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - apps
      - extensions
    resources:
      - replicasets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - apps
    resources:
      - statefulsets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - policy
    resources:
      - poddisruptionbudgets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - persistentvolumeclaims
      - persistentvolumes
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "storage.k8s.io"
    resources:
      - storageclasses
      - csinodes
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "coordination.k8s.io"
    resources:
      - leases
    verbs:
      - create
      - get
      - list
      - update
  - apiGroups:
      - "events.k8s.io"
    resources:
      - events
    verbs:
      - create
      - patch
      - update
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sample-scheduler-sa
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: sample-scheduler-clusterrolebinding
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: sample-scheduler-clusterrole
subjects:
  - kind: ServiceAccount
    name: sample-scheduler-sa
    namespace: kube-system

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: scheduler-config
  namespace: kube-system
data:
  scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1beta1
    kind: KubeSchedulerConfiguration
    profiles:
      - schedulerName: sample-scheduler
        plugins:
          score:
            enabled:
              - name: "sample-plugin"
          filter:
            enabled:
              - name: "sample-plugin"
          preBind:
            enabled:
              - name: "sample-plugin"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-scheduler
  namespace: kube-system
  labels:
    component: sample-scheduler
spec:
  replicas: 1
  selector:
    matchLabels:
      component: sample-scheduler
  template:
    metadata:
      labels:
        component: sample-scheduler
    spec:
      serviceAccount: sample-scheduler-sa
      priorityClassName: system-cluster-critical
      volumes:
        - name: scheduler-config
          configMap:
            name: scheduler-config
      containers:
        - name: scheduler-ctrl
          image: registry.cn-hangzhou.aliyuncs.com/vita_lili/lili:sample-sheduler-v1
          imagePullPolicy: IfNotPresent
          args:
            - sample-scheduler-framework
            - --config=/etc/kubernetes/scheduler-config.yaml
            - --v=3
          resources:
            requests:
              cpu: "50m"
          volumeMounts:
            - name: scheduler-config
              mountPath: /etc/kubernetes

5. References

See the documents below for more details.

This one mainly covers writing a custom scheduler for k8s 1.16; 1.18 follows the same ideas, though some field names and parameters differ:

https://www.qikqiak.com/post/custom-kube-scheduler/

The official documentation on scheduler configuration and plugins:

https://kubernetes.io/zh/docs/reference/scheduling/config/#scheduling-plugins

An introduction to leader election:

https://zhengyinyong.com/post/kubernetes-pod-leader-election/

KubeSchedulerConfiguration:

k8s.io/kube-scheduler/config/v1beta1/types.go

Alibaba's recommended approach:

https://developer.aliyun.com/article/756016
