This article targets Kubernetes 1.18. Everything here is for testing and verification only; in practice, Alibaba Cloud does not recommend writing a separate scheduler of your own.
1. Scheduler
1.1 Introduction
In Kubernetes, the default scheduler's main responsibility is to find the most suitable node (Node) for a newly created Pod.
Here, "most suitable" has two layers of meaning:
1. From all nodes in the cluster, the scheduling algorithms (the Filter functions) select every node that is able to run the Pod;
2. From the result of step 1, the scheduling algorithms (the Score functions) pick the node that fits best as the final result.
So in the concrete scheduling flow, the default scheduler first runs a set of scheduling algorithms called Predicates to check every Node, and then runs a set called Priorities to score each Node that passed the first step.
The final scheduling result is the Node with the highest score. For the scheduler, successfully scheduling a Pod simply means filling its spec.nodeName field with the name of that Node.
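To make the two-phase flow concrete, here is a minimal, self-contained sketch. It is not the real kube-scheduler code; the node list, filter functions, and score functions are stand-ins.

package main

import "fmt"

// filterFn reports whether a node can run the Pod (the Predicate / Filter phase).
type filterFn func(node string) bool

// scoreFn scores a feasible node; higher is better (the Priority / Score phase).
type scoreFn func(node string) int

// schedule mimics the two phases: filter all nodes, score the survivors,
// and return the highest-scoring node, whose name would be written into spec.nodeName.
func schedule(nodes []string, filters []filterFn, scores []scoreFn) (string, bool) {
	var feasible []string
	for _, n := range nodes {
		ok := true
		for _, f := range filters {
			if !f(n) { // one failing filter rules the node out
				ok = false
				break
			}
		}
		if ok {
			feasible = append(feasible, n)
		}
	}

	best, bestScore, found := "", -1, false
	for _, n := range feasible {
		total := 0
		for _, s := range scores {
			total += s(n)
		}
		if total > bestScore {
			best, bestScore, found = n, total, true
		}
	}
	return best, found
}

func main() {
	nodes := []string{"node-a", "node-b", "node-c"}
	filters := []filterFn{func(n string) bool { return n != "node-c" }}
	scores := []scoreFn{func(n string) int { return len(n) }}
	if node, ok := schedule(nodes, filters, scores); ok {
		fmt.Println("would set spec.nodeName =", node)
	}
}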
1.2 Source Code Walkthrough
You can focus first on the colored text, especially the pink and yellow parts: that is the big-picture framework and the key path. The black text covers some of the details.
2. Scheduling Algorithms
The default scheduling plugins are configured as follows.
When the scheduling algorithms are executed, they run for all nodes at the same time. For Filter, each node is run through all of the filter plugins, which produces the list of nodes that can run the Pod (see the simplified sketch right after the configuration listing below).
func getDefaultConfig() *schedulerapi.Plugins {
	return &schedulerapi.Plugins{
		QueueSort: &schedulerapi.PluginSet{
			Enabled: []schedulerapi.Plugin{
				{Name: queuesort.Name},
			},
		},
		PreFilter: &schedulerapi.PluginSet{
			Enabled: []schedulerapi.Plugin{
				{Name: noderesources.FitName},
				{Name: nodeports.Name},
				{Name: podtopologyspread.Name},
				{Name: interpodaffinity.Name},
				{Name: volumebinding.Name},
			},
		},
		Filter: &schedulerapi.PluginSet{
			Enabled: []schedulerapi.Plugin{
				{Name: nodeunschedulable.Name},
				{Name: noderesources.FitName},
				{Name: nodename.Name},
				{Name: nodeports.Name},
				{Name: nodeaffinity.Name},
				{Name: volumerestrictions.Name},
				{Name: tainttoleration.Name},
				{Name: nodevolumelimits.EBSName},
				{Name: nodevolumelimits.GCEPDName},
				{Name: nodevolumelimits.CSIName},
				{Name: nodevolumelimits.AzureDiskName},
				{Name: volumebinding.Name},
				{Name: volumezone.Name},
				{Name: podtopologyspread.Name},
				{Name: interpodaffinity.Name},
			},
		},
		PostFilter: &schedulerapi.PluginSet{
			Enabled: []schedulerapi.Plugin{
				{Name: defaultpreemption.Name},
			},
		},
		PreScore: &schedulerapi.PluginSet{
			Enabled: []schedulerapi.Plugin{
				{Name: interpodaffinity.Name},
				{Name: podtopologyspread.Name},
				{Name: tainttoleration.Name},
			},
		},
		Score: &schedulerapi.PluginSet{
			Enabled: []schedulerapi.Plugin{
				{Name: noderesources.BalancedAllocationName, Weight: 1},
				{Name: imagelocality.Name, Weight: 1},
				{Name: interpodaffinity.Name, Weight: 1},
				{Name: noderesources.LeastAllocatedName, Weight: 1},
				{Name: nodeaffinity.Name, Weight: 1},
				{Name: nodepreferavoidpods.Name, Weight: 10000},
				// Weight is doubled because:
				// - This is a score coming from user preference.
				// - It makes its signal comparable to NodeResourcesLeastAllocated.
				{Name: podtopologyspread.Name, Weight: 2},
				{Name: tainttoleration.Name, Weight: 1},
			},
		},
		Reserve: &schedulerapi.PluginSet{
			Enabled: []schedulerapi.Plugin{
				{Name: volumebinding.Name},
			},
		},
		PreBind: &schedulerapi.PluginSet{
			Enabled: []schedulerapi.Plugin{
				{Name: volumebinding.Name},
			},
		},
		Bind: &schedulerapi.PluginSet{
			Enabled: []schedulerapi.Plugin{
				{Name: defaultbinder.Name},
			},
		},
	}
}
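As mentioned above, the filter phase is executed per node and the nodes are evaluated concurrently. The following is a minimal sketch of that idea, with one goroutine per node; the real scheduler uses its internal parallelization helper rather than this hand-rolled version.

package main

import (
	"fmt"
	"sync"
)

// nodeFilter reports whether a single node can run the pod.
type nodeFilter func(node string) bool

// feasibleNodes evaluates every node concurrently (one goroutine per node here)
// and keeps only the nodes for which all filters return true.
func feasibleNodes(nodes []string, filters []nodeFilter) []string {
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		feasible []string
	)
	for _, n := range nodes {
		wg.Add(1)
		go func(node string) {
			defer wg.Done()
			for _, f := range filters {
				if !f(node) {
					return // a single failing filter rules the node out
				}
			}
			mu.Lock()
			feasible = append(feasible, node)
			mu.Unlock()
		}(n)
	}
	wg.Wait()
	return feasible
}

func main() {
	nodes := []string{"node-a", "node-b", "node-c"}
	filters := []nodeFilter{
		func(n string) bool { return n != "node-b" }, // pretend node-b fails a filter
	}
	fmt.Println(feasibleNodes(nodes, filters)) // e.g. [node-a node-c]; order may vary
}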
Each scheduling algorithm has a corresponding method:
2.1 Predicates----Filter
For a given node, only if every FilterPlugin's Filter function returns success is the node considered a FeasibleNode; if even one Filter fails, the node is rejected.
Source code:
pkg/scheduler/core/generic_scheduler.go
func PodPassesFiltersOnNode(
	ctx context.Context,
	ph framework.PreemptHandle,
	state *framework.CycleState,
	pod *v1.Pod,
	info *framework.NodeInfo,
) (bool, *framework.Status, error) {
	var status *framework.Status

	podsAdded := false
	// We run filters twice in some cases. If the node has greater or equal priority
	// nominated pods, we run them when those pods are added to PreFilter state and nodeInfo.
	// If all filters succeed in this pass, we run them again when these
	// nominated pods are not added. This second pass is necessary because some
	// filters such as inter-pod affinity may not pass without the nominated pods.
	// If there are no nominated pods for the node or if the first run of the
	// filters fail, we don't run the second pass.
	// We consider only equal or higher priority pods in the first pass, because
	// those are the current "pod" must yield to them and not take a space opened
	// for running them. It is ok if the current "pod" take resources freed for
	// lower priority pods.
	// Requiring that the new pod is schedulable in both circumstances ensures that
	// we are making a conservative decision: filters like resources and inter-pod
	// anti-affinity are more likely to fail when the nominated pods are treated
	// as running, while filters like pod affinity are more likely to fail when
	// the nominated pods are treated as not running. We can't just assume the
	// nominated pods are running because they are not running right now and in fact,
	// they may end up getting scheduled to a different node.
	for i := 0; i < 2; i++ {
		stateToUse := state
		nodeInfoToUse := info
		if i == 0 {
			var err error
			podsAdded, stateToUse, nodeInfoToUse, err = addNominatedPods(ctx, ph, pod, state, info)
			if err != nil {
				return false, nil, err
			}
		} else if !podsAdded || !status.IsSuccess() {
			break
		}

		// statusMap holds the return statuses of all filter plugins for this node.
		statusMap := ph.RunFilterPlugins(ctx, stateToUse, pod, nodeInfoToUse)
		status = statusMap.Merge()
		// The merged status is Success only if every plugin returned Success.
		if !status.IsSuccess() && !status.IsUnschedulable() {
			return false, status, status.AsError()
		}
	}

	return status.IsSuccess(), status, nil
}
2.2 Priority----Score
Every node runs all of the Score plugins. Because a score must stay within the range 0-100, scores that would fall outside it are rescaled with a Normalize function so that they remain between 0 and 100.
A node's final score is the sum, over every ScorePlugin, of that plugin's score multiplied by its weight.
The node with the highest total score is the one the Pod will ultimately run on.
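Here is a minimal, self-contained sketch of the normalize-then-weight aggregation described above. The node names, raw plugin scores, and weights are made up; this is not the framework's actual implementation.

package main

import "fmt"

const maxNodeScore = 100 // the framework caps node scores at 100

// normalize rescales raw plugin scores so the highest becomes maxNodeScore,
// mirroring what a NormalizeScore extension typically does.
func normalize(raw map[string]int64) map[string]int64 {
	var highest int64
	for _, s := range raw {
		if s > highest {
			highest = s
		}
	}
	out := make(map[string]int64, len(raw))
	for node, s := range raw {
		if highest == 0 {
			out[node] = 0
			continue
		}
		out[node] = s * maxNodeScore / highest
	}
	return out
}

func main() {
	// Raw scores from two hypothetical Score plugins, per node.
	pluginA := normalize(map[string]int64{"node-a": 300, "node-b": 150})
	pluginB := normalize(map[string]int64{"node-a": 20, "node-b": 80})
	weightA, weightB := int64(1), int64(2)

	// Final score per node = sum(plugin score * plugin weight); the highest total wins.
	for _, node := range []string{"node-a", "node-b"} {
		total := pluginA[node]*weightA + pluginB[node]*weightB
		fmt.Printf("%s total score: %d\n", node, total)
	}
}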
3. Custom Scheduler Extension
The extension is done through the Scheduling Framework.
The Scheduler Framework defines a set of extension points. Users can implement the interfaces defined by these extension points to build their own plugins and register the plugins at the extension points. When the Scheduler Framework executes the scheduling flow and reaches an extension point, it calls the plugins the user registered there.
3.1 Extension Points
Each extension point corresponds to a function, and you can put your own custom logic inside it.
3.2 Custom Scheduler Code
pkg/plugins/myPlugin.go
Here I implement Filter, Score, and PreBind.
If you do not want any more Pods scheduled onto a host once its CPU usage exceeds 10%, you can fetch the CPU usage inside Filter and return an error when it is above that threshold.
Likewise, if you want hosts with low CPU usage to score higher, you can fetch the host's CPU usage and compute Score = 100 - CPU usage, so hosts with lower CPU usage end up with higher scores (a sketch of this idea follows the plugin code below).
package plugins

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/klog"
	pluginhelper "k8s.io/kubernetes/pkg/scheduler/framework/plugins/helper"
	framework "k8s.io/kubernetes/pkg/scheduler/framework/v1alpha1"
)

type Args struct {
	FavoriteColor  string `json:"favorite_color,omitempty"`
	FavoriteNumber int    `json:"favorite_number,omitempty"`
	ThanksTo       string `json:"thanks_to,omitempty"`
}

// MyScheduler is a sample plugin that implements the Filter, Score, and PreBind extension points.
type MyScheduler struct {
	handle framework.FrameworkHandle
}

var _ framework.FilterPlugin = &MyScheduler{}
var _ framework.ScorePlugin = &MyScheduler{}
var _ framework.PreBindPlugin = &MyScheduler{}

const (
	// Name is the name of the plugin used in the plugin registry and configurations.
	Name = "SamplePlugin"
	// ErrReason returned when node name doesn't match.
	ErrReason = "node(s) didn't match the requested hostname"
)

// Name returns name of the plugin. It is used in logs, etc.
func (s *MyScheduler) Name() string {
	return Name
}

// Filter invoked at the filter extension point.
func (s *MyScheduler) Filter(ctx context.Context, _ *framework.CycleState, pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
	fmt.Printf("sample-plugin-Filter---pod:%v---nodeInfo:%v\n", pod.Name, nodeInfo.Node().Name)
	return nil
}

//-------------------------ScorePlugin /Users/dz0400819/Desktop/mycode/kubernetes/pkg/scheduler/framework/v1alpha1/interface.go
func (s *MyScheduler) Score(ctx context.Context, state *framework.CycleState, pod *v1.Pod, nodeName string) (int64, *framework.Status) {
	klog.Infof("---------sample-plugin-Score--pod:%s---nodeName:%s\n", pod.Name, nodeName)
	return 10, nil
}

// ScoreExtensions of the Score plugin.
func (s *MyScheduler) ScoreExtensions() framework.ScoreExtensions {
	klog.Info("-------ScoreExtensions---------")
	return s
}

// NormalizeScore invoked after scoring all nodes.
func (s *MyScheduler) NormalizeScore(ctx context.Context, state *framework.CycleState, pod *v1.Pod, scores framework.NodeScoreList) *framework.Status {
	klog.Infof("-------NormalizeScore,scores:%v---------", scores)
	return pluginhelper.DefaultNormalizeScore(framework.MaxNodeScore, false, scores)
}

//-------------------------PreBindPlugin---------------------------------
func (s *MyScheduler) PreBind(ctx context.Context, state *framework.CycleState, pod *v1.Pod, nodeName string) *framework.Status {
	if nodeInfo, err := s.handle.SnapshotSharedLister().NodeInfos().Get(nodeName); err != nil {
		return framework.NewStatus(framework.Error, fmt.Sprintf("prebind get node info error: %v", nodeName))
	} else {
		klog.Infof("---------prebind node info: %v--------------", nodeInfo.Node().Name)
		return framework.NewStatus(framework.Success, "")
	}
}

// New initializes a new plugin and returns it.
func New(_ runtime.Object, f framework.FrameworkHandle) (framework.Plugin, error) {
	klog.Infof("-------------------new:")
	return &MyScheduler{
		handle: f,
	}, nil
}
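The following is a hedged sketch of the CPU-usage idea mentioned before the listing. Everything in it is illustrative: getNodeCPUUsagePercent is a hypothetical helper that you would have to back with a real metrics source (metrics-server, Prometheus, and so on), and the threshold and function names are my own, not part of the scheduler framework or of the plugin above. The two functions use the same signatures as the Filter and Score methods, so their bodies could be dropped into MyScheduler.

package plugins

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	framework "k8s.io/kubernetes/pkg/scheduler/framework/v1alpha1"
)

// cpuUsageThreshold is the Filter cut-off mentioned above (10%).
const cpuUsageThreshold = 10

// getNodeCPUUsagePercent is a hypothetical helper: it would have to be wired to
// a real metrics source. It is only a placeholder for this sketch.
func getNodeCPUUsagePercent(nodeName string) (int64, error) {
	return 0, fmt.Errorf("not implemented: wire this to your metrics source")
}

// cpuAwareFilter rejects nodes whose CPU usage is above the threshold.
func cpuAwareFilter(ctx context.Context, _ *framework.CycleState, pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
	usage, err := getNodeCPUUsagePercent(nodeInfo.Node().Name)
	if err != nil {
		return framework.NewStatus(framework.Error, err.Error())
	}
	if usage > cpuUsageThreshold {
		return framework.NewStatus(framework.Unschedulable, "node CPU usage too high")
	}
	return nil
}

// cpuAwareScore gives lightly loaded nodes a higher score: Score = 100 - CPU usage.
func cpuAwareScore(ctx context.Context, _ *framework.CycleState, pod *v1.Pod, nodeName string) (int64, *framework.Status) {
	usage, err := getNodeCPUUsagePercent(nodeName)
	if err != nil {
		return 0, framework.NewStatus(framework.Error, err.Error())
	}
	return framework.MaxNodeScore - usage, nil
}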
main.go
package main

import (
	"fmt"
	"math/rand"
	"os"
	"time"

	"k8s.io/component-base/logs"
	"k8s.io/kubernetes/cmd/kube-scheduler/app"

	"github.com/cnych/sample-scheduler-framework/pkg/plugins"
)

func main() {
	rand.Seed(time.Now().UTC().UnixNano())

	command := app.NewSchedulerCommand(
		app.WithPlugin(plugins.Name, plugins.New),
	)

	logs.InitLogs()
	defer logs.FlushLogs()

	if err := command.Execute(); err != nil {
		_, _ = fmt.Fprintf(os.Stderr, "%v\n", err)
		os.Exit(1)
	}
}
We also need a configuration file, deploy/config.yaml:
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /Users/dz0400819/.kube/config
# The leaderElection block below is required, and resourceName: sample-scheduler must be set;
# without it the scheduler fails to start, which is hard to diagnose but comes down to leader election.
leaderElection:
  leaderElect: true
  resourceName: sample-scheduler
  resourceNamespace: kube-system
  leaseDuration: 4s
  renewDeadline: 3s
profiles:
  - schedulerName: sample-scheduler
    plugins:
      filter:
        enabled:
          - name: SamplePlugin
      score:
        enabled:
          - name: SamplePlugin
      preBind:
        enabled:
          - name: SamplePlugin
Note: this is a configuration file, not a CRD.
With this configuration in place, you can step through the source code and see that our scheduler plugin is appended after the default plugins, at the end of the list.
Here I started the scheduler component from the Kubernetes source code locally.
In GoLand, simply run go build.
go.mod
All of the Kubernetes-related dependencies here are wired up with replace directives; that is the only way I got this to build. You can check the go.mod in the Kubernetes source itself: the staging repositories are likewise replaced with the staging directory of the source tree.
/Users/dz0400819/Desktop/mycode/kubernetes is my local checkout of the Kubernetes source, switched to the release-1.19 branch.
Without the replace directives the build fails; feel free to try for yourself whether it really can work without them.
The error looks like this:
go: k8s.io/kubernetes@v1.20.2 requires
	k8s.io/api@v0.0.0: reading https://goproxy.io/k8s.io/api/@v/v0.0.0.mod: 404 Not Found
	server response: not found: k8s.io/api@v0.0.0: invalid version: unknown revision v0.0.0
module github.com/cnych/sample-scheduler-framework

go 1.13

replace (
	k8s.io/api => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/api
	k8s.io/apiextensions-apiserver => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/apiextensions-apiserver
	k8s.io/apimachinery => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/apimachinery
	k8s.io/apiserver => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/apiserver
	k8s.io/cli-runtime => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/cli-runtime
	k8s.io/client-go => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/client-go
	k8s.io/cloud-provider => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/cloud-provider
	k8s.io/cluster-bootstrap => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/cluster-bootstrap
	k8s.io/code-generator => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/code-generator
	k8s.io/component-base => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/component-base
	k8s.io/cri-api => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/cri-api
	k8s.io/csi-translation-lib => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/csi-translation-lib
	k8s.io/kube-aggregator => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/kube-aggregator
	k8s.io/kube-controller-manager => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/kube-controller-manager
	k8s.io/kube-proxy => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/kube-proxy
	k8s.io/kube-scheduler => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/kube-scheduler
	k8s.io/kubectl => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/kubectl
	k8s.io/kubelet => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/kubelet
	k8s.io/kubernetes => /Users/dz0400819/Desktop/mycode/kubernetes
	k8s.io/legacy-cloud-providers => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/legacy-cloud-providers
	k8s.io/metrics => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/metrics
	k8s.io/sample-apiserver => /Users/dz0400819/Desktop/mycode/kubernetes/staging/src/k8s.io/sample-apiserver
)

require (
	k8s.io/api v0.0.0
	k8s.io/apimachinery v0.0.0
	k8s.io/component-base v0.0.0
	k8s.io/klog v1.0.0
	k8s.io/kubernetes v0.0.0-00010101000000-000000000000
)
Startup
Only when you see successfully acquired lease kube-system/sample-scheduler has your scheduler started successfully!
If you do not see the "successfully" message, raise the log verbosity above --v=3 and you will get more detailed error output.
I0214 17:38:06.816900 48491 eventhandlers.go:225] add event for scheduled pod monitor/node-exporter-794pv
I0214 17:38:06.816911 48491 eventhandlers.go:225] add event for scheduled pod ops/haproxy-74f9f967d5-kdpx9
I0214 17:38:06.816920 48491 eventhandlers.go:225] add event for scheduled pod flink/flink-taskmanager-7dd6dd7b54-dtchl
I0214 17:38:06.816927 48491 eventhandlers.go:225] add event for scheduled pod arena-system/mpi-operator-76747d4598-tbv55
I0214 17:38:06.816938 48491 eventhandlers.go:225] add event for scheduled pod kube-system/ack-virtual-node-affinity-admission-controller-d4cb5c54-bx97f
I0214 17:38:06.816950 48491 eventhandlers.go:225] add event for scheduled pod monitor/metrics-65bd5556bc-fkj7b
#### The lines below are about leader election. If resourceName: sample-scheduler is missing from the configuration above,
#### set --v=4 and you will see the related errors.
I0214 17:38:06.838996 48491 leaderelection.go:243] attempting to acquire leader lease kube-system/sample-scheduler...
I0214 17:38:12.690399 48491 leaderelection.go:253] successfully acquired lease kube-system/sample-scheduler
Create a Deployment
Use our custom scheduler:
schedulerName: sample-scheduler
If you still want the default scheduler, set schedulerName: default-scheduler, or simply leave the field out.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-scheduler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-scheduler
  template:
    metadata:
      labels:
        app: test-scheduler
    spec:
      schedulerName: sample-scheduler
      containers:
        - image: nginx
          imagePullPolicy: IfNotPresent
          name: nginx
          ports:
            - containerPort: 80
Create the Deployment and you can see that our custom functions have executed, and a log line is printed for every node.
#### The lines below are the plugin function execution logs.
I0214 17:38:12.694611 48491 myPlugin.go:49] ---------sample-plugin-Score--pod:test-scheduler-5659b79dc9-kdrkl---nodeName:cn-hangzhou.i-bp12cajtkwm3sz18zb56
I0214 17:38:12.694612 48491 myPlugin.go:49] ---------sample-plugin-Score--pod:test-scheduler-5659b79dc9-kdrkl---nodeName:cn-hangzhou.i-bp12cajtkwm3sz18zb53
I0214 17:38:12.694611 48491 myPlugin.go:49] ---------sample-plugin-Score--pod:test-scheduler-5659b79dc9-kdrkl---nodeName:cn-hangzhou.i-bp12cajtkwm3sz18zb51
I0214 17:38:12.694624 48491 myPlugin.go:49] ---------sample-plugin-Score--pod:test-scheduler-5659b79dc9-kdrkl---nodeName:cn-hangzhou.i-bp12cajtkwm3sz18zb55
I0214 17:38:12.694899 48491 myPlugin.go:55] -------ScoreExtensions---------
I0214 17:38:12.694936 48491 myPlugin.go:55] -------ScoreExtensions---------
I0214 17:38:12.694954 48491 myPlugin.go:61] -------NormalizeScore,scores:%v---------[{cn-hangzhou.i-bp12cajtkwm3sz18zb56 10} {cn-hangzhou.i-bp12cajtkwm3sz18zb53 10} {cn-hangzhou.i-bp12cajtkwm3sz18zb55 10} {cn-hangzhou.i-bp12cajtkwm3sz18zb51 10}]
I0214 17:38:12.695880 48491 myPlugin.go:71] ---------prebind node info: cn-hangzhou.i-bp12cajtkwm3sz18zb53--------------
I0214 17:38:12.695938 48491 default_binder.go:51] Attempting to bind default/test-scheduler-5659b79dc9-kdrkl to cn-hangzhou.i-bp12cajtkwm3sz18zb53
I0214 17:38:12.733102 48491 eventhandlers.go:225] add event for scheduled pod default/test-scheduler-5659b79dc9-kdrkl
I0214 17:38:12.733101 48491 eventhandlers.go:205] delete event for unscheduled pod default/test-scheduler-5659b79dc9-kdrkl
I0214 17:38:12.733955 48491 scheduler.go:597] "Successfully bound pod to node" pod="default/test-scheduler-5659b79dc9-kdrkl" node="cn-hangzhou.i-bp12cajtkwm3sz18zb53" evaluatedNodes=7 feasibleNodes=4
3.3 Adjusting the Filter Plugin Execution Order
By default runAllFilters=false, so not all filters are executed: as soon as one filter fails, the framework returns and skips the remaining filters.
The relevant source code:
// RunFilterPlugins runs the set of configured Filter plugins for pod on
// the given node. If any of these plugins doesn't return "Success", the
// given node is not suitable for running pod.
// Meanwhile, the failure message and status are set for the given node.
func (f *frameworkImpl) RunFilterPlugins(
	ctx context.Context,
	state *framework.CycleState,
	pod *v1.Pod,
	nodeInfo *framework.NodeInfo,
) framework.PluginToStatus {
	statuses := make(framework.PluginToStatus)
	for _, pl := range f.filterPlugins {
		pluginStatus := f.runFilterPlugin(ctx, pl, state, pod, nodeInfo)
		if !pluginStatus.IsSuccess() {
			if !pluginStatus.IsUnschedulable() {
				// Filter plugins are not supposed to return any status other than
				// Success or Unschedulable.
				errStatus := framework.NewStatus(framework.Error, fmt.Sprintf("running %q filter plugin for pod %q: %v", pl.Name(), pod.Name, pluginStatus.Message()))
				return map[string]*framework.Status{pl.Name(): errStatus}
			}
			statuses[pl.Name()] = pluginStatus
			// By default runAllFilters=false: return as soon as one filter fails
			// instead of running all remaining filters.
			if !f.runAllFilters {
				// Exit early if we don't need to run all filters.
				return statuses
			}
		}
	}

	return statuses
}
If you want to adjust the plugin order, you can modify the configuration file as follows.
Here all of the default filter plugins are disabled and only two Filter plugins are enabled:
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /Users/dz0400819/.kube/config
leaderElection:
  leaderElect: true
  resourceName: sample-scheduler
  resourceNamespace: kube-system
  leaseDuration: 4s
  renewDeadline: 3s
profiles:
  - schedulerName: sample-scheduler
    plugins:
      filter:
        disabled:
          - name: '*'
        enabled:
          - name: SamplePlugin
          - name: NodeResourcesFit
      score:
        enabled:
          - name: SamplePlugin
      preBind:
        enabled:
          - name: SamplePlugin
Verify by debugging the source
You can see that there are now only two FilterPlugins, and their order matches the order in our configuration.
The other plugins are unaffected; take a look at the ScorePlugins.
4. Production Deployment
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: sample-scheduler-clusterrole
rules:
  - apiGroups:
      - ""
    resources:
      - endpoints
      - events
    verbs:
      - create
      - get
      - update
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - delete
      - get
      - list
      - watch
      - update
  - apiGroups:
      - ""
    resources:
      - bindings
      - pods/binding
    verbs:
      - create
  - apiGroups:
      - ""
    resources:
      - pods/status
    verbs:
      - patch
      - update
  - apiGroups:
      - ""
    resources:
      - replicationcontrollers
      - services
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - apps
      - extensions
    resources:
      - replicasets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - apps
    resources:
      - statefulsets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - policy
    resources:
      - poddisruptionbudgets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - persistentvolumeclaims
      - persistentvolumes
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "storage.k8s.io"
    resources:
      - storageclasses
      - csinodes
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "coordination.k8s.io"
    resources:
      - leases
    verbs:
      - create
      - get
      - list
      - update
  - apiGroups:
      - "events.k8s.io"
    resources:
      - events
    verbs:
      - create
      - patch
      - update
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sample-scheduler-sa
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: sample-scheduler-clusterrolebinding
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: sample-scheduler-clusterrole
subjects:
  - kind: ServiceAccount
    name: sample-scheduler-sa
    namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: scheduler-config
  namespace: kube-system
data:
  scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1beta1
    kind: KubeSchedulerConfiguration
    profiles:
      - schedulerName: sample-scheduler
        plugins:
          score:
            enabled:
              - name: "sample-plugin"
          filter:
            enabled:
              - name: "sample-plugin"
          preBind:
            enabled:
              - name: "sample-plugin"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-scheduler
  namespace: kube-system
  labels:
    component: sample-scheduler
spec:
  replicas: 1
  selector:
    matchLabels:
      component: sample-scheduler
  template:
    metadata:
      labels:
        component: sample-scheduler
    spec:
      serviceAccount: sample-scheduler-sa
      priorityClassName: system-cluster-critical
      volumes:
        - name: scheduler-config
          configMap:
            name: scheduler-config
      containers:
        - name: scheduler-ctrl
          image: registry.cn-hangzhou.aliyuncs.com/vita_lili/lili:sample-sheduler-v1
          imagePullPolicy: IfNotPresent
          args:
            - sample-scheduler-framework
            - --config=/etc/kubernetes/scheduler-config.yaml
            - --v=3
          resources:
            requests:
              cpu: "50m"
          volumeMounts:
            - name: scheduler-config
              mountPath: /etc/kubernetes
5. References
More details can be found in the documents below.
This one mainly covers the custom scheduler for Kubernetes 1.16; version 1.18 follows the same ideas, though some field names and parameters differ.
https://www.qikqiak.com/post/custom-kube-scheduler/
The scheduler extension introduction in the official documentation:
https://kubernetes.io/zh/docs/reference/scheduling/config/#scheduling-plugins
Introduction to leader election:
https://zhengyinyong.com/post/kubernetes-pod-leader-election/
KubeSchedulerConfiguration
k8s.io/kube-scheduler/config/v1beta1/types.go
Alibaba Cloud's recommended approach:
https://developer.aliyun.com/article/756016