K8s社区主要通过Extender Resource和Device Plugin方式给为用户提供GPU物理资源支持。 每个GPU厂商都会实现自己的Device Plugin Agent,Agent在底层节点层会将物理卡扫描上报到集群。用户用拓展资源方式在Pod创建时指定需要物理卡的数量,Device Plugin在每个Kubelet节点上做启动,并且调用各个GPU厂商的设备工具,将设备卡资源扫描上报。
In this article, you will learn why Kubernetes uses etcd as a database by building (and breaking) a 3-node etcd cluster.
In this article, you will learn how to recreate the Kubernetes RBAC authorization model from scratch and practice the relationships between Roles, ClusterRoles, ServiceAccounts, RoleBindings and ClusterRoleBindings.
In this article, you'll look at Kafka's architecture and how it supports high availability with replicated partitions. Then, you will design a Kafka cluster to achieve high availability using standard Kubernetes resources and see how it tolerates node maintenance and total node failure.
This checklist provides actionable best practices for deploying secure, scalable, and resilient services on Kubernetes.
Kubernetes doesn't load balance long-lived connections, and some Pods might receive more requests than others. If you're using HTTP/2, gRPC, RSockets, AMQP or any other long-lived connection such as a database connection, you might want to consider client-side load balancing.
In Kubernetes resource constraints are used to schedule the Pod in the right node, and it also affects which Pod is killed or starved at times of high load. In this blog, you will explore setting resource limits for a Flask web service automatically using the Vertical Pod Autoscaler and the metrics server.
here's a diagram to help you debug your deployments in Kubernetes
In this article, you will learn how packets flow inside and outside a Kubernetes cluster. Starting from the initial web request and down to the container hosting the application.