Catalog Details
CATEGORY
workloads
CREATED BY
UPDATED AT
November 23, 2024
VERSION
0.0.10
What this pattern does:
node-problem-detector aims to make various node problems visible to the upstream layers in the cluster management stack. It is a daemon that runs on each node, detects node problems, and reports them to the apiserver. node-problem-detector can run either as a [DaemonSet](https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/) or standalone. It currently runs as a [Kubernetes Addon](https://github.com/kubernetes/kubernetes/tree/master/cluster/addons) enabled by default in GKE clusters. It is also enabled by default in AKS as part of the [AKS Linux Extension](https://learn.microsoft.com/en-us/azure/aks/faq#what-is-the-purpose-of-the-aks-linux-extension-i-see-installed-on-my-linux-vmss-instances).

Many node problems can affect the pods running on a node, such as:
- Infrastructure daemon issues: ntp service down
- Hardware issues: bad CPU, memory, or disk
- Kernel issues: kernel deadlock, corrupted file system
- Container runtime issues: unresponsive runtime daemon

Currently, these problems are invisible to the upstream layers in the cluster management stack, so Kubernetes will continue scheduling pods to the bad nodes. To solve this problem, we introduced the node-problem-detector daemon, which collects node problems from various daemons and makes them visible to the upstream layers. Once the upstream layers have visibility into those problems, we can discuss the remedy system.
Caveats and Considerations:
node-problem-detector uses Event and NodeCondition to report problems to the apiserver:
- NodeCondition: A permanent problem that makes the node unavailable for pods should be reported as a NodeCondition.
- Event: A temporary problem that has limited impact on pods but is informative should be reported as an Event.

For more caveats and considerations, see https://github.com/kubernetes/node-problem-detector
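The reporting rule above can be illustrated with a minimal Python sketch. It is not node-problem-detector's actual implementation: the `Problem` type and the `report_kind` and `active_problem_conditions` helpers are hypothetical names introduced here. The only parts taken from Kubernetes are the shape of a Node's `status.conditions` list (each entry has a `type` and a `status`) and the convention that `Ready` is the one standard condition where `"True"` means healthy. The condition names in the usage lines are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Problem:
    """A detected node problem (hypothetical type, for illustration only)."""
    reason: str
    message: str
    permanent: bool  # True if the node stays unavailable for pods


def report_kind(problem: Problem) -> Literal["NodeCondition", "Event"]:
    """Apply the rule: permanent problems become a NodeCondition,
    temporary but informative problems become an Event."""
    return "NodeCondition" if problem.permanent else "Event"


def active_problem_conditions(node: dict) -> list[str]:
    """Return the condition types on a Node object whose status is "True".

    "Ready" is skipped because, unlike problem conditions, "True" there
    means the node is healthy. Assumes `node` is a Node object already
    decoded from JSON into a dict.
    """
    conditions = node.get("status", {}).get("conditions", [])
    return [
        c["type"]
        for c in conditions
        if c.get("status") == "True" and c.get("type") != "Ready"
    ]


# Usage with illustrative data (assumed names, not captured from a cluster).
deadlock = Problem("KernelDeadlock", "kernel task blocked", permanent=True)
oom = Problem("OOMKilling", "process killed by the OOM killer", permanent=False)
node = {
    "status": {
        "conditions": [
            {"type": "Ready", "status": "True"},
            {"type": "KernelDeadlock", "status": "True"},
            {"type": "ReadonlyFilesystem", "status": "False"},
        ]
    }
}
print(report_kind(deadlock), report_kind(oom))
print(active_problem_conditions(node))
```

An upstream layer such as a scheduler extension or remedy controller could apply a filter like `active_problem_conditions` to decide which nodes to avoid, while Events would typically be surfaced to operators rather than acted on automatically.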
Compatibility: