This idea is a continuation of ideas: 1) MASM-I-1177, 2) MASCOM-I-88, 3) MASCOM-I-89, 4) MASCOM-I-90, 5) MASCOM-I-91, 6) MASCOM-I-92.
To improve the availability, reliability, security, and scalable operations of Ford's OpenShift clusters and the applications that share them, Ford Motor Company is enforcing K8s policies and best practices from Red Hat, Google, the Cloud Native Computing Foundation (CNCF), and the National Security Agency (NSA)/Cybersecurity and Infrastructure Security Agency (CISA) for hardening applications on Kubernetes platforms.
Maximo Application Suite - Manage installations/upgrades and deployments should abide by the following Kyverno policies. Illustrative code sketches for several of these policies are included after the list.
1. Require Image Checksum: Ensures immutability and integrity. Tags are mutable; digests (sha256:...) are not. This prevents deploying a different image than intended, even if the tag is maliciously or accidentally moved.
2. Require Multiple Replicas: For high availability of Deployments/StatefulSets. One option for correctly implementing a singleton pattern is to use "Leases" and/or "leader election"; a simple demo showing how easy these suggestions are to implement in code is included after this list. An operator or controller may be a singleton if all it does is manage a resource that does not affect the runtime of the application or the resource it manages, and if its never recovering would cause no discernible change in application behavior. For example, the Crunchy and Quay operators manage the configuration of their applications and nothing else; if the operator were to go down, no updates could be made, but the applications themselves (Quay, the PostgreSQL databases) would continue to run in spite of the operator being down.
A negative example is Dragonfly: its operator, in addition to managing configuration, also manages the primary and replica state of the application, so if the operator were to go down, the application would not fail over in the event of node failure or general maintenance.
3. Require Pod Disruption Budgets (PDBs): Maintains app availability during voluntary disruptions (e.g., cluster upgrades, node maintenance). PDBs tell Kubernetes the minimum number of pods that must remain available (or the maximum that may be unavailable) during such disruptions.
4. Require Pod Probes: Ensures Kubernetes can understand application health (liveness: restart if broken; readiness: don't send traffic until ready; startup: accommodate slow-starting apps). Note that the startup probe is encouraged where needed, but it is not enforced.
5. Require Limits & Requests: Essential for the Kubernetes scheduler to place pods appropriately and for resource management/quotas. Prevents "noisy neighbor" problems.
6. Require Reasonable PDBs: Prevents PDBs from being too restrictive and blocking maintenance (e.g., minAvailable: 100% or maxUnavailable: 0 for a multi-replica app). This policy ensures PDBs actually allow for some disruption.
7. Require Read-Only Root Filesystem: Significantly enhances security by preventing attackers (or bugs) from modifying the container's filesystem, reducing the attack surface. Applications should write to dedicated volumes if they need to persist data.
8. Require StorageClass: Ensures PVCs explicitly request a specific type of storage, rather than relying on a default that might not be suitable. Provides control and predictability.
9. Require Topology Spread Constraints: For better pod distribution across failure domains (nodes, zones) to minimize the impact of localized outages.
10. Restrict Image Registries: Only allows images from a pre-approved list of registries for security.
11. Restrict Tolerations on Critical Nodes: Protects the stability and security of control plane and essential infrastructure nodes by preventing general workloads from being scheduled on them.
12. Restrict Node Selection: Prohibits nodeSelector and nodeName so that scheduling is controlled centrally. Prevents users from bypassing the scheduler's logic or pinning workloads to specific nodes, which can lead to resource imbalances or security issues if users target sensitive nodes. Encourages reliance on more abstract scheduling mechanisms such as taints/tolerations (when appropriate and controlled), affinity/anti-affinity, or Topology Spread Constraints.
13. Restrict Pod Template Hash: Prevents manual modification of the pod-template-hash label, which the Deployment controller manages to track the ReplicaSets and pods belonging to each revision. Manually setting or altering it can break the controller's update and rollback mechanisms.
14. Restrict External IPs: Manually setting externalIPs on a Service can be a security risk (e.g., CVE-2020-8554, allowing potential MITM or hijacking of traffic to an IP an attacker controls). It's better to let the cloud provider assign IPs via LoadBalancer services or use Ingress controllers.
15. Restrict sysctls: sysctls allow tuning kernel parameters. Unsafe sysctls can compromise node security or stability. Kubernetes has a list of "safe" namespaced sysctls; this policy enforces that only those are used.
16. Restrict Wildcard in Verbs (RBAC): Critical for the Principle of Least Privilege. Wildcards (*) in RBAC rule verbs grant excessive permissions (e.g., allowing all actions on pods). Specific verbs should always be used.
17. Validate HPA minReplicas: Ensures HPAs don't scale down below a minimum safe replica count (e.g., >=2).
18. Restrict OpenShift default app subdomain: While the default subdomain (*.apps.<cluster_name>.<base_domain>) provides a simple starting point, because of its limitations in production environments and its potential for causing conflicts, we require teams to implement custom domain strategies.
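The sketches below illustrate, in Go, what several of the policies above expect from a workload; all resource names, namespaces, images, ports, and thresholds in them are illustrative assumptions rather than mandated values, and most sketches simply build the relevant Kubernetes objects with the Go API types and print them as YAML for comparison against what the policies check. For item 1 (Require Image Checksum), this first sketch shows one possible way to resolve a mutable tag to its immutable digest using the go-containerregistry library (a tooling choice made for the example, not something the policy requires), so that Deployments can pin the image by digest.

```go
package main

import (
	"fmt"
	"log"

	"github.com/google/go-containerregistry/pkg/crane"
)

func main() {
	// Hypothetical image reference; substitute your registry/repository:tag.
	ref := "registry.example.com/mas/manage-app:8.7.0"

	// Resolve the mutable tag to its immutable manifest digest.
	digest, err := crane.Digest(ref)
	if err != nil {
		log.Fatalf("resolving digest for %s: %v", ref, err)
	}

	// Deployments should then reference the image by digest
	// (registry/repository@sha256:...), so the exact bytes that were
	// tested are the bytes that run, even if the tag is later moved.
	fmt.Printf("registry.example.com/mas/manage-app@%s\n", digest)
}
```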
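For item 2 (Require Multiple Replicas), this is the simple demo referred to above: a minimal sketch of Lease-based leader election with client-go, so a controller that must act as a singleton can still run with two or more replicas. The lock name, namespace, and timing values are assumptions to adapt; the pod's ServiceAccount is assumed to be allowed to manage Leases in its namespace.

```go
package main

import (
	"context"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	// In-cluster configuration; each replica runs this same binary.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	id, _ := os.Hostname() // each replica uses its own identity

	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "manage-controller-lock", Namespace: "mas-manage"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:            lock,
		ReleaseOnCancel: true,
		LeaseDuration:   15 * time.Second,
		RenewDeadline:   10 * time.Second,
		RetryPeriod:     2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Only the elected leader runs the singleton work loop;
				// the other replicas stay idle but ready to take over.
				log.Println("became leader, starting work")
				<-ctx.Done()
			},
			OnStoppedLeading: func() {
				log.Println("lost leadership, exiting")
				os.Exit(0)
			},
		},
	})
}
```

Only the replica holding the Lease does work; the others block until they can acquire it, so losing the leader (node failure, drain) costs roughly one lease duration before another replica takes over.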
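For items 3 and 6, a PodDisruptionBudget built with the Kubernetes API types and rendered to YAML; maxUnavailable: 1 keeps replicas serving while still letting node drains make progress. The name, namespace, and label selector are placeholders.

```go
package main

import (
	"fmt"
	"log"

	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"sigs.k8s.io/yaml"
)

func main() {
	// Allow exactly one pod to be disrupted at a time; with >=2 replicas this
	// keeps the app available while never blocking maintenance entirely.
	maxUnavailable := intstr.FromInt(1)

	pdb := policyv1.PodDisruptionBudget{
		TypeMeta:   metav1.TypeMeta{APIVersion: "policy/v1", Kind: "PodDisruptionBudget"},
		ObjectMeta: metav1.ObjectMeta{Name: "manage-app-pdb", Namespace: "mas-manage"},
		Spec: policyv1.PodDisruptionBudgetSpec{
			MaxUnavailable: &maxUnavailable,
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app": "manage-app"},
			},
		},
	}

	out, err := yaml.Marshal(pdb)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(out))
}
```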
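For item 4, a container fragment with liveness, readiness, and an optional startup probe; the /healthz and /ready paths, port 9080, and the image reference are placeholders for whatever endpoints the application actually exposes.

```go
package main

import (
	"fmt"
	"log"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"sigs.k8s.io/yaml"
)

func main() {
	container := corev1.Container{
		Name:  "manage-app",
		Image: "registry.example.com/mas/manage-app@sha256:<digest>", // placeholder

		// Liveness: restart the container if the process wedges.
		LivenessProbe: &corev1.Probe{
			ProbeHandler: corev1.ProbeHandler{
				HTTPGet: &corev1.HTTPGetAction{Path: "/healthz", Port: intstr.FromInt(9080)},
			},
			PeriodSeconds:    10,
			FailureThreshold: 3,
		},

		// Readiness: keep the pod out of Service endpoints until it can serve.
		ReadinessProbe: &corev1.Probe{
			ProbeHandler: corev1.ProbeHandler{
				HTTPGet: &corev1.HTTPGetAction{Path: "/ready", Port: intstr.FromInt(9080)},
			},
			PeriodSeconds: 5,
		},

		// Startup: optional, gives slow-starting apps time before liveness kicks in.
		StartupProbe: &corev1.Probe{
			ProbeHandler: corev1.ProbeHandler{
				HTTPGet: &corev1.HTTPGetAction{Path: "/healthz", Port: intstr.FromInt(9080)},
			},
			PeriodSeconds:    10,
			FailureThreshold: 30, // up to ~5 minutes before liveness applies
		},
	}

	out, err := yaml.Marshal(container)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(out))
}
```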
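For item 5, a resources block with explicit requests and limits; the quantities are arbitrary examples and should be sized from the application's real usage data.

```go
package main

import (
	"fmt"
	"log"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	"sigs.k8s.io/yaml"
)

func main() {
	// Requests drive scheduling and quota accounting; limits cap what a
	// misbehaving container can consume, containing "noisy neighbor" effects.
	resources := corev1.ResourceRequirements{
		Requests: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("250m"),
			corev1.ResourceMemory: resource.MustParse("512Mi"),
		},
		Limits: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("1"),
			corev1.ResourceMemory: resource.MustParse("1Gi"),
		},
	}

	out, err := yaml.Marshal(resources)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(out))
}
```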
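For item 7, a pod spec fragment that sets readOnlyRootFilesystem and mounts an emptyDir for the one path the application needs to write to; the mount path and image are assumptions.

```go
package main

import (
	"fmt"
	"log"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	readOnly := true

	podSpec := corev1.PodSpec{
		Containers: []corev1.Container{{
			Name:  "manage-app",
			Image: "registry.example.com/mas/manage-app@sha256:<digest>", // placeholder
			SecurityContext: &corev1.SecurityContext{
				// The container's root filesystem becomes immutable at runtime.
				ReadOnlyRootFilesystem: &readOnly,
			},
			// Writable paths the application genuinely needs are mounted explicitly.
			VolumeMounts: []corev1.VolumeMount{
				{Name: "tmp", MountPath: "/tmp"},
			},
		}},
		Volumes: []corev1.Volume{{
			Name:         "tmp",
			VolumeSource: corev1.VolumeSource{EmptyDir: &corev1.EmptyDirVolumeSource{}},
		}},
	}

	out, err := yaml.Marshal(podSpec)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(out))
}
```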
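For item 8, a PVC that names its StorageClass explicitly instead of relying on the cluster default; the class name, size, and namespace shown are placeholders for whatever is approved on the target cluster.

```go
package main

import (
	"fmt"
	"log"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	// Hypothetical class name; use whichever class your cluster team approves.
	storageClass := "ocs-storagecluster-cephfs"

	pvc := corev1.PersistentVolumeClaim{
		TypeMeta:   metav1.TypeMeta{APIVersion: "v1", Kind: "PersistentVolumeClaim"},
		ObjectMeta: metav1.ObjectMeta{Name: "manage-app-data", Namespace: "mas-manage"},
		Spec: corev1.PersistentVolumeClaimSpec{
			// Naming the class explicitly avoids surprises if the default changes.
			StorageClassName: &storageClass,
			AccessModes:      []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			// Field type is VolumeResourceRequirements in recent k8s.io/api
			// releases (ResourceRequirements before v0.29).
			Resources: corev1.VolumeResourceRequirements{
				Requests: corev1.ResourceList{
					corev1.ResourceStorage: resource.MustParse("10Gi"),
				},
			},
		},
	}

	out, err := yaml.Marshal(pvc)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(out))
}
```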
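For item 9, topology spread constraints that spread replicas across zones (hard requirement) and across nodes (soft preference, so scheduling is never blocked by node count); the label selector is a placeholder.

```go
package main

import (
	"fmt"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	constraints := []corev1.TopologySpreadConstraint{
		{
			// At most one zone may have one more replica than another.
			MaxSkew:           1,
			TopologyKey:       "topology.kubernetes.io/zone",
			WhenUnsatisfiable: corev1.DoNotSchedule,
			LabelSelector:     &metav1.LabelSelector{MatchLabels: map[string]string{"app": "manage-app"}},
		},
		{
			// Also spread across nodes, but only as a preference.
			MaxSkew:           1,
			TopologyKey:       "kubernetes.io/hostname",
			WhenUnsatisfiable: corev1.ScheduleAnyway,
			LabelSelector:     &metav1.LabelSelector{MatchLabels: map[string]string{"app": "manage-app"}},
		},
	}

	out, err := yaml.Marshal(constraints)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(out))
}
```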
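For item 16, a namespaced Role whose rules spell out every verb instead of using wildcards; the resources and verbs listed are examples of least-privilege rules, not a required set.

```go
package main

import (
	"fmt"
	"log"

	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	role := rbacv1.Role{
		TypeMeta:   metav1.TypeMeta{APIVersion: "rbac.authorization.k8s.io/v1", Kind: "Role"},
		ObjectMeta: metav1.ObjectMeta{Name: "manage-app-reader", Namespace: "mas-manage"},
		Rules: []rbacv1.PolicyRule{
			{
				// Read-only access to pods and their logs: every verb is
				// spelled out, no "*" anywhere.
				APIGroups: []string{""},
				Resources: []string{"pods", "pods/log"},
				Verbs:     []string{"get", "list", "watch"},
			},
			{
				APIGroups: []string{""},
				Resources: []string{"configmaps"},
				Verbs:     []string{"get", "list", "watch", "update", "patch"},
			},
		},
	}

	out, err := yaml.Marshal(role)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(out))
}
```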
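For item 17, an HPA with minReplicas pinned at 2 so autoscaling never drops the workload to a single pod; the CPU target, maxReplicas, and target Deployment name are illustrative.

```go
package main

import (
	"fmt"
	"log"

	autoscalingv2 "k8s.io/api/autoscaling/v2"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	minReplicas := int32(2) // never scale below two pods
	cpuTarget := int32(70)

	hpa := autoscalingv2.HorizontalPodAutoscaler{
		TypeMeta:   metav1.TypeMeta{APIVersion: "autoscaling/v2", Kind: "HorizontalPodAutoscaler"},
		ObjectMeta: metav1.ObjectMeta{Name: "manage-app", Namespace: "mas-manage"},
		Spec: autoscalingv2.HorizontalPodAutoscalerSpec{
			ScaleTargetRef: autoscalingv2.CrossVersionObjectReference{
				APIVersion: "apps/v1",
				Kind:       "Deployment",
				Name:       "manage-app",
			},
			MinReplicas: &minReplicas,
			MaxReplicas: 6,
			Metrics: []autoscalingv2.MetricSpec{{
				Type: autoscalingv2.ResourceMetricSourceType,
				Resource: &autoscalingv2.ResourceMetricSource{
					Name: corev1.ResourceCPU,
					Target: autoscalingv2.MetricTarget{
						Type:               autoscalingv2.UtilizationMetricType,
						AverageUtilization: &cpuTarget,
					},
				},
			}},
		},
	}

	out, err := yaml.Marshal(hpa)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(out))
}
```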
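For item 18, an OpenShift Route with a team-owned custom hostname instead of the default application subdomain; the hostname, service name, port name, and TLS termination mode are assumptions to adapt to the team's own domain strategy.

```go
package main

import (
	"fmt"
	"log"

	routev1 "github.com/openshift/api/route/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"sigs.k8s.io/yaml"
)

func main() {
	route := routev1.Route{
		TypeMeta:   metav1.TypeMeta{APIVersion: "route.openshift.io/v1", Kind: "Route"},
		ObjectMeta: metav1.ObjectMeta{Name: "manage-app", Namespace: "mas-manage"},
		Spec: routev1.RouteSpec{
			// Custom hostname owned by the team, instead of the shared
			// *.apps.<cluster_name>.<base_domain> default.
			Host: "manage.example.com", // placeholder hostname
			To: routev1.RouteTargetReference{
				Kind: "Service",
				Name: "manage-app",
			},
			Port: &routev1.RoutePort{TargetPort: intstr.FromString("https")},
			TLS:  &routev1.TLSConfig{Termination: routev1.TLSTerminationReencrypt},
		},
	}

	out, err := yaml.Marshal(route)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(out))
}
```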
The attached document explains in detail the policies that are applied in our OCP infrastructure.