
CI: 100 Nodes Scale Test (scale-100) #41313

@marseel


CI failure

Since kops was updated to 1.33 (cilium/scale-tests-action#39), followed by a bump in cilium/cilium (ca403fe), the 100-node scale test has been failing while provisioning the cluster:

Validation Failed
I0821 01:26:20.188259    7194 gce_cloud.go:307] Scanning zones: [us-west1-b us-west1-c us-west1-a]
INSTANCE GROUPS
NAME				ROLE		MACHINETYPE	MIN	MAX	SUBNETS
control-plane-us-west1-a	ControlPlane	n2-standard-8	1	1	us-west1
nodes-us-west1-a		Node		n2-standard-8	1	1	us-west1

NODE STATUS
NAME				ROLE		READY
control-plane-us-west1-a-x6x7	control-plane	True
nodes-us-west1-a-2qpk		node		True
W0821 01:26:21.454372    7194 validate_cluster.go:242] (will retry): cluster not yet healthy

VALIDATION ERRORS
KIND	NAME					MESSAGE
Pod	kube-system/coredns-545b87966c-hlkxb	system-cluster-critical pod "coredns-545b87966c-hlkxb" is pending

Example run: https://github.com/cilium/cilium/actions/runs/17114139466/job/48541643221
Comparing the CoreDNS manifests, the topologySpreadConstraints changed from:

    topologySpreadConstraints:
    - labelSelector:
        matchLabels:
          k8s-app: kube-dns
      maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: ScheduleAnyway
    - labelSelector:
        matchLabels:
          k8s-app: kube-dns
      maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway

to

    topologySpreadConstraints:
    - labelSelector:
        matchLabels:
          k8s-app: kube-dns
      maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
    - labelSelector:
        matchLabels:
          k8s-app: kube-dns
      maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: DoNotSchedule

This is related to a change in kops: kubernetes/kops#17472
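One plausible reading of the failure, not confirmed in this issue: with DoNotSchedule, the hostname constraint becomes a hard filter instead of a scoring hint. In the validation output above, nodes-us-west1-a-2qpk is the only worker. If the control-plane node still counts as a hostname domain (the Kubernetes default nodeTaintsPolicy is Ignore) while CoreDNS cannot actually run there because of its taint, then a second CoreDNS replica on the worker would produce a skew of 2 against maxSkew 1 and stay pending. The following Python sketch models that check in simplified form. The replica count of 2, the taint behaviour, and the helper names (Node, skew_if_placed, can_schedule) are hypothetical and are not taken from kops or the kube-scheduler source.

```python
# Simplified model of the PodTopologySpread "skew" filter (DoNotSchedule).
# Hypothetical assumptions: CoreDNS runs 2 replicas; the control-plane node is
# counted as a hostname domain (nodeTaintsPolicy: Ignore) but CoreDNS does not
# tolerate its taint, so no CoreDNS pod can actually be placed there.
from collections import Counter
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    labels: dict
    schedulable_for_coredns: bool  # False if the node's taints repel the pod


def skew_if_placed(nodes, pods_per_node, candidate, topology_key):
    """Skew for topology_key if one more matching pod lands on candidate.

    skew = (matching pods in candidate's domain + 1) - (global minimum over
    all domains, computed before the new pod is placed).
    """
    counts = Counter()
    for node in nodes:
        counts[node.labels[topology_key]] += pods_per_node.get(node.name, 0)
    domain = candidate.labels[topology_key]
    return counts[domain] + 1 - min(counts.values())


def can_schedule(nodes, pods_per_node, candidate, constraints):
    """Reject the node if taints block it or any constraint exceeds maxSkew."""
    if not candidate.schedulable_for_coredns:
        return False
    return all(
        skew_if_placed(nodes, pods_per_node, candidate, key) <= max_skew
        for key, max_skew in constraints
    )


# Nodes taken from the validation output; label values are illustrative.
nodes = [
    Node("control-plane-us-west1-a-x6x7",
         {"kubernetes.io/hostname": "control-plane-us-west1-a-x6x7",
          "topology.kubernetes.io/zone": "us-west1-a"},
         schedulable_for_coredns=False),
    Node("nodes-us-west1-a-2qpk",
         {"kubernetes.io/hostname": "nodes-us-west1-a-2qpk",
          "topology.kubernetes.io/zone": "us-west1-a"},
         schedulable_for_coredns=True),
]
constraints = [("topology.kubernetes.io/zone", 1), ("kubernetes.io/hostname", 1)]

# First replica: both skews are 1, so it fits on the worker.
first_ok = can_schedule(nodes, {}, nodes[1], constraints)
# Second replica: hostname skew would be (1 + 1) - 0 = 2 > maxSkew 1.
second_ok = can_schedule(nodes, {"nodes-us-west1-a-2qpk": 1}, nodes[1], constraints)
print(first_ok, second_ok)  # True False
```

With ScheduleAnyway, the same skew only lowers a node's score rather than filtering it out, which is consistent with the old manifest provisioning successfully on this two-node cluster.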

Metadata

Assignees: No one assigned

Labels: area/CI, ci/flake, sig/scalability, stale
