
Sporadic "CT: Map insertion failed" errors, mostly on short-lived connections #35010

@lgyurci

Description


Is there an existing issue for this?

  • I have searched the existing issues

Version

Equal to or higher than v1.16.0 and lower than v1.17.0

What happened?

This bug report is related to #33074, where "No mapping for NAT masquerade" errors occurred in the same context. After updating from 1.15.x to 1.16.0, that issue was resolved; instead, flows are now dropped at roughly the same rate with the drop reason "CT: Map insertion failed". In #33115, a specific race condition was "fixed", and my guess is that if port allocation now fails on reverse SNAT, Cilium does not retry and drops the flow.
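To make the guess above concrete, here is a deliberately simplified toy model in Python. It is not Cilium's datapath code: the random allocator, the port range, and the retry counts are assumptions chosen only for illustration. It compares an allocator that gives up after its first collision with one that retries a few times, while short-lived connections keep releasing their ports so occupancy stays constant.

```python
import random
from typing import Optional, Set

# Hypothetical port range for the toy model (not taken from Cilium's configuration).
PORT_MIN, PORT_MAX = 1024, 65535


def allocate_port(in_use: Set[int], retries: int, rng: random.Random) -> Optional[int]:
    """Pick a random free port; on collision, try again up to `retries` more times."""
    for _ in range(retries + 1):
        port = rng.randint(PORT_MIN, PORT_MAX)
        if port not in in_use:
            in_use.add(port)
            return port
    return None  # allocation failed: in this toy model the flow is dropped


def failure_rate(occupancy: float, retries: int, trials: int = 10_000, seed: int = 1) -> float:
    """Fraction of new flows dropped when `occupancy` of the port range is already in use."""
    rng = random.Random(seed)
    ports = range(PORT_MIN, PORT_MAX + 1)
    in_use = set(rng.sample(ports, int(len(ports) * occupancy)))
    failures = 0
    for _ in range(trials):
        port = allocate_port(in_use, retries, rng)
        if port is None:
            failures += 1
        else:
            in_use.discard(port)  # short-lived connection: the port is released right away
    return failures / trials


if __name__ == "__main__":
    for retries in (0, 4):
        print(f"retries={retries}: {failure_rate(0.5, retries):.3f} of new flows dropped")
```

With half of the port range in use, the no-retry allocator drops roughly half of new flows, while four extra attempts cut that to about 3% (0.5^5). This only illustrates why a missing retry would show up as a steady drop rate under churn; it says nothing about what Cilium actually does in this code path.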

How can we reproduce the issue?

Try hosting a heavily loaded DNS server on a Kubernetes cluster with Cilium as the CNI, or otherwise open and close many connections in quick succession while the CT table is heavily loaded, since the issue appears "randomly" on new connections. More ideas are in #33074.
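A minimal reproduction sketch in Python for the "open and close many connections quickly" approach, assuming a UDP echo service reachable through the Cilium datapath (hypothetical; with no arguments the script falls back to a local echo server, which only exercises the script itself). Each datagram is sent from a fresh socket, so every request uses a new source port and therefore a new connection-tracking entry; unanswered datagrams are counted as possibly dropped flows. For DNS specifically, a dedicated load generator such as dnsperf would be more faithful, since a DNS server will not answer these plain payloads.

```python
import socket
import sys
import threading


def start_echo_server(host: str = "127.0.0.1"):
    """Start a local UDP echo server on an ephemeral port (for testing only)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    srv.bind((host, 0))

    def loop():
        while True:
            try:
                data, addr = srv.recvfrom(512)
                srv.sendto(data, addr)
            except OSError:
                return  # socket closed

    threading.Thread(target=loop, daemon=True).start()
    return srv, srv.getsockname()[1]


def churn_udp(host: str, port: int, count: int, timeout: float = 1.0) -> int:
    """Send `count` datagrams, each from a new socket, and return how many got a reply."""
    answered = 0
    for i in range(count):
        # A fresh socket per request means a fresh ephemeral source port,
        # i.e. a new 5-tuple and a new connection-tracking entry.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(timeout)
            s.sendto(b"ping %d" % i, (host, port))
            try:
                s.recvfrom(512)
                answered += 1
            except socket.timeout:
                pass  # no reply within the timeout: possibly a dropped flow
    return answered


if __name__ == "__main__":
    if len(sys.argv) == 3:
        target_host, target_port = sys.argv[1], int(sys.argv[2])
    else:
        _srv, target_port = start_echo_server()
        target_host = "127.0.0.1"
    n = 1000
    ok = churn_udp(target_host, target_port, n)
    print(f"{n - ok} of {n} datagrams unanswered")
```

Pass the service host and port as the two arguments, and watch for drops at the same time, for example with `hubble observe --verdict DROPPED`.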

Cilium Version

Client: 1.16.0 8299999 2024-07-23T22:22:14-07:00 go version go1.22.5 linux/amd64
Daemon: 1.16.0 8299999 2024-07-23T22:22:14-07:00 go version go1.22.5 linux/amd64

Kernel Version

Linux 5.14.0-427.16.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Wed May 8 17:48:14 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

v1.28.13+rke2r1

Regression

It behaved differently in 1.15.x, but to the best of my knowledge it never worked properly.

Sysdump

No response

Relevant log output

No response

Anything else?

No response

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata



Labels

  • area/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
  • feature/conntrack
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, eg via Slack.
  • needs/triage: This issue requires triaging to establish severity and next steps.
  • sig/scalability: Impacts how well Cilium handles a high rate of events or churn.


Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
