KEP-2885: Server Side Unknown Field Validation

Release Signoff Checklist

Items marked with (R) are required prior to targeting to a milestone / release.

  • (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
  • (R) KEP approvers have approved the KEP status as implementable
  • (R) Design details are appropriately documented
  • (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
    • e2e Tests for all Beta API Operations (endpoints)
    • (R) Ensure GA e2e tests meet requirements for Conformance Tests
    • (R) Minimum Two Week Window for GA e2e tests to prove flake free
  • (R) Graduation criteria is in place
  • (R) Production readiness review completed
  • (R) Production readiness review approved
  • "Implementation History" section is up-to-date for milestone
  • User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
  • Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

Summary

As a client sending a create, update, or patch request to the server, I want to be able to instruct the server to fail when the Kubernetes object I send has fields that are not valid fields of the Kubernetes resource.

This will allow us to remove client-side validation from kubectl while maintaining the same core functionality of erroring out on requests that contain unknown or invalid fields.

Motivation

kubectl --validate=true is the current mechanism to indicate that a request should fail if it specifies unknown fields on the object.

The previous effort to do server-side validation highlighted a few issues with this approach. The main ones are:

  • Bug fixes reach users slowly because client-side upgrades are hard to get into people's hands.
  • Each client needs to implement validation.

This is the last remaining step for removing client-side validation according to this comment

Additional problems have been highlighted in the relevant github issues:

Goals

  • The server should validate that no extra or invalid (e.g. misspelled) fields are present and that no fields are duplicated (for JSON and YAML data).
  • We must maintain compatibility with all existing clients, so server-side unknown field validation should be opt-in.
  • kubectl should default to server-side validation against servers that support it.
  • kubectl should provide the ability for a user to request no validation, or warning only validation, instead of strict server-side validation on a per-request basis.

Non-Goals

  • Complete (business-logic) server-side validation of every aspect of an object (i.e. we’re only focused on mismatched fields between the object in the request body and its schema).
  • Protobuf support. Theoretically, unknown protobuf fields could occur if clients on version X send a request to a server older than X that does not recognize some of the fields (or a similar situation). We do not think it is worth supporting this use case initially, and the server will return an error if clients attempt to validate schema server-side with protobuf data.
  • Enhanced offline validation. Current client-side validation has been used as a form of offline validation to simply validate configs that have no intention of being actually applied to a server. Ideally users would have a supported way to perform offline validation, but that is outside the scope of this KEP.

Proposal

We propose adding server-side schema validation, enabling the API server to validate that objects received in the request body in POST, PUT, and PATCH requests contain no unknown or duplicate fields.

We introduce a query parameter that lets the user/client choose one of three behaviors for unknown/duplicate fields: skip server-side validation entirely, return warnings when they are present, or fail the request when they are present.

When server-side validation is enabled on the server (as will be the default starting in beta), we propose modifying the kubectl --validate flag to default to server-side strict validation and allow the user to choose between strict, warn-only, or no validation.

We propose documenting that client-side validation is deprecated and that server-side validation will be used by default when available. In the meantime, kubectl will fall back to client-side validation if it is not connected to a server with server-side validation enabled (either because the server is older or has the feature gate disabled, or because kubectl is not connected to any server at all).

When we fall back to client-side validation, we will emit a warning to let users know that their request is being validated by the deprecated client-side validation.

Notes/Constraints/Caveats (Optional)

Future Work

Out-of-Tree Alternatives to Client Side Validation

Once kubectl has access to server-side validation via the --validate flag, an open question will remain as to the future of client-side validation.

Kubectl will use server-side validation by default starting in beta for servers that have it enabled. If kubectl does not recognize a server with server-side validation enabled, however, it will fall back to using client-side validation with a deprecation warning.

Even though we feel deprecating client-side validation is justified due to the inevitability of it always being less correct than server-side validation, we still acknowledge that there are benefits to giving users a way to validate their objects without needing access to a server (which is a current use-case of client-side validation, albeit one that is error-prone and not officially supported).

Long-term, we want to favor using out-of-tree solutions for client-side validation. The kubeconform project is an example of an out-of-tree solution that does this.

SIG API Machinery's effort will go toward producing clear and complete API spec documents (and improving them over time) instead of building client-side validation. We strongly recommend that third-party integrators make use of these documents.

Aligning json and yaml errors

A few discrepancies between sigs.k8s.io/yaml and sigs.k8s.io/json make detecting and reporting strict errors inconsistent and should be addressed at some point.

  1. sigs.k8s.io/yaml does not currently have a strict unmarshaling mechanism that distinguishes between strict and non-strict errors, while sigs.k8s.io/json does. As a result, yaml data sometimes has to be unmarshaled twice. See this discussion from the alpha PR and the follow-up tracking issue.

  2. kyaml and kjson currently report strict decoding errors in different formats. For example, a duplicate field error appears from kyaml as line 26: key "imagePullPolicy" already set in map, while kjson reports the error as duplicate field "spec.template.spec.containers[0].imagePullPolicy". For consistency's sake, these two libraries should eventually report the same errors in the same format.
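As a rough illustration of the kjson-style format in item 2, the sketch below reports the first duplicate key as a dotted path. It uses only Go's standard encoding/json tokenizer, not sigs.k8s.io/json. The function names are hypothetical. A separate pass like this is needed with the standard library because json.Unmarshal silently keeps the last value when a key is repeated.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strconv"
)

// findDuplicateField returns the dotted path of the first key that appears
// twice within the same JSON object, or "" if there is no duplicate.
// Hypothetical helper; not the sigs.k8s.io/json API.
func findDuplicateField(data []byte) (string, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	return walkValue(dec, "")
}

// walkValue consumes exactly one JSON value from dec; path is its location.
func walkValue(dec *json.Decoder, path string) (string, error) {
	tok, err := dec.Token()
	if err != nil {
		return "", err
	}
	delim, ok := tok.(json.Delim)
	if !ok {
		return "", nil // scalar: nothing to check
	}
	switch delim {
	case '{':
		seen := map[string]bool{}
		for dec.More() {
			keyTok, err := dec.Token()
			if err != nil {
				return "", err
			}
			key, ok := keyTok.(string)
			if !ok {
				return "", fmt.Errorf("unexpected token %v at %q", keyTok, path)
			}
			child := key
			if path != "" {
				child = path + "." + key
			}
			if seen[key] {
				return child, nil
			}
			seen[key] = true
			if dup, err := walkValue(dec, child); dup != "" || err != nil {
				return dup, err
			}
		}
	case '[':
		for i := 0; dec.More(); i++ {
			child := path + "[" + strconv.Itoa(i) + "]"
			if dup, err := walkValue(dec, child); dup != "" || err != nil {
				return dup, err
			}
		}
	}
	_, err = dec.Token() // consume the closing '}' or ']'
	return "", err
}

func main() {
	doc := []byte(`{"spec":{"template":{"spec":{"containers":[
		{"name":"app","imagePullPolicy":"Always","imagePullPolicy":"Never"}]}}}}`)
	dup, err := findDuplicateField(doc)
	if err != nil {
		panic(err)
	}
	fmt.Printf("duplicate field %q\n", dup)
	// Output: duplicate field "spec.template.spec.containers[0].imagePullPolicy"
}
```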

Risks and Mitigations

Performance Considerations

Checking for unknown fields is less performant than not checking. This applies both to a JSON unmarshal that explicitly fails on unknown fields (as for create, update, or JSON patch) and to extra logic around a merge step (as for strategic merge patch or apply). Initial benchmarks estimate ~20% slower requests and 25-30% more memory consumption.

This is deemed acceptable and insignificant because the use case we are replacing is client-side validation performed by kubectl (i.e. driven by a human, who should not be noticeably affected by a slight performance hit). Existing automated clients that would be sensitive to performance do not currently have a way to leverage server-side validation. They therefore remain opted out by default, or can choose to accept the performance tradeoff if they opt in to server-side schema validation.

Design Details

Opt-in API Mechanism (Query Parameter)

There are a few ways we could allow a client to opt in to server-side unknown field validation. We believe the best option is to introduce a new query parameter that indicates the level of field validation the server should perform, such as ?fieldValidation=Strict.

Primary arguments for using a query parameter include:

  • Parameters are discoverable in OpenAPI.
  • We already have precedent for using query parameters on write requests (via CreateOptions, PatchOptions, UpdateOptions).

An alternative to a query parameter, using the Content-Type header, is discussed in the Alternatives section.

For client-go support, we will add a FieldValidation field to CreateOptions, PatchOptions, and UpdateOptions that can supply the query parameter to the request.

In addition to supporting the options Strict (for erroring on unknown fields) and Ignore, we will also support the ?fieldValidation=Warn option, to warn, rather than error on unknown fields.

We believe that using a query parameter is the best approach.
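The sketch below shows how a client might add the parameter to a write request URL while keeping any existing parameters, such as dryRun. It uses only the standard library. The helper withFieldValidation and the server address are hypothetical. In client-go, the same effect comes from setting the FieldValidation field on CreateOptions, PatchOptions, or UpdateOptions, as described above.

```go
package main

import (
	"fmt"
	"net/url"
)

// validLevels are the three fieldValidation values named in this KEP.
var validLevels = map[string]bool{"Ignore": true, "Warn": true, "Strict": true}

// withFieldValidation returns rawURL with fieldValidation=<level> set,
// keeping any query parameters that are already present (hypothetical helper).
func withFieldValidation(rawURL, level string) (string, error) {
	if !validLevels[level] {
		return "", fmt.Errorf("unsupported fieldValidation value %q", level)
	}
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("fieldValidation", level)
	u.RawQuery = q.Encode() // Encode sorts parameters by key
	return u.String(), nil
}

func main() {
	u, err := withFieldValidation("https://127.0.0.1:6443/apis/apps/v1/namespaces/default/deployments?dryRun=All", "Strict")
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
	// https://127.0.0.1:6443/apis/apps/v1/namespaces/default/deployments?dryRun=All&fieldValidation=Strict
}
```

Validating the value on the client is a design choice of this sketch. It only catches typos earlier, and the server remains the authority on which values it accepts.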

Create (POST) and Update (PUT)

The implementation of this validation for patch requests differs significantly and is discussed in the Patch section below.

At a high level, for create and update requests we have the opportunity to validate the object when we unmarshal the object from wire format into the go type.

This happens in the serializer implementation. Depending on whether a “strict” option is set, the unmarshaling step will (or will not) error if extra or malformed fields exist in the data being unmarshaled.

This, in turn, is used by the create and update handlers when they decode the request body.

One major advantage of the “Content-Type” header approach is that it requires little to no change to existing code in the request handlers.

When the API server is started and we create the negotiated serializer, we create a separate serializer for each content type/media type. To support strict validation all we would need to do is add another serializer to the NegotiatedSerializer for strict validation that is identical to the serializer of “application/json” but has the “strict” option set (as well as ones for yaml).

The request handler will then proceed unchanged, and the Decode step will fail if it encounters fields in the request body that are not part of the schema.
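The sketch below approximates the difference between the strict and non-strict serializers using Go's standard library. json.Decoder.DisallowUnknownFields turns an unknown field (here the misspelled replicass) into a decode error, while the lenient path silently drops it. The Deployment types are tiny stand-ins, not the real API types. Treat this as an approximation: encoding/json is not the sigs.k8s.io/json library discussed in this KEP. For example, it matches field names case-insensitively and does not report duplicate keys.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// DeploymentSpec and Deployment are tiny stand-ins for real API types.
type DeploymentSpec struct {
	Replicas int32 `json:"replicas"`
}

type Deployment struct {
	Kind string         `json:"kind"`
	Spec DeploymentSpec `json:"spec"`
}

// decodeInto decodes body into obj. With strict set, an unknown field at any
// nesting level fails the decode; otherwise unknown fields are dropped.
func decodeInto(body []byte, obj any, strict bool) error {
	dec := json.NewDecoder(bytes.NewReader(body))
	if strict {
		dec.DisallowUnknownFields()
	}
	return dec.Decode(obj)
}

func main() {
	body := []byte(`{"kind":"Deployment","spec":{"replicass":3}}`)
	var lenient, strict Deployment
	fmt.Println(decodeInto(body, &lenient, false)) // <nil>
	fmt.Println(decodeInto(body, &strict, true))   // json: unknown field "replicass"
}
```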

Patch (PATCH)

Unlike create and update requests, patch requests are more complicated because they never unmarshal/decode the request body directly into a Kubernetes object. Instead, patching involves three steps: serializing the existing object to JSON, using a jsonpatch or mergepatch library to combine the client's patch with the existing object, and finally deserializing the combined object into the Kubernetes object.

During the jsonpatch or mergepatch step we don’t currently check if the patch sent by the client has any erroneous fields. Now we need to.

This all happens in the patchMechanism used by the patch handler.

There are three types of patchers used by the patch handler: jsonPatcher, smpPatcher, and applyPatcher that are each discussed individually.

JSON Patch

JSON patching is most commonly called when client-side apply looks to patch an existing custom resource.

For the jsonPatch implementation of applyPatchToCurrentObject, the flow is similar to create and update: after using the jsonpatch library to generate the patchedObjJS, we call DecodeInto.

Based on the value of the fieldValidation query parameter, we can use a codec that does or does not perform strict decoding. In the strict case, decoding fails when invalid fields are present in the patched JSON blob.

Currently, as with create and update, the handler generates the serializer from the scope and the codec from the serializer. We just need to ensure that it conditionally uses a StrictSerializer based on the fieldValidation query parameter.
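The key point is that strictness applies to the patched result, not to the patch itself. The hypothetical sketch below applies a tiny subset of JSON Patch using the standard library: only add and replace on object members, with no JSON Pointer escaping. It then decodes the patched blob strictly only when fieldValidation is Strict. It is not the jsonpatch library or the real handler code. Warn and Ignore are both treated as lenient here, and collecting warnings is not sketched.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// Spec and Object are tiny stand-ins for a real API type.
type Spec struct {
	Replicas int32 `json:"replicas"`
}

type Object struct {
	Kind string `json:"kind"`
	Spec Spec   `json:"spec"`
}

// patchOp is one JSON Patch (RFC 6902) operation.
type patchOp struct {
	Op    string          `json:"op"`
	Path  string          `json:"path"`
	Value json.RawMessage `json:"value"`
}

// applyJSONPatch supports only "add"/"replace" on object members. Like a real
// JSON Patch library, it knows nothing about the schema, so it will happily
// write unknown fields into the document.
func applyJSONPatch(current, patch []byte) ([]byte, error) {
	var doc map[string]any
	if err := json.Unmarshal(current, &doc); err != nil {
		return nil, err
	}
	var ops []patchOp
	if err := json.Unmarshal(patch, &ops); err != nil {
		return nil, err
	}
	for _, op := range ops {
		if op.Op != "add" && op.Op != "replace" {
			return nil, fmt.Errorf("unsupported op %q", op.Op)
		}
		parts := strings.Split(strings.TrimPrefix(op.Path, "/"), "/")
		parent := doc
		for _, p := range parts[:len(parts)-1] {
			next, ok := parent[p].(map[string]any)
			if !ok {
				return nil, fmt.Errorf("path %q not found", op.Path)
			}
			parent = next
		}
		var v any
		if err := json.Unmarshal(op.Value, &v); err != nil {
			return nil, err
		}
		parent[parts[len(parts)-1]] = v
	}
	return json.Marshal(doc)
}

// patchAndDecode patches first, then decodes the patched JSON, strictly
// only when fieldValidation is "Strict".
func patchAndDecode(current, patch []byte, fieldValidation string) (*Object, error) {
	patched, err := applyJSONPatch(current, patch)
	if err != nil {
		return nil, err
	}
	dec := json.NewDecoder(bytes.NewReader(patched))
	if fieldValidation == "Strict" {
		dec.DisallowUnknownFields()
	}
	obj := &Object{}
	if err := dec.Decode(obj); err != nil {
		return nil, err
	}
	return obj, nil
}

func main() {
	current := []byte(`{"kind":"Deployment","spec":{"replicas":1}}`)
	patch := []byte(`[{"op":"add","path":"/spec/replicass","value":3}]`)
	_, err := patchAndDecode(current, patch, "Strict")
	fmt.Println(err) // json: unknown field "replicass"
}
```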

Strategic Merge Patch

Strategic merge patching is most commonly called when client-side apply attempts to patch an existing object of a builtin type.

For the strategic merge patch (SMP) implementation, it calls into the apimachinery strategicpatch library to update the fields of the original object with the data from the patch. This goes through a few layers but it eventually boils down to this mergeMap call that does the actual merge of the patch into the original.

mergeMap goes field-by-field through the existing object, checking whether data should be merged in from the patch. Currently, any extra/unknown fields in the patch are never encountered and are simply ignored.

mergeMap takes a mergeOptions argument that we can extend with configuration for strict validation. Within mergeMap, we can track the fields of the patch as they are merged into the existing object. Once every mergeable field from the patch has been merged, any patch fields that were not merged must be unknown fields, and the request can fail.
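A simplified sketch of that tracking idea follows. The schema type, the strict field on mergeOptions, and this mergeMap signature are illustrative assumptions, not the apimachinery strategicpatch API. Directives such as $patch and list merge keys are ignored. The sketch records each patch key that gets merged. In strict mode, it then reports every unrecorded key as an unknown field path.

```go
package main

import (
	"fmt"
	"sort"
)

// schema maps each known field to the schema of its children; a nil value
// marks a leaf field in this simplified model.
type schema map[string]schema

// mergeOptions stands in for the real options struct, extended with a
// hypothetical strict flag.
type mergeOptions struct {
	strict bool
}

func join(path, key string) string {
	if path == "" {
		return key
	}
	return path + "." + key
}

// mergeMap merges patch into original and returns the sorted paths of patch
// fields that were never merged (collected only when opts.strict is set).
func mergeMap(original, patch map[string]any, s schema, path string, opts mergeOptions) []string {
	merged := map[string]bool{}
	var unknown []string
	for key, patchV := range patch {
		child, known := s[key]
		if !known {
			continue // nothing in the schema to merge this field into
		}
		merged[key] = true
		if pm, ok := patchV.(map[string]any); ok && child != nil {
			om, ok := original[key].(map[string]any)
			if !ok {
				om = map[string]any{}
				original[key] = om
			}
			unknown = append(unknown, mergeMap(om, pm, child, join(path, key), opts)...)
			continue
		}
		original[key] = patchV // scalar or wholesale replacement
	}
	if opts.strict {
		for key := range patch {
			if !merged[key] {
				unknown = append(unknown, join(path, key))
			}
		}
	}
	sort.Strings(unknown)
	return unknown
}

// strategicMerge reports unknown fields as an error when strict is set.
func strategicMerge(original, patch map[string]any, s schema, strict bool) error {
	if unknown := mergeMap(original, patch, s, "", mergeOptions{strict: strict}); len(unknown) > 0 {
		return fmt.Errorf("unknown fields: %v", unknown)
	}
	return nil
}

func main() {
	s := schema{"spec": {"replicas": nil, "paused": nil}}
	original := map[string]any{"spec": map[string]any{"replicas": 1}}
	patch := map[string]any{
		"foo":  "bar",
		"spec": map[string]any{"replicas": 3, "replicass": 3},
	}
	fmt.Println(strategicMerge(original, patch, s, true))
	// unknown fields: [foo spec.replicass]
}
```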

Apply Patch

Apply Patching is called whenever we send a server side apply request to the server.

For apply patch, server side schema validation already occurs and is not optional, no new behavior is needed here. The exception to this is that one can create an arbitrary schema that is still a valid structural schema (see docs on structural schema). If one manages to do this and permits unknown fields, then server-side schema validation will not be responsible for invalidating these.

For further context, the applyPatch path of the patch handler calls into the fieldmanager’s Apply method to generate the newly patched object that will be stored in etcd.

This, in turn, calls apply on its own internal fieldmanager, which is what actually joins the live object with the patch object to produce the new object. It attempts to convert the patch object into a TypedValue, which checks the object against its schema and errors if the object has any unknown or duplicate fields.

Kubectl Flag

For beta, we will modify the kubectl --validate flag from being a bool flag to a string flag that accepts the following values:

  • true or strict: If server-side validation is enabled on the server it sends a request with the fieldValidation param set to Strict, otherwise it falls back to client-side validation. It will also fall back to client-side validation if the request has --dry-run=client, because the request is not actually being sent to a server-side validation enabled server. The default remains true, as it currently is today.
  • false or ignore: This performs no server-side validation or client-side validation, sending a request to the server with fieldValidation param set to Ignore if server-side validation is enabled.
  • warn: This sends a request to the server with the fieldValidation param set to Warn. It performs server-side validation, but validation errors are exposed as warnings in the result header rather than failing the request.
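The flag semantics above can be summarized as a small decision function. This is a hypothetical sketch, not kubectl's code. The list leaves open what warn should do against a server without support. The sketch's choice there (no validation) is an assumption.

```go
package main

import "fmt"

// validationPlan describes what a kubectl-like client does for one request.
type validationPlan struct {
	FieldValidation string // value for ?fieldValidation=, or "" to omit it
	ClientSide      bool   // run the deprecated client-side validation instead
}

// planValidation maps a --validate value to a plan. serverSupports says
// whether the server advertises server-side field validation, and
// dryRunClient is true for --dry-run=client.
func planValidation(flag string, serverSupports, dryRunClient bool) (validationPlan, error) {
	switch flag {
	case "true", "strict":
		// Fall back when there is no capable server to validate against.
		if !serverSupports || dryRunClient {
			return validationPlan{ClientSide: true}, nil
		}
		return validationPlan{FieldValidation: "Strict"}, nil
	case "false", "ignore":
		// No validation at all; tell a capable server to ignore unknown fields.
		if serverSupports {
			return validationPlan{FieldValidation: "Ignore"}, nil
		}
		return validationPlan{}, nil
	case "warn":
		// Assumption: without server support this sketch skips validation.
		if serverSupports {
			return validationPlan{FieldValidation: "Warn"}, nil
		}
		return validationPlan{}, nil
	default:
		return validationPlan{}, fmt.Errorf("invalid --validate value %q", flag)
	}
}

func main() {
	plan, err := planValidation("true", true, false)
	fmt.Printf("%+v %v\n", plan, err)
	// {FieldValidation:Strict ClientSide:false} <nil>
}
```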

We will introduce a mechanism similar to the cli-runtime DryRunVerifier to read the server's published OpenAPI to determine whether or not server-side validation is enabled on the server.

The goal is for the flag to be intent-based. Whether validation runs client-side or server-side under the hood depends solely on whether the API server kubectl is connected to supports server-side validation.
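A rough sketch of that capability check follows. The function scans an OpenAPI v2 document for a fieldValidation query parameter, looking both in each path's operations and in the shared parameter definitions. It is a simplification built on assumptions. The real DryRunVerifier-style mechanism may check a specific resource and operation, and this version does not resolve $ref pointers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parameter is the subset of an OpenAPI v2 parameter object used here.
type parameter struct {
	Name string `json:"name"`
	In   string `json:"in"`
}

// operation is the subset of an OpenAPI v2 operation object used here.
type operation struct {
	Parameters []parameter `json:"parameters"`
}

// openAPIDoc keeps shared parameter definitions and, for each path, the raw
// JSON of every key in the path item (HTTP methods plus "parameters").
type openAPIDoc struct {
	Parameters map[string]parameter                  `json:"parameters"`
	Paths      map[string]map[string]json.RawMessage `json:"paths"`
}

var httpMethods = map[string]bool{
	"get": true, "put": true, "post": true, "delete": true,
	"options": true, "head": true, "patch": true,
}

func isFieldValidation(p parameter) bool {
	return p.Name == "fieldValidation" && p.In == "query"
}

// supportsFieldValidation reports whether the document declares a
// fieldValidation query parameter anywhere (hypothetical helper).
func supportsFieldValidation(doc []byte) (bool, error) {
	var spec openAPIDoc
	if err := json.Unmarshal(doc, &spec); err != nil {
		return false, err
	}
	for _, p := range spec.Parameters {
		if isFieldValidation(p) {
			return true, nil
		}
	}
	for _, item := range spec.Paths {
		for key, raw := range item {
			var params []parameter
			switch {
			case key == "parameters":
				if err := json.Unmarshal(raw, &params); err != nil {
					return false, err
				}
			case httpMethods[key]:
				var op operation
				if err := json.Unmarshal(raw, &op); err != nil {
					return false, err
				}
				params = op.Parameters
			}
			for _, p := range params {
				if isFieldValidation(p) {
					return true, nil
				}
			}
		}
	}
	return false, nil
}

func main() {
	doc := []byte(`{"paths":{"/api/v1/namespaces/{namespace}/configmaps":{
		"post":{"parameters":[{"name":"fieldValidation","in":"query"}]}}}}`)
	ok, err := supportsFieldValidation(doc)
	fmt.Println(ok, err) // true <nil>
}
```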

Test Plan

[X] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.

Prerequisite testing updates

N/A

Unit tests

[alpha and beta] Logic changed in the apiextensions schema package (i.e. the objectmeta and pruning algorithms) will be thoroughly unit tested, as will changes made to the apimachinery runtime converter and JSON decoding.

Additional testing has also been added to the apiserver/endpoints/handlers/rest_test.go to detect unknown and duplicate fields.

Integration tests

[alpha and beta] Primarily, server side validation will be integration tested and benchmarked via a complete test suite at test/integration/apiserver/field_validation_test.go

It tests the cross product of all valid permutations along the dimensions of:

  • request type (Create vs Update vs Patch)
  • data format (json, yaml, json patch, json merge patch, SMP patch, apply (create), apply (update))
  • schema type (typed/builtin, CRD/untyped with schema, CRD/untyped without schema)
e2e tests

[beta] With field validation on by default in beta, we will modify test/e2e/kubectl/kubectl.go to ensure that kubectl defaults to using server side field validation and detects unknown/duplicate fields as expected.

[GA] We will introduce field validation specific e2e/conformance tests to submit requests directly against the API server for both built-in and custom resources to test that duplicate and unknown fields are appropriately detected.

Graduation Criteria

Alpha

  • Feature implemented behind a feature gate
  • Integration tests added for all relevant verbs (POST, PUT, and PATCH)

Beta

  • kubectl --validate flag offers the ability to perform server-side validation
  • endpoints handler unit testing of field validation
  • customresource handler unit testing of field validation
  • field validation integration tests check for exact match of strict errors
  • In tree NestedObjectDecoders no longer short circuit on strict decoding errors #107545
  • Unknown/Duplicate fields are properly detected in the metadata at both the root level and within embedded objects #109215, #109316, and #109494

GA

  • Wait two releases (1.25 and 1.26) with field validation enabled by default and receive minimal bug reports pertaining to the feature. As a data point, GKE clusters running field validation by default in 1.25 are using less CPU and memory overall than in 1.24, supporting the notion that this feature has not significantly impacted performance.
  • [] Add conformance testing to invoke field validation directly against the API server

Version Skew Strategy

Following the graduation of server-side field validation to GA, there will be a period of time when a newer client (which expects server-side field validation locked in GA) will send requests to an older server that could have server-side field validation disabled.

Until version skew makes it incompatible for a client to hit a server with field validation disabled, kubectl must continue to check if the feature is enabled server-side. As long as the feature can be disabled server-side, kubectl must continue to have client-side validation. When the feature can't be disabled server-side, kubectl should remove client-side validation entirely.

Given that kubectl supports one minor version of skew (older or newer) against kube-apiserver, removing client-side validation in 1.28 would be compatible with a one-version skew from GA in 1.27 (when server-side validation can no longer be disabled).

See the section on Out-of-Tree Alternatives to Client Side Validation for alternatives to client-side validation.

Production Readiness Review Questionnaire

Feature Enablement and Rollback

How can this feature be enabled / disabled in a live cluster?
  • Feature gate (also fill in values in kep.yaml)
    • Feature gate name: ServerSideFieldValidation
    • Components depending on the feature gate: kube-apiserver
    • Note: feature gate locked to true upon graduation to GA (1.27) and removed two releases later (1.29)
  • Other
    • Describe the mechanism: query parameter
    • Will enabling / disabling the feature require downtime of the control plane? NO
    • Will enabling / disabling the feature require downtime or reprovisioning of a node? NO
Does enabling the feature change any default behavior?

No, strict validation is false by default.

Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes. From the cluster operator's side, they can restart the kube-apiserver without the ServerSideFieldValidation feature gate set and this will disable the feature cluster-wide.

For end-users that no longer wish to perform server-side strict validation, simply not setting the query param will ensure the server is not doing strict validation.

What happens if we reenable the feature if it was previously rolled back?

No harm; requests will resume performing strict validation. Stored data is the same whether or not the feature is used; only new requests to the API server trigger strict validation.

Are there any tests for feature enablement/disablement?

We test both the presence and absence of the query parameter when the feature is enabled. We also test that modified code paths that could be triggered when the feature is enabled are inert when the feature is disabled.

For example, unknown field path tracking in both the unstructured converter and the structural pruning algorithm affects performance when server-side field validation is performed. Tests ensure it is never invoked when server-side field validation is disabled.

Rollout, Upgrade and Rollback Planning

How can a rollout or rollback fail? Can it impact already running workloads?

API servers rolled out with field validation enabled could start erroneously failing requests if there are problems with how field validation is implemented.

Also, if extraneous validation is being invoked, API server latency or memory usage could degrade.

What specific metrics should inform a rollback?

A drastic deterioration in API server performance could indicate an issue with the feature and a need to roll back (e.g. apiserver_request_duration_seconds).

Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

No

Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

No deprecations or removals; however, the kubectl flag --validate=true is being changed to perform server-side validation instead of client-side validation (for servers that support it).

Monitoring Requirements

How can an operator determine if the feature is in use by workloads?

One could create a metric to track the number of validation errors on the server. We will not be implementing this for beta as it seems like overkill.

How can someone using this feature know that it is working for their instance?

The easiest way to confirm that the feature is enabled and working is to send a request to the API server with ?fieldValidation=Strict and an object containing deliberately invalid, unknown fields. An error from the server indicates that the feature is enabled as expected.

Alternatively, an end-user can cross-reference the results of client-side validation (i.e. kubectl --validate=true) with the results of server-side validation. If client-side validation indicates an error but server-side validation does not, then the feature is disabled.

What are the reasonable SLOs (Service Level Objectives) for the enhancement?

We have benchmark tests for field validation here and an analysis of those benchmarks summarized here.

Based on the above, users should expect negligible latency increases (with the exception of SMP requests, which are ~5% slower). Memory overhead should be negligible for JSON, with no change for protobuf, and around 10-20% higher for YAML data.

Cluster operators can also use cpu usage monitoring to determine whether increased usage of server-side schema validation dramatically increases CPU usage after feature enablement.

What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

  • Metrics
    • Metric name: apiserver_request_duration_seconds
    • Components exposing the metric: kube-apiserver
Are there any missing metrics that would be useful to have to improve observability of this feature?

As mentioned above, one could implement a metric to track the number of field validation errors reported by the apiserver. We consider this unnecessary because it would reveal more about how clients misuse the API (by sending invalid fields) than about the feature itself.

Dependencies

Does this feature depend on any specific services running in the cluster?

No

Scalability

Will enabling / using this feature result in any new API calls?

No

Will enabling / using this feature result in introducing new API types?

No

Will enabling / using this feature result in any new calls to the cloud provider?

No

Will enabling / using this feature result in increasing size or count of the existing API objects?

No

Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

Mutating API calls that opt-in to validation will be slower. Benchmarks of alpha changes indicate that performing validation results in ~2-5% slower request latency and ~4-8% more memory usage for json and yaml requests.

Given that the majority of high-frequency requests made by system components use protobuf, we expect a negligible increase in overall resource usage.

Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

Non-negligible increase to RAM of kube-apiserver when server-side validation is heavily utilized.

Troubleshooting

How does this feature react if the API server and/or etcd is unavailable?

Same as without the feature

What are other known failure modes?

N/A

What steps should be taken if SLOs are not being met to determine the problem?

Operators can disable server-side validation by setting the feature-gate to false if the performance impact of it is not tolerable.

Implementation History

  • 2021-11-19 Alpha Implementation PR of Server Side Field Validation merged.
  • 2021-12-07 Kubernetes 1.23 released with alpha implementation present.
  • 2021-12-08 Better documentation for fieldValidation parameter merged.

Alternatives

HTTP header mechanism

Instead of opting in to server-side validation via a query parameter, we considered passing a specific content-type header that would indicate strict schema validation.

This could be done by passing a mime-type parameter indicating strict validation such as application/json;validation=strict or we could use an entirely new header such as X-Kubernetes-Validation:Strict.

One could argue that this is the more appropriate way to parameterize opting-in to server side schema validation because on the server we are fundamentally treating the content as a different type (“application/json” is json data that we don’t care if it has extra fields, “application/json;validation=strict” is json data that is sensitive to extra fields).

A precedent for using a mime-type parameter is from how we receive resources as tables.

On the other hand, one could also argue that interpreting input strictly or not according to the target schema is independent of the content type and that this is an inappropriate way to parameterize validation opt-in.

Another argument against this method is that for patch, strictness is handled when decoding the result of the patch, so it wouldn’t make sense to use Content-Type which is providing information about the inbound request body.

Other Alternatives Considered

  • Passing multiple decoders around in the request scope
  • Change the Decode signature itself (or adding a new DecodeStrict that Decode calls into)
  • Instead of erroring on unknown fields, we could warn on unknown fields and then more generally have a TreatWarningsAsErrors option to fail on requests with unknown fields. One drawback here is that it would be difficult to flag unknown fields separately from other warnings that we might not want to error on (e.g. using a deprecated API). See note in #proposal section around adding an option for warning on unknown fields.