- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as implementable
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and
SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
- (R) Production readiness review completed
- (R) Production readiness review approved
- "Implementation History" section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
This KEP proposes a new endpoint to publish the OpenAPI v3 specification for Kubernetes types. This solves our problem of stripping fields when publishing the OpenAPI v2 and improves the transparency and accuracy of the published OpenAPI.
Kubernetes resources and types can be described through their OpenAPI definitions, currently served as OpenAPI v2 at the $cluster/openapi/v2 cluster endpoint. With the introduction of CRDs, users provide an OpenAPI v3 spec to describe the resource. Since Kubernetes only publishes OpenAPI v2, these definitions are converted into an OpenAPI v2 definition before being aggregated and served at the /openapi/v2 endpoint. OpenAPI v3 is more expressive than v2, and the conversion results in the loss of some information. Some of the type definitions are simply transformed into “accept anything” definitions because we lack the ability to perform a better conversion and because of limitations in kubectl. Some efforts at improving the OpenAPI (e.g., enum support) are targeted to be bundled with OpenAPI v3.
Additionally, since Kubernetes aggregates the OpenAPI v2 specs into a single document served at one endpoint, whenever any spec is modified the entire document is regenerated and all clients must redownload it. Depending on the size and number of CRDs, this can be an expensive operation.
- Support publishing and aggregating OpenAPI v3 for all kubernetes types
- Published v3 spec should be a lossless representation of the data and no fields should be stripped for both built-in types and CRDs
- Instead of serving the entire OpenAPI spec at one endpoint, separate the spec by group-version and only serve the resources required by each group-version
There are a couple of important fields in the CRD structural schema that are stripped when we publish v2: oneOf, anyOf, nullable, and default. We would like to keep all fields and publish v3 specs without any information loss for both built-in types and CRDs. Refer to discussion for more information around exposing defaults.
- Accommodating new OpenAPI v3 fields outside the Schema object that do not affect the validity of the spec. These are nice-to-have fields that can be added later.
- Consumers of the OpenAPI (e.g., Server Side Apply, client-go, kubectl) will eventually need to be updated to consume v3. This is outside the scope of this KEP.
The proposal is to publish OpenAPI v3 schemas at the /openapi/v3/apis/{group}/{version} endpoint for all resources. Instead of aggregating everything into one schema, we will publish separate schemas per resource group-version. The aggregated schema will not be published at the /openapi/v3 endpoint; that endpoint will instead be used for discovery purposes. For users who still need to retrieve the entire OpenAPI spec (not recommended), we will offer a client-side utility to aggregate all the endpoints.
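As an illustration of what that client-side utility could look like, here is a minimal sketch in Go. The `groupSpec` type and the strategy of merging component schemas by definition name are assumptions for illustration, not a committed design.

```go
package main

import "fmt"

// groupSpec is a simplified stand-in for a parsed per-group OpenAPI v3
// document: only its component schemas, keyed by definition name.
type groupSpec struct {
	Schemas map[string]interface{}
}

// aggregate merges the component schemas of several self-contained
// group-version specs into a single map, roughly what a client-side
// aggregation utility would do. It fails loudly on a name collision.
func aggregate(specs []groupSpec) (map[string]interface{}, error) {
	merged := map[string]interface{}{}
	for _, s := range specs {
		for name, schema := range s.Schemas {
			if _, exists := merged[name]; exists {
				return nil, fmt.Errorf("duplicate definition %q", name)
			}
			merged[name] = schema
		}
	}
	return merged, nil
}

func main() {
	apps := groupSpec{Schemas: map[string]interface{}{"io.k8s.api.apps.v1.Deployment": map[string]interface{}{"type": "object"}}}
	core := groupSpec{Schemas: map[string]interface{}{"io.k8s.api.core.v1.Pod": map[string]interface{}{"type": "object"}}}
	merged, err := aggregate([]groupSpec{apps, core})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(merged)) // 2
}
```

Because each leaf spec is self-contained, a merge by definition name suffices; collisions should not occur since built-in definition names include their package paths.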
After OpenAPI v3 is implemented, we can start working on updating clients to consume OpenAPI v3. A separate KEP will be published for that.
As OpenAPI V3 becomes more widely available and easily accessible, clients should move from OpenAPI V2 to OpenAPI V3. This should result in a steady decline in the number of requests sent to the OpenAPI V2 endpoint. In the short term, we will make OpenAPI V2 compute more lazily, deferring the aggregation and marshaling process to be performed on demand rather than at a set interval. This is not directly part of OpenAPI V3's GA plan, but is something we will monitor closely once OpenAPI V3 is GA.
Aggregation for OpenAPI v2 already consumes a non-negligible amount of resources; naively adding an aggregation for v3 would double the workload whenever the OpenAPI needs to be re-aggregated. Splitting by group-version partially mitigates this problem: when a resource changes, only a small subset of groups needs to be re-aggregated, compared to the current behavior of re-aggregating everything.
The root /openapi/v3 endpoint will contain the list of paths (groups) available and serve as a discovery endpoint. Clients can then choose the group(s) to fetch and send additional requests.
/openapi/v3
{
  "Paths": {
    "api": "/openapi/v3/api?etag=tag",
    "api/v1": "/openapi/v3/api/v1?etag=tag",
    "apis": "/openapi/v3/apis?etag=tag",
    "apis/admissionregistration.k8s.io/v1": "/openapi/v3/apis/admissionregistration.k8s.io/v1?etag=tag",
    "apis/apiextensions.k8s.io/v1": "/openapi/v3/apis/apiextensions.k8s.io/v1?etag=tag",
    "apis/apps/v1": "/openapi/v3/apis/apps/v1?etag=tag",
    ...
  }
}
Based on the provided group, clients can then request /openapi/v3/apis/apps/v1, /openapi/v3/apis/networking.k8s.io/v1, etc. These leaf node specs are self-contained OpenAPI v3 specs and include all referenced types.
The discovery document has the format of a map with the key being the group-version and the value representing the URL of the OpenAPI spec for the particular group-version. Note that the URL can be constructed by the client by prepending the group-version name with the /openapi/v3 prefix. The URL listed here carries a special etag query parameter denoting the latest etag of the OpenAPI spec for the particular group-version. The concept of changing a query parameter when a new version of the spec is available is a pattern used frequently in browser caching, known as cache busting. All OpenAPI spec requests with the ?etag query parameter will return a response with Cache-Control: immutable. This allows clients to cache the OpenAPI spec as long as the etag is unchanged. The max-age will also be set to a large value, equivalent to publishing a spec that never expires.
Support for caching is built into the httpcache library used in client-go, and no change is needed on the client side to support this mechanism other than passing the additional query parameter. Passing the etag as the query parameter allows clients to check the etag in the root discovery document. Clients can avoid sending an additional request to fetch an ETag from specific group-versions, and the root document itself can provide information on group-version changes and updates. If there is a race and the client passes in an outdated etag value, the server will send a 301 to redirect the client to the URL with the latest etag value.
To ensure that the client cache does not grow in an unbounded manner, the client-go disk cache will be replaced with a compactable disk cache which performs occasional cleanup of old cached files. Refer to PR for more details.
The OpenAPI builder for CRDs converts the v3 validation schema into a v2 schema. We will keep the conversion to the v2 schema for backwards compatibility, but also generate the v3 schema to publish at the new v3 endpoint.
Kubernetes publishes the openapi schema in both JSON and Protobuf. v3 schemas will include the same mechanisms as v2, publishing a protobuf model of the schema, as well as the corresponding etags for caching. In order to publish the schema based on groups, the CRDs will be grouped by group and published only at the endpoint for their specific group.
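The grouping step can be sketched as follows; the function and its inputs are hypothetical simplifications (real CRDs carry full spec objects, not just a group name).

```go
package main

import (
	"fmt"
	"sort"
)

// groupByAPIGroup buckets CRD names by their API group, so that each
// /openapi/v3/apis/{group}/{version} endpoint carries only its own CRDs.
// Each bucket is sorted for deterministic output.
func groupByAPIGroup(crdGroups map[string]string) map[string][]string {
	out := map[string][]string{}
	for crd, group := range crdGroups {
		out[group] = append(out[group], crd)
	}
	for _, names := range out {
		sort.Strings(names)
	}
	return out
}

func main() {
	byGroup := groupByAPIGroup(map[string]string{
		"crontabs.stable.example.com":  "stable.example.com",
		"schedules.stable.example.com": "stable.example.com",
		"widgets.apps.example.com":     "apps.example.com",
	})
	fmt.Println(byGroup["stable.example.com"]) // [crontabs.stable.example.com schedules.stable.example.com]
}
```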
The aggregator has a mapping of all the APIServices and refreshes the aggregated spec on an interval. APIServices already publish by group-version, so their behavior is unchanged. Because OpenAPI V3 is published by group-version, the fully aggregated spec is not needed and aggregation can be skipped. No spec downloading will be done by the aggregator. The aggregator in OpenAPI V3 will act as a proxy rather than an aggregator, proxying group-version requests to downstream API servers.
Swagger 2.0 and OpenAPI 3.0 have a couple of incompatible changes that are not currently supported by kube-openapi. Definitions are moved to components and the schema for Paths is slightly modified. A new struct and builder will be created in kube-openapi to represent the v3 schema to be imported by the necessary consumers.
CRD structural schemas will be published as is without any value stripping for v3. Built-in types will generate the v3 OpenAPI definition, and either keep all fields or strip incompatible fields when building the swagger spec for v3 and v2 respectively.
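The two build paths can be sketched roughly as below. `stripForV2` is a hypothetical helper; the stripped field list comes from the motivation section, and for brevity the sketch only handles the top level of a schema (a real implementation would recurse into nested schemas).

```go
package main

import "fmt"

// stripForV2 drops the structural-schema fields that OpenAPI v2 cannot
// express (oneOf, anyOf, nullable, default); the v3 build path would keep
// the schema untouched. Only top-level fields are handled in this sketch.
func stripForV2(schema map[string]interface{}) map[string]interface{} {
	out := map[string]interface{}{}
	for k, v := range schema {
		switch k {
		case "oneOf", "anyOf", "nullable", "default":
			// stripped for v2 compatibility
		default:
			out[k] = v
		}
	}
	return out
}

func main() {
	v3 := map[string]interface{}{"type": "string", "nullable": true, "default": "x"}
	fmt.Println(stripForV2(v3)) // map[type:string]
}
```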
The OpenAPI handler will have an additional map to handle the publishing of v3.
// Group -> OpenAPIv3 mapping
var v3Schemas map[string]*OpenAPIv3

type OpenAPIv3 struct {
	rwMutex        sync.RWMutex
	lastModified   time.Time
	spec3Bytes     []byte
	spec3Pb        []byte
	spec3PbGz      []byte
	spec3BytesETag string
	spec3PbETag    string
	spec3PbGzETag  string
}
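A rough sketch of how the handler might read and update this map follows. The helper functions are hypothetical, and for simplicity the map itself is assumed to be populated before concurrent readers start; only the per-entry fields are guarded by the RWMutex, as in the struct above.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// OpenAPIv3 holds the cached serializations of one group's spec
// (JSON only here, for brevity).
type OpenAPIv3 struct {
	rwMutex        sync.RWMutex
	lastModified   time.Time
	spec3Bytes     []byte
	spec3BytesETag string
}

var v3Schemas = map[string]*OpenAPIv3{}

// updateGroupSpec replaces a group's cached spec under the write lock.
func updateGroupSpec(group string, spec []byte, etag string) {
	s, ok := v3Schemas[group]
	if !ok {
		s = &OpenAPIv3{}
		v3Schemas[group] = s
	}
	s.rwMutex.Lock()
	defer s.rwMutex.Unlock()
	s.spec3Bytes = spec
	s.spec3BytesETag = etag
	s.lastModified = time.Now()
}

// groupSpec returns a group's cached spec and etag under the read lock,
// which is what the HTTP handler would serve with Cache-Control headers.
func groupSpec(group string) ([]byte, string, bool) {
	s, ok := v3Schemas[group]
	if !ok {
		return nil, "", false
	}
	s.rwMutex.RLock()
	defer s.rwMutex.RUnlock()
	return s.spec3Bytes, s.spec3BytesETag, true
}

func main() {
	updateGroupSpec("apis/apps/v1", []byte(`{"openapi":"3.0.0"}`), "abc")
	spec, etag, ok := groupSpec("apis/apps/v1")
	fmt.Println(ok, etag, len(spec) > 0) // true abc true
}
```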
There is a potential version skew between old aggregated apiservers and a new kube-apiserver. All new aggregated apiservers will publish v3 which will be discovered and published by the aggregator for v3. Old aggregated apiservers will not publish v3, and the aggregator cannot discover the v3 schema for the corresponding aggregated apiservers. To make the v2 to v3 transition process smoother when a version skew exists, the aggregator will download the v2 schema from aggregated apiservers if they have not upgraded to v3. This will provide clients with an option to only use the v3 endpoint and they can immediately drop support for v2. The drawback is that v2 is lossy and converting it to v3 will provide a lossy v3 schema. This problem will be fixed when aggregated apiservers upgrade to publishing v3.
Kubernetes relies on a "bug" (relaxed constraint) in the gnostic library for OpenAPI v2 where a $ref and description can coexist in the same object. See Issue for more details. This is disallowed per JSON Schema Draft 4, which is the schema version OpenAPI v2 follows. The $ref and description coexistence is important to Kubernetes because kubectl explain uses it for providing documentation for reference fields.
For instance, a PodSpec object could have the properties:
"affinity": {
    "$ref": "#/definitions/io.k8s.api.core.v1.Affinity",
    "description": "If specified, the pod's scheduling constraints"
},
kubectl explain uses the description to provide documentation for struct fields that are represented as references in the OpenAPI schema. The description here describes a field's role in the PodSpec object rather than the referenced struct itself. The gnostic library for OpenAPI 3.0 currently disallows the relaxed constraint and removes the description field when the OpenAPI spec is serialized to protobuf. We will work with the gnostic team to update the library to support the same constraint relaxation as in OpenAPI 2.0 to fix the OpenAPI 3.0 protobuf.
Another workaround to this problem could be to wrap reference structures with allOf. E.g.:
"affinity": {
    "allOf": [{
        "$ref": "#/definitions/io.k8s.api.core.v1.Affinity"
    }],
    "description": "If specified, the pod's scheduling constraints"
},
This solves the immediate protobuf issue but adds complexity to the OpenAPI schema.
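The wrapping itself is mechanical; a sketch of the transformation (the helper name is hypothetical):

```go
package main

import "fmt"

// wrapRef moves a $ref into an allOf list so that sibling fields such as
// description remain legal under JSON Schema Draft 4 rules. Schemas
// without a $ref are returned unchanged.
func wrapRef(schema map[string]interface{}) map[string]interface{} {
	ref, ok := schema["$ref"]
	if !ok {
		return schema
	}
	out := make(map[string]interface{}, len(schema))
	for k, v := range schema {
		if k != "$ref" {
			out[k] = v
		}
	}
	out["allOf"] = []interface{}{map[string]interface{}{"$ref": ref}}
	return out
}

func main() {
	affinity := map[string]interface{}{
		"$ref":        "#/definitions/io.k8s.api.core.v1.Affinity",
		"description": "If specified, the pod's scheduling constraints",
	}
	fmt.Println(wrapRef(affinity)["description"]) // If specified, the pod's scheduling constraints
}
```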
The final alternative is to upgrade to OpenAPI 3.1, where the new JSON Schema version it is based on supports fields alongside a $ref. However, OpenAPI does not follow semver; 3.1 is a major upgrade over 3.0 and introduces various backwards-incompatible changes. Furthermore, library support is currently lacking (gnostic does not fully support OpenAPI 3.1). One important backwards-incompatible change is the removal of the nullable field, replaced by changing the type field from a single string to an array of strings.
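For illustration, the nullable change amounts to a transformation like the following sketch (a hypothetical converter handling only the simple single-type case):

```go
package main

import "fmt"

// toTypeArray shows the OpenAPI 3.1 change: a 3.0 schema using
// "nullable": true with "type": "string" becomes "type": ["string", "null"]
// in 3.1. Only a single string type is handled in this sketch.
func toTypeArray(schema map[string]interface{}) map[string]interface{} {
	nullable, _ := schema["nullable"].(bool)
	typ, hasType := schema["type"].(string)
	out := map[string]interface{}{}
	for k, v := range schema {
		if k != "nullable" {
			out[k] = v
		}
	}
	if nullable && hasType {
		out["type"] = []interface{}{typ, "null"}
	}
	return out
}

func main() {
	v30 := map[string]interface{}{"type": "string", "nullable": true}
	fmt.Println(toTypeArray(v30)) // map[type:[string null]]
}
```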
Updating all OpenAPI V2 clients to use OpenAPI V3 is outside the scope of this KEP and part of future work. However, some key clients will be updated to use OpenAPI V3 as part of GA. One example is OpenAPI V3 for kubectl explain. Server Side Apply will also be updated to directly use the OpenAPI V3 schema rather than OpenAPI V2. Finally, the query param verifier in kubectl will be updated to use OpenAPI V3.
[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
This feature is primarily implemented in kube-openapi, where unit tests, validation, benchmarks, and fuzzing are added. Some packages in k/k will be modified to capture the changes in kube-openapi, and unit tests will accompany them.
k8s.io/apiextensions-apiserver/pkg/controller/openapi/builder
k8s.io/kube-aggregator/pkg/controllers/openapiv3/aggregator
Tests are in the following directory: test/integration/apiserver/openapi/...: https://storage.googleapis.com/k8s-triage/index.html?test=TestOpenAPIV3
Tests will be added to ensure that OpenAPI is present for all group versions.
- Feature implemented behind a feature flag
- Initial e2e tests completed and enabled
- Native types are updated to capture capabilities introduced with v3
- Incorrect OpenAPI polymorphic types (IntOrString, Quantity) are updated to use anyOf in OpenAPI V3
- Definition names of native resources are updated to omit their package paths
- Parameters are reused as components
- Aggregated API servers are queried for their v2 endpoint and converted to publish v3 if they do not directly publish v3
- Heuristics are used for the OpenAPI v2 to v3 conversion to maximize correctness of published spec
- Aggregation for OpenAPI v3 will serve as a proxy to downstream OpenAPI paths
- OpenAPI V3 uses the optimized JSON marshaler for increased performance. See link for the benefits with OpenAPI V2.
- Updated OpenAPI V3 Client Interface
- Direct conversion from OpenAPI V3 to SMD schema for Server Side Apply
- HTTP Disk Cache is replaced with Compactable Disk Cache
- Conformance tests
- Enabling/disabling the feature gate
- For upgrading, aggregated apiserver images must also be updated to serve OpenAPI v3 in order for the aggregator to pick up the spec and publish OpenAPI v3. OpenAPI v2 will continue to work regardless of version skew.
- For downgrading, aggregated apiservers do not need to be downgraded as everything will revert to using OpenAPI v2 and the OpenAPI v3 endpoint will be untouched.
There is a possible skew between the kube-apiserver and old aggregated apiservers. The affected aggregated apiservers will not publish v3, but the kube-apiserver will attempt to query the v2 endpoint and convert it to v3. This is a lossy conversion, but the v3 spec will still be a complete representation of all the cluster resources.
- Feature gate (also fill in values in kep.yaml)
  - Feature gate name: OpenAPIV3
  - Components depending on the feature gate: kube-apiserver
- Other
  - Describe the mechanism:
- Will enabling / disabling the feature require downtime of the control plane?
- Will enabling / disabling the feature require downtime or reprovisioning of a node?
No. A new /openapi/v3 endpoint is added but no existing behavior is changed. All API resources will have both their OpenAPI v2 and v3 published.
Yes, the feature may be disabled by reverting the feature flag.
The feature does not depend on state, and can be disabled/enabled at will.
n/a.
Version skew is discussed in a section above during a rolling control plane upgrade.
Non-200 responses from the /openapi/v3 endpoint could indicate a problem. A long response time from the apiserver for OpenAPI requests could also be an indicator of a problem.
n/a
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
No.
The OpenAPI path /openapi/v3 is populated. On the metrics side, an OpenAPI V3-specific metric, crd_openapi_v3_aggregation_duration_seconds, should emit data if the feature is enabled.
The /openapi/v3 endpoint will be populated with the list of groups if OpenAPI V3 is enabled.
This feature should not affect the SLO of any components.
OpenAPI v3 aggregation comes on top of v2 aggregation, so there is additional load. But v3 aggregation is cheap:
- OpenAPI v3 aggregation is much cheaper as every group-version is aggregated independently.
- APIServices (aggregated apiservers) coming and going (due to availability changes) do not lead to aggregation because kube-apiserver only acts as a proxy
- CRD aggregation is structurally trivial (no unification of definition names, which is quadratic in schema sizes) and hence linear in the number of CRDs per group-version.
- CRDs do not "come and go" as aggregated APIs do; aggregation is a one-time operation per CRD schema change and per API server startup.
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
A new metric will be added in CRD controller to measure the time taken to aggregate CRD OpenAPI specs.
- Metrics
  - Metric name: crd_openapi_v3_aggregation_duration_seconds
  - Components exposing the metric: kube-apiserver
Are there any missing metrics that would be useful to have to improve observability of this feature?
Not at the moment.
OpenAPI V3 aggregates from apiservers provided by APIService.
- APIService
  - OpenAPI V3 fetches the openapi/v3 endpoint and specs from aggregated API servers.
  - Impact of its outage on the feature: If an APIService is unavailable, the OpenAPI spec of the corresponding APIService will be unavailable, but other OpenAPI specs will be unaffected. Resource usage will be better than with OpenAPI V2 because no aggregation is needed when APIServices become unavailable or available again.
  - Impact of its degraded performance or high-error rates on the feature: Same as above.
Yes. GET on the /openapi/v3 endpoint, as well as /openapi/v3/{group}/{version} for each API group provided by Kubernetes.
No.
No.
No.
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
No
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
No. One thing to note is that aggregation for OpenAPI v2 consumes quite a bit of memory. OpenAPI v3 will avoid aggregating the entire spec and only aggregate (if necessary) per group/version, decreasing the runtime and memory usage to a negligible amount.
Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
This feature strictly operates on the control plane and has no effect on node resources.
The feature is part of the API server and will not function if it is unavailable. It does not depend on the availability of etcd.
- Failure in endpoint
  - Detection: How can it be detected via metrics? Stated another way: how can an operator troubleshoot without logging into a master or worker node?
    Lack of 200 status from the OpenAPI endpoint; high latency in apiserver responses.
  - Mitigations: What can be done to stop the bleeding, especially for already running user workloads?
    OpenAPI V3 can be rolled back and disabled via the feature flag.
  - Diagnostics: What are the useful log messages and their required logging levels that could help debug the issue? Not required until feature graduated to beta.
    Warning and error log messages containing openapi/swagger/spec keywords.
  - Testing: Are there any tests for failure mode? If not, describe why.
    Tests will be added for failure conditions.