Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pause-resume-frozen functionality #168

Merged
merged 23 commits into from
Dec 29, 2023
Merged

Pause-resume-frozen functionality #168

merged 23 commits into from
Dec 29, 2023

Conversation

Jorres
Copy link
Contributor

@Jorres Jorres commented Dec 10, 2023

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Pull request type

Please check the type of change your PR introduces:

  • Bugfix
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation content changes
  • Other (please describe):

What is the current behavior?

  • no pause\resume\frozen functionality

Issue Number: YDBOPS-8460

What is the new behavior?

  • Storage and Database CRDs are getting a new .spec.Pause field with possible values Paused | Running | Frozen.

  • When Database's .spec.pause is set, StatefulSet is deleted, but Database object is still persisted.

  • When Database's .spec.pause is unset (from Paused to Running), StatefulSet is recreated, tenant is not re-initialized because the state of the tenant is persistent on the drives

  • Storage behaves in the same way, except it is NOT possible to pause a Storage which has active (non-paused) databases (as I see it, this does not make much sense)

  • Storage and Database resources can only be deleted when the Status is Paused

  • When Storage and Database have .spec.Pause equal to Frozen, the reconcile loop of the operator for the particular object is effectively off. Operator will no longer create\modify\delete resources. Notice that does not apply to manual modifications to Pods, which are under control of StatefulSet controller.

  • Added e2e-tests to cover the above functionality

Other information

  • Webhook part has been tested manually only, I am still figuring out how to enable webhooks on e2e kind clusters and not die in the process
  • A new internal/controllers/constants.go package has been introduced because access to condition names and Storage\Database state names is necessary in many different places now (in webhooks, tests, in monitoring controllers, in the code from resource folder, and in storage`database` controllers themselves, phew).

@Jorres Jorres requested a review from artgromov December 10, 2023 00:48
@Jorres
Copy link
Contributor Author

Jorres commented Dec 10, 2023

@artgromov Among the checking the core content of the PR, please pay attention to how I implemented the change in Storage webhook. To check that the Storage has no active Databases, a simplest solution would be to just query k8s api server from a webhook.

But I feel it is an EXTREMELY BAD idea to query the k8s api server from a webhook (can explain in detail if necessary), that's why I keep the info about databases in the Storage.Status.ConnectedDatabases and update it from the Database controller.

I don't see any downsides in this approach (we spend additionally a bit of cpu cost and a roundtrip to api server on every database reconcile), but the webhook has much smaller chance of failing, and has one less roundtrip to api server.

@Jorres Jorres changed the title Pause-resume functionality Pause-resume-frozen functionality Dec 10, 2023
@Jorres Jorres requested a review from kobzonega December 11, 2023 09:08
Copy link
Contributor

@kobzonega kobzonega left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, this PR introduces unnecessary complexity to the controller's logic. Here are two point that can be improved:

  1. Changing replicas to 0, instead of deleting a StatefulSet object
    It will achieve the same result and avoiding destructive operations that could lead to data loss. For example persistentVolumeReclaimPolicy setting may be set by client to Delete, which is default when using dynamic provisioning.

  2. Checking dependent databases when deleting Storage using a List operation
    Using the .status field in the Storage object to store a list of Databases may overload the object with unnecessary information and affect cluster performance. Instead of this you can perform a List operation in webhook with using FieldSelector for the .spec.storageRef field using indexing as described here

// `Running` means the default state of the system, all Pods running.
// +kubebuilder:default:=Running
// +optional
Pause string `json:"pause,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think better to use two different parameters for frozen and pause features, because it's orthogonal to each other. Also I think usage of string variable is not simple and clear
Therefore I suppose to Pause parameter should implement logic to force unscale/delete StatefulSet and be bool variableand for example operatorSync parameter should implement logic to disable operator Reconcile

Copy link
Contributor Author

@Jorres Jorres Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you underestimate the complexity that comes from introducing 2 orthogonal fields, and they are NOT completely orthogonal. If we do as you suggest, we will have 4 states: (non-frozen, non-paused), (frozen, non-paused), (non-frozen, paused) and (frozen, paused). I'm not sure if one of these states - (frozen, paused) - is useful. If we introduce these 2 arguments, we will have to process 6 different transitions (number of edges in the graph with 4 vertices), and this will bring a lot of complexity into the codebase. We may additionally discuss this later in voice

Copy link
Contributor

@kobzonega kobzonega Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frozen parameter does not correlate with state of Storage/Database and it's dependent objects. What do I mean by that? The main goal of this feature that we can manual edit StatefulSet/ConfigMap/Service and another dependent objects with sure that ydb-operator do not override our changes. This operation is needed for example during maintenance and migration actions. During this period, the YDB engineer assumes control of the Storage/Database (logical state of cluster without changes of .status fields) under his responsibility. We can not stop StatefulSet (or Service for example) k8s controller, but client can use some special k8s tricks like orphan=true parameter to prevent cascade deletion or kubectl replace to change spec with deletion of object. In that case we can only guarantee that 1) ydb-operator reconcile disabled and does not sync CR and dependent objects 2) no one parameter can be changed while Reconcile function is disabled and using webhook to verify that (we can edit CR resources only when state Frozen set to false)

In opposite Pause parameter correlate with state of Storage/Database and it's dependent objects. The main goal of this feature that we can do force stop processes without additional checks. This operation is needed ,for example, during ad-hoc situations and cluster monstrous problems. Also we should change object's state to Paused which indicate that it's desired state when no one process is running. Additional function of this field is prevent deletion of Storage/Database object (we can delete CR resources only when state Pause set to false + for Storage we should additional check existing of dependent Database objects). To prevent data loss and inconsistency state we should implement finalizer to Pods with mandatory CMS approve before killing process (please look at YDBOPS-6003).

State string `json:"state"`
Conditions []metav1.Condition `json:"conditions,omitempty"`
State constants.ClusterState `json:"state"`
ConnectedDatabases []ConnectedDatabase `json:"connectedDatabases,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YDB as multitenant system should manage hundreds or thousands databases. Should we store information about all of this in single Storage object? It may unreadable for engineer whom describe this object. Also it would be additional load to k8s apiserver and etcd

Copy link
Contributor Author

@Jorres Jorres Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I did not think of scaling there, you are right. I think I made a mistake when I implemented the push model - I mean, when the Database pushes the changes about itself to the Storage, every database downloads, reads and sends back the whole list of databases connected to its Storage.

This can be mitigated by moving to pull model, where the storage controller would list the databases in the cluster once per some timeout, say, 10s. Or I could investigate watch api and support some caching in the way that kopf does.

But I still think this knowledge is necessary for a Storage (to prevent pause or deletion when databases are active, please look at YDBOPS-8652), and it is not something that should be queried from a webhook. However, if I don't come up with a better solution, I might impement your List suggestion.

Copy link
Contributor

@kobzonega kobzonega Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not something that should be queried from a webhook

Can you explain why? I think webhook is the right place where this logic should be. Also we already using k8s client implementation inside monitoring admission webhook (ref code)

if r.Spec.Pause == PausedState {
for _, database := range storage.Status.ConnectedDatabases {
if database.State == DatabaseReady {
return fmt.Errorf("can not set Pause for Storage which has running Databases: %s", database.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main goal of this feature it's ability to force and fast delete all Storage pods. There is no need here to check databases and enable additional dependencies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need additional triage from @artgromov here. Should we really kick the Storage pods from under a working Database? Wouldn't it lead to a lot of undefined behaviours after the Storage comes back online? Maybe I just misunderstood the task, if we don't have to guarantee anything, sure, I can remove this check

storage *resources.StorageClusterBuilder,
) (bool, ctrl.Result, error) {
r.Log.Info("running step checkStorageFrozen for Storage")
if storage.Status.State != StorageFrozen && storage.Spec.Pause == FrozenState {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we should set a new type of state object when disabling Reconcile? This information relates to ydb-operator functions, not to the state of the YDB Storage. In my opinion, it is better only to use a special constant name to describe the paused condition field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue it IS important for the user to see that his Storage\Database is Frozen. I thought about this state as just about a convenience feature for the user, and it's essentially free. Otherwise, the user might forget to un-set the Frozen parameter, then be surprised because operator has not reacted to any changes for the past N hours. :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer using check in webhook to verify that we editing CR resources only when state Frozen set to false. That also prevent changes and should be clear to see. I don't like idea of saving previous state of object in .status field and trying to revert to this state when unfroze process. I believe that Reconcile and Sync function should be completely execute with minimum conditional operators

@Jorres
Copy link
Contributor Author

Jorres commented Dec 12, 2023

We have decided:

  • to implement two fields instead of one, but make frozen not a state, rather a flag
  • forbid to change anything while the frozen flag is set
  • heavy validation in the webhook: it should be impossible to set frozen while setting something else (setting something else might require multple reconciliations, but frozen will kick in from the next iteration)
  • only the following transitions would be possible:
    • pending <-> frozen
    • initializing <-> frozen
    • ...
    • ready <-> frozen

and natural flow pending -> initializing -> ... -> ready.

Additionally, as for deletion protection, I will implement the List operation based on the indexer - should not require passing much info in the claster in the backgroud and should keep webhook blazingly fast

@Jorres
Copy link
Contributor Author

Jorres commented Dec 26, 2023

Hey! The suggestions have been implemented, the tests have been adjusted.

The only thing NOT implemented is the deletion protection logic (if Storage has active Databases, the deletion of this Storage is not allowed). I had some skill issues implementing that using a watcher. Since this is not the core issue of this PR, let me bring that in a small follow-up PR a bit later.

One more thing - since we still CAN NOT run tests with webhooks enabled (I honestly don't remember the reason, will have to look into it deeper), I have tested the webhook functionality manually. Since there is a lot of new logic about Frozen in the webhook, please look into it a bit more carefully. Thanks!!

@Jorres
Copy link
Contributor Author

Jorres commented Dec 26, 2023

While at it, I made some small adjustments to logging here and here - I have removed some redundant logging. This error is logged when the SelfCheck is not connected to the Storage\Database yet (meaning that until Storage\Database are Ready, the operator spams logs like no tomorrow). I have simply reduced logging the same thing 3 times to logging it once, a bit reducing the spam factor.

@Jorres Jorres merged commit 1565b2c into master Dec 29, 2023
@Jorres Jorres deleted the pause-resume-YDBOPS-8460 branch December 29, 2023 12:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants