Skip to content

Conversation

@dayeol
Copy link
Contributor

@dayeol dayeol commented Oct 30, 2024

This is a first PR for a major refactoring of DCR monitor. See #15

Warning

This PR breaks attestation functionality, because TEEProvider does not implement attestation feature yet.

Changes

  • Change Monitor to be a "Reconciler", which runs a reconciliation loop every 10 seconds. This required me to change k8s CronJob to Deployment, and have reconciliation loop in the app (cronjob only supports > 1 minute)
  • Create TEEProvider type to support different TEE backends. Only partially implemented right now.
  • Clean up unused functions (ListAllInstances, DeleteInstance)
  • Added a unit test for reconcileJob function
  • Added the bazel test //... to the CI

Testing

Manually deployed & tested. Also added a unit test for updateJobStatus
didn't test Reconcile, it needs dependency injection for the DB.

@dayeol dayeol requested a review from xxxxavier October 31, 2024 04:38
@dayeol dayeol merged commit 5f7bfba into main Nov 4, 2024
1 check passed
@dayeol dayeol deleted the dayeol-refactor-reconciler branch November 4, 2024 17:51
dayeol added a commit that referenced this pull request Nov 6, 2024
2/N PR for a major refactoring of DCR Monitor. See #15.

1/N: #16

# Changes

* Remove "UpdateJobStatus" endpoint from the API. Everything must be
handled in the reconciler instead.
* Added `imagebuilder` interface, which currently has one image builder
implementation which is Kaniko
* Moved `RunJob` logic into the reconciler. Now TEE is launched by the
reconciler via `LaunchInstance` method in `TEEProvider`.
* Removed bunch of shallow interfaces from `Config`. Removed bunch of
unnecessary global configs, and made them local.
* Completely removed KMS logic. We don't need user keys and
stage1/stage2 keys in the open source version
* Completely removed `PrepareResourcesForUser`. Any logic needed for TEE
should go to `TEEProvider`.
* Removed `project_number` from config, because it is not needed
anymore.
* Removed logic for creating/updating workload identity pool provider.
It's not used any longer so let's revisit later when we implement data
confidentiality.
  
# Caveat

* This still doesn't support getting attestation report.

# Testing

Added unit tests, and also did manual end-to-end testing.
dayeol added a commit that referenced this pull request Nov 13, 2024
2/3 PR for a major refactoring of DCR Monitor. See
#15.

1/3: #16
2/3: #19

This pretty much concludes the major refactoring, resolving #15

# Changes

* Added `Created` job status, because the reconciler needs to handle
newly created jobs.
* Move KanikoService functionality into the monitor (=reconciler)
* Remove unused config `Cluster.PodServiceAccount`
* Clean up ARG/ENV of TEE dockerfile
* In-memory Dockerfile injection in API. See `addDockerfileToTarGz`.
This resolves #3. Also Kaniko can just refer to the remote object path.
* Removed most of shallow functions in `Config`
* Completely removed `CloudProvider`
* Removed shallow / no-longer used modules in `pkg`:
`pkg/utils/constant.go`, `pkg/utils/file.go`, and `pkg/utils/k8s.go`.

# Testing

Manual end-to-end testing.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants