Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop SYS_RESOURCE capability #5558

Merged
merged 5 commits into from May 26, 2021

Conversation

ormergi
Copy link
Contributor

@ormergi ormergi commented Apr 28, 2021

What this PR does / why we need it:
Currently virt-launcher runs with SYS_RESOURCE capability on VMs with SRIOV devices when SRIOVLiveMigation feature-gate enabled.
Having this capability enables Libvirt to adjust QEMU process memory lock limits (memlock rlimit) in case its already exists
As occurs when SRIOV VM is migrated and the SRIOV host-device is re-attached to VM on target.

With Libvirt and above, Libvirt does best effort to adjusts QEMU process limits and wont try to change the limits if they are high enough.
In order to prevent using SYS_RESOURCE cap on containerized environments [1] as Kubevirt use case.

With this PR changes Kubevirt will adjust QEMU process rlimits as it does for libvirtd.
This will prevent libvirt from adjusting QEMU process rlimits (it will detect that limits are sufficient).
Then virt-launcher resources will be adjusted on post migration, in order to enable hot plugging VFIO devices

This PR also remove SYS_RESOURCE capability.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:
Rebased on #5328

virt-handler is privileged while virt-launcher is not.
Before this change, it was already done for the libvirtd process, the only difference here is that we now need the qemu one.

Release note:

Drop virt-launcher SYS_RESOURCE capability

@kubevirt-bot kubevirt-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels Apr 28, 2021
@kubevirt-bot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ormergi
Copy link
Contributor Author

ormergi commented Apr 28, 2021

/test pull-kubevirt-e2e-kind-1.17-sriov

@kubevirt-bot kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 28, 2021
@ormergi ormergi marked this pull request as ready for review April 28, 2021 19:53
@ormergi ormergi marked this pull request as draft April 28, 2021 19:55
@ormergi
Copy link
Contributor Author

ormergi commented Apr 29, 2021

/test pull-kubevirt-e2e-kind-1.17-sriov

@ormergi
Copy link
Contributor Author

ormergi commented Apr 29, 2021

/test pull-kubevirt-e2e-kind-1.17-sriov

@kubevirt-bot kubevirt-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 29, 2021
@ormergi ormergi marked this pull request as ready for review April 29, 2021 08:46
@ormergi
Copy link
Contributor Author

ormergi commented Apr 29, 2021

/cc @EdDev

@kubevirt-bot kubevirt-bot requested a review from EdDev April 29, 2021 08:46
Copy link
Member

@EdDev EdDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gracefully shutdown in case of virt-launcher crash [1][2].

The links are not permanent, so these will become incorrect when the code change.

_, _, errno := unix.RawSyscall6(unix.SYS_PRLIMIT64,
uintptr(pid),
limit,
uintptr(unsafe.Pointer(rlimit)), // #nosec used in unix RawSyscall6
uintptr(unix.RLIMIT_MEMLOCK), // #nosec used in unix RawSyscall6
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the comment of #nosec moved to a different command? Are you sure it is not needed now for both now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#nosec comments are added where unsafe pointers are used #4446
This comment no longer needed, I will remove it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized we still need this comment for line 145, I will just move it

@@ -98,13 +99,19 @@ func (s *socketBasedIsolationDetector) DetectForSocket(vm *v1.VirtualMachineInst

}

if process, err := ps.FindProcess(pid); err != nil {
return nil, err
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is common in the code not to do this. How about:

process, err := ps.FindProcess(pid)
if err != nil {
    return ...
}
ppid := process.PPid()

Copy link
Member

@EdDev EdDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

< With Libvirt 7 (Coper)

I do not understand how the Coper part relevant here? That is the source of the package, we mainly care about the version. If you want to be specific, give a more detailed version.

Copy link
Member

@EdDev EdDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit post migration: Adjust QEMU and libvirt processes memlock limits:

The change in this commit includes both logic addition and refactoring of the existing code. Please split the refactoring (and naming changes) to a separate commit so it will be easier to see what logic changed.

if err != nil {
return err
func filterVirtLaunchersLibvirtProcesses(isolationResult IsolationResult, processes []ps.Process) []ps.Process {
var filteredProcesses []ps.Process
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this? On line 208 you can just use :=, no?

Copy link
Contributor Author

@ormergi ormergi May 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may look redundant here because there are only two assignments.
I think its easier to track where a variable was declared when there are more then one assignment (the first is always :=).

func filterVirtLaunchersLibvirtProcesses(isolationResult IsolationResult, processes []ps.Process) []ps.Process {
var filteredProcesses []ps.Process

// consider all processes that are virt-launcher children
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain it all as a doctext? Having inline comments is hard to keep up-to-date and read the code.


filteredProcesses = filterProcessesByParentPID(processes, []int{launcherPid, launcherParentPid})
// libvirtd process sets the memory lock limit before fork/exec-ing into qemu
// Its necessary to change QMEU process limits if its already exists (e.g: post migration just before hostdev)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we are in a reconcile? Are you sure we can reach this only on post-migration?

Copy link
Contributor Author

@ormergi ormergi May 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will reach this also from reconcile, although adjust resources is called only when VM is not running
https://github.com/kubevirt/kubevirt/blob/master/pkg/virt-handler/vm.go#L2420
https://github.com/kubevirt/kubevirt/blob/master/pkg/virt-handler/vm.go#L2446
Do we want to restrict changing QEMU process limits only on post migration?
If so I can create AdjsutResourcesOnPostMigration function
that will be called on finalizeMigration (instead of regular AdjustResources)
@EdDev What you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is less risky if the change is done per the specific need known to us. The qemu process memlock limit needs to be changed only when the VMI has a VFIO device and the state is post-migration.
If this is open to other scenarios, we may get into corner cases where we should not be.

If so I can create AdjsutResourcesOnPostMigration function

As I see it, a function name should express what it does, not who or when it is called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed new changes so that only QEMU process will be adjusted on post migration

}
// bump libvirtd and qemu processes locked memory limits
libvirtProcesses := filterVirtLaunchersLibvirtProcesses(isolationResult, processes)
log.Log.Info("debug: adjust memlock: filtered processes:")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are some debug logs just fyi (i guess its known)

Copy link
Member

@vladikr vladikr May 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, please use logs verbosity, log.Log.V(X).Info instead of log.Log.Info("debug: - KubeVirt uses verbosity levels, so you can print different messages at various levels.

Copy link
Contributor Author

@ormergi ormergi May 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added these logs just for debugging I was going to remove them (all logs that stats with 'debug')

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it is still valuable to keep it? up to you, but I think if it was valuable for you to debug, it might be useful when things go wrong ;)

Cur: uint64(memlockSize),
}
err = prLimit(process.Pid(), unix.RLIMIT_MEMLOCK, &rLimit)
for _, process := range libvirtProcesses {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please combine this loop with the one on line 190?

libvirtProcesses := filterVirtLaunchersLibvirtProcesses(isolationResult, processes)
log.Log.Info("debug: adjust memlock: filtered processes:")
for _, process := range libvirtProcesses {
log.Log.Infof("debug: pid %d, ppid %d, exec %s", process.Pid(), process.PPid(), process.Executable())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please make the debug prints more informative. Something like: "found libvirt process with pid %d...."

@@ -2641,6 +2641,13 @@ func (d *VirtualMachineController) finalizeMigration(vmi *v1.VirtualMachineInsta
if err != nil {
return err
}

err = d.podIsolationDetector.AdjustResources(vmi)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In reality, a failure here doesn't affect the result of migration.
The migration will still show as succeeded. However, perhaps we do need to react differently in this case?
Maybe, we should at least send an alert in case we didn't manage to plug the devices back.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You right, its wrong to return on AdjustResources error, thanks for the heads up.
I will log the error instead of returning, then in case a device is failed to plug it will be logged.

@kubevirt-bot kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 4, 2021
@ormergi ormergi requested a review from vladikr May 24, 2021 14:48
@vladikr
Copy link
Member

vladikr commented May 26, 2021

/retest

@vladikr
Copy link
Member

vladikr commented May 26, 2021

/approve
Thanks.

@kubevirt-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vladikr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 26, 2021
@kubevirt-bot
Copy link
Contributor

kubevirt-bot commented May 26, 2021

@ormergi: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubevirt-e2e-k8s-1.20-sig-compute 0adf11b link /test pull-kubevirt-e2e-k8s-1.20-sig-compute

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@vladikr
Copy link
Member

vladikr commented May 26, 2021

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label May 26, 2021
@kubevirt-bot kubevirt-bot merged commit b723dac into kubevirt:master May 26, 2021
nunnatsa added a commit to nunnatsa/hyperconverged-cluster-operator that referenced this pull request Jun 16, 2021
Following kubevirt/kubevirt#5558
Kubevirt is not longer requires SYS_RESOURCE capability in order to enable SRIOV VM's migration.

We still want this feature-gate to enable the user to choose whether to enable SRIOV VM migration or not
until this feature is maturated.

Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
kubevirt-bot pushed a commit to kubevirt/hyperconverged-cluster-operator that referenced this pull request Jun 17, 2021
Following kubevirt/kubevirt#5558
Kubevirt is not longer requires SYS_RESOURCE capability in order to enable SRIOV VM's migration.

We still want this feature-gate to enable the user to choose whether to enable SRIOV VM migration or not
until this feature is maturated.

Signed-off-by: Nahshon Unna-Tsameret <nunnatsa@redhat.com>
@ormergi ormergi deleted the drop-sys-resource-cap branch October 10, 2021 12:34
ormergi added a commit to ormergi/kubevirt that referenced this pull request Jun 27, 2022
Following [1], it is no longer needed to add SYS_RESOURCE capability
to virt-launcher, SRIOVLiveMigration feature-gate is redundant.

SR-IOV live migration in controlled by 'LiveMigration' feature gate.

[1] kubevirt#5558

Signed-off-by: Or Mergi <ormergi@redhat.com>
ormergi added a commit to ormergi/kubevirt that referenced this pull request Jun 27, 2022
Following [1], it is no longer needed to add SYS_RESOURCE capability
to virt-launcher, SRIOVLiveMigration feature-gate is redundant.

SR-IOV live migration in controlled by 'LiveMigration' feature gate.

[1] kubevirt#5558

Signed-off-by: Or Mergi <ormergi@redhat.com>
ormergi added a commit to ormergi/kubevirt that referenced this pull request Jun 27, 2022
Following [1] virt-launcher doesn't execute with the `SYS_RESOURCE`
capability anymore, `SRIOVLiveMigration` feature-gate is redundant.

SR-IOV live migration will be controlled by `LiveMigration` feature-gate
similar to non-SR-IOV VMs.

[1] kubevirt#5558

Signed-off-by: Or Mergi <ormergi@redhat.com>
ormergi added a commit to ormergi/kubevirt that referenced this pull request Jun 27, 2022
Following [1] virt-launcher doesn't execute with the `SYS_RESOURCE`
capability anymore, `SRIOVLiveMigration` feature-gate is redundant.

SR-IOV live migration will be controlled by `LiveMigration` feature-gate
similar to non-SR-IOV VMs.

[1] kubevirt#5558

Signed-off-by: Or Mergi <ormergi@redhat.com>
ormergi added a commit to ormergi/kubevirt that referenced this pull request Jul 4, 2022
Following [1] virt-launcher doesn't execute with the `SYS_RESOURCE`
capability anymore, `SRIOVLiveMigration` feature-gate is redundant.

SR-IOV live migration will be controlled by `LiveMigration` feature-gate
similar to non-SR-IOV VMs.

[1] kubevirt#5558

Signed-off-by: Or Mergi <ormergi@redhat.com>
ormergi added a commit to ormergi/kubevirt that referenced this pull request Aug 16, 2022
Following [1] virt-launcher doesn't execute with the `SYS_RESOURCE`
capability anymore, `SRIOVLiveMigration` feature-gate is redundant.

SR-IOV live migration will be controlled by `LiveMigration` feature-gate
similar to non-SR-IOV VMs.

[1] kubevirt#5558

Signed-off-by: Or Mergi <ormergi@redhat.com>
ormergi added a commit to ormergi/kubevirt that referenced this pull request Aug 16, 2022
Following [1] virt-launcher doesn't execute with the `SYS_RESOURCE`
capability anymore, `SRIOVLiveMigration` feature-gate is redundant.

SR-IOV live migration will be controlled by `LiveMigration` feature-gate
similar to non-SR-IOV VMs.

[1] kubevirt#5558

Signed-off-by: Or Mergi <ormergi@redhat.com>
ormergi added a commit to ormergi/kubevirt that referenced this pull request Aug 21, 2022
Following [1] virt-launcher doesn't execute with the `SYS_RESOURCE`
capability anymore, `SRIOVLiveMigration` feature-gate is redundant.

[1] kubevirt#5558

Signed-off-by: Or Mergi <ormergi@redhat.com>
ormergi added a commit to ormergi/kubevirt that referenced this pull request Aug 22, 2022
Following [1] virt-launcher doesn't execute with the `SYS_RESOURCE`
capability anymore, `SRIOVLiveMigration` feature-gate is redundant.

[1] kubevirt#5558

Signed-off-by: Or Mergi <ormergi@redhat.com>
ormergi added a commit to ormergi/kubevirt that referenced this pull request Aug 22, 2022
Following [1] virt-launcher doesn't execute with the `SYS_RESOURCE`
capability anymore, `SRIOVLiveMigration` feature-gate is redundant.

[1] kubevirt#5558

Signed-off-by: Or Mergi <ormergi@redhat.com>
ormergi added a commit to ormergi/kubevirt that referenced this pull request Aug 24, 2022
Following [1] virt-launcher doesn't execute with the `SYS_RESOURCE`
capability anymore, `SRIOVLiveMigration` feature-gate is redundant.

[1] kubevirt#5558

Signed-off-by: Or Mergi <ormergi@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants