New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes BZ-2123209: Real time VMs fail to change vCPU scheduler and priority in non-root deployments #8750
Fixes BZ-2123209: Real time VMs fail to change vCPU scheduler and priority in non-root deployments #8750
Conversation
8e2ae88
to
e47f95f
Compare
e47f95f
to
092ea36
Compare
092ea36
to
ed8058e
Compare
ed8058e
to
efb6134
Compare
…e <cpusched> knobs when running in a non-root environment or the VM labeled as non-root Signed-off-by: Jordi Gil <jgil@redhat.com>
… would fail during the manifest validation Signed-off-by: Jordi Gil <jgil@redhat.com>
…t environment Signed-off-by: Jordi Gil <jgil@redhat.com>
efb6134
to
315a5fb
Compare
I'm unsure whether to keep the old logic or not. Preferably it would be better to stick to one logic (this PR's implementation for both non-root and root use cases), but I'd like your input on this. |
Does this mean that with your change we could drop CAP_SYS_NICE for root and non-root? If the logic you introduce works for both cases, yep let's converge :) |
} | ||
log.Log.V(5).Object(vmi).Infof("set process %+v memlock rlimits to: Cur: %[2]d Max:%[2]d", | ||
qemuProcess, memlockSize.Value()) | ||
|
||
return nil | ||
} | ||
|
||
// GetQEMUProcess encapsulates and exposes the logic to retrieve the QEMU process ID | ||
func GetQEMUProcess(parentPID int) (ps.Process, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks to me like this should be more of a method on the isolation result, than a general purpose method where you pass in an arbitrary pid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I will move this to be a func method to IsolationResult
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Field: field.Child("domain", "cpu", "dedicatedCpuPlacement").String(), | ||
}) | ||
} | ||
// if nonroot { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left over
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops.. good catch! 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work @jordigilh! +1!
Left mostly nits and small comments
pkg/virt-handler/realtime.go
Outdated
type SchedParam C.sched_param | ||
type Policy uint32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these be unexposed?
type SchedParam C.sched_param | |
type Policy uint32 | |
type schedParam C.sched_param | |
type policy uint32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely!. I will change their scope 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pkg/virt-handler/realtime.go
Outdated
// except that in this case it uses a map[string]maskType instead of a bit array. | ||
func parseCPUMask(mask string) (map[string]maskType, error) { | ||
|
||
if len(strings.TrimSpace(mask)) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
if len(strings.TrimSpace(mask)) == 0 { | |
if strings.TrimSpace(mask) != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the code I should return an empty map if the mask is either empty or full of spaces. Only return error if the parsing fails.
vcpus := make(map[string]maskType)
if strings.TrimSpace(mask) == "" {
return vcpus, nil
}
The reason is that mask
is a string and will never be nil, so it should always treat the empty case as default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed!
Just one note, you can return a nil
map instead of an initialized one. It would save the space of allocating an empty map. isRealtimeVCPU()
will work with that as the len
of a nil map is 0.
if strings.TrimSpace(mask) == "" {
return nil, nil
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great, thanks for the suggestion! 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After further thought I think a mask filled with spaces should be considered invalid, compared with an empty string.
if len(mask) == 0 {
return nil, nil
}
So basically:
Entry("Empty mask", "", nil, nil),
Entry("Empty mask with spaces", " ", nil, fmt.Errorf("invalid mask value ' ' in ' '")),
WDYT?
if startID < 0 { | ||
return nil, fmt.Errorf("invalid vcpu mask start index `%d`", startID) | ||
} | ||
if endID < 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can endID
be zero? Maybe switch with endID <= 0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, good point. I will add a test case to cover this 0-0
range as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after adding the test it seems to be working already as it is:
Entry("Correctly extracts a single range with one core", "0-0", map[string]maskType{"0": enabled}, nil),
Are you OK to leave the logic as it is or you'd feel more comfortable changing it as you initially suggested?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting.
Well, as long as it works + covered with a unit test I guess we're fine :)
pkg/virt-handler/realtime.go
Outdated
func schedSetScheduler(pid int, policy Policy, param SchedParam) error { | ||
_, _, e1 := unix.Syscall(unix.SYS_SCHED_SETSCHEDULER, uintptr(pid), uintptr(policy), uintptr(unsafe.Pointer(¶m))) | ||
if e1 != 0 { | ||
return syscall.Errno(e1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: cast is unnecessary
return syscall.Errno(e1) | |
return e1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pkg/virt-handler/realtime_test.go
Outdated
. "github.com/onsi/gomega" | ||
) | ||
|
||
var _ = Describe("Set pthread scheduling type and priority", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I suggest to have one Describe
with different Context
s inside so it would look a bit clearer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not following you here. This Describe
section has only 5 entries/tests to run in the same context. Are you talking in general or just to this Describe
section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I'm not sure why we need 3 deffierent Describe
s, each with a Context
and a DescribeTable
. I'd go with one Describe
and 3 Context
s.
But of course it's a styling manner and might be subjective. Feel free to ignore if you don't feel the same as I :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, makes sense to me. I will migrate the Describes into Context and add a top Describe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, please take a look.
|
||
var _ = Describe("Set pthread scheduling type and priority", func() { | ||
Context("when parsing the thread command for CPU ID", func() { | ||
DescribeTable("extracts the CPU ID", func(comm []byte, cpuID string, parseOK bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nits:
- would change
comm []byte
tocomm string
and cast inside cpuID
->expectedCpuID
,parseOK
->expectParseOK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one thing is that I need it to be []byte
inside the function:
kubevirt/pkg/virt-handler/realtime.go
Line 100 in 315a5fb
v := vcpuRegex.FindSubmatch(comm) |
So casting it to string when the source is already a byte array and then recasting it to []byte doesn't sound that good. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's a good point.
Terrific ... I will remove the old logic 😄 |
9b4c891
to
ada476f
Compare
@jordigilh seems like a bazel file isn't configured correctly. |
ada476f
to
29f371c
Compare
Signed-off-by: Jordi Gil <jgil@redhat.com>
29f371c
to
f365f1d
Compare
/lgtm thanks @jordigilh! |
/test pull-kubevirt-e2e-k8s-1.24-sig-compute-realtime |
/test pull-kubevirt-e2e-k8s-1.24-sig-compute-realtime-root |
/retest |
3 similar comments
/retest |
/retest |
/retest |
/retest-required |
/retest |
Hello there, Unfortunately, this PR seems to have broken my local setup from building/running the e2e test binary. Sadly, this in turn exposed a nasty dependency from the e2e tests to the package here. We seem to have several issues we better fix soon:
I will try to send a quick fix later today, but it is a small workaround, not fixing the core problem. |
For completeness, the errors seen while building the e2e tests are these:
|
/cherry-pick release-0.58 |
@xpivarc: #8750 failed to apply on top of branch "release-0.58":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This PR addresses an issue found during the deployment of real time workloads in non-root deployments where libvirt was unable to change the scheduling and priorities of the vCPU threads due to lack of CAP_SYS_NICE.
The solution proposed here is to re-implement the logic that changes the scheduling and priority in
virt-handler
for non-root deployments. The old logic that adds the libvirt XML knobs remains there for root based deployments.Release note: