Added penetration tests important to Telco partners/customers#31283
Added penetration tests important to Telco partners/customers#31283yogeshahiray wants to merge 11 commits into
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
Hi @yogeshahiray. Thanks for your PR. I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with Regular contributors should join the org to skip this step. Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a comprehensive security penetration test suite with 13 security checks (credential scanning, CNI/SELinux validation, privilege/permission audits, operator/monitoring health, and risk inventories), introduces probe-level termination grace period testing with event timing validation, and includes utility refactorings for image mirror deduplication and event pattern matching. ChangesSecurity Penetration Test Suite
Probe-Level Termination Grace Period Testing
Utility and Configuration Updates
Sequence Diagram(s)sequenceDiagram
participant PenetrationSuite as Penetration Test Suite
participant OcDebug as oc debug
participant Node as Node Filesystem
participant APIServer as Kubernetes API
participant SecretStore as Secrets API
PenetrationSuite->>SecretStore: Collect passwords from kubeadmin & openshift-machine-api
SecretStore-->>PenetrationSuite: Password values
PenetrationSuite->>OcDebug: Start debug pod on node
OcDebug->>Node: chroot to root, grep logs/configs
Node-->>OcDebug: Grep results
OcDebug-->>PenetrationSuite: Plaintext findings
PenetrationSuite->>APIServer: Query Routes, etcd config, Secrets
APIServer-->>PenetrationSuite: TLS, privileges, registries
PenetrationSuite->>OcDebug: Inspect sudoers, etcd permissions
OcDebug->>Node: find/ls commands
Node-->>OcDebug: File results
OcDebug-->>PenetrationSuite: Hardening violations
sequenceDiagram
participant ProbeTest as Probe Termination Test
participant TestPod as Pod with Probe
participant EventAPI as Events API
participant KubeletLifecycle as Kubelet Lifecycle
ProbeTest->>TestPod: Create pod with probe-level terminationGracePeriodSeconds
TestPod->>KubeletLifecycle: Probe fails (liveness/startup)
KubeletLifecycle->>EventAPI: Record "Killing" event at timestamp T1
KubeletLifecycle->>EventAPI: Record "Started" event at timestamp T2
ProbeTest->>EventAPI: Poll for Killing event (reason: "Killing")
EventAPI-->>ProbeTest: Killing event @ T1
ProbeTest->>EventAPI: Poll for Started event (after T1)
EventAPI-->>ProbeTest: Started event @ T2
ProbeTest->>ProbeTest: Calculate T2 - T1, assert within tolerance
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 12 | ❌ 3❌ Failed checks (3 warnings)
✅ Passed checks (12 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (1)
test/extended/security/penetration.go (1)
282-286: 💤 Low valueConsider making the regex more specific.
The pattern
drwxr[A-Za-z0-9\s\.\-]+(/usr/[a-z0-9/]+)is quite permissive in the first part. While the captured path group[a-z0-9/]+safely prevents injection, the leading pattern could match unintended output. Consider making it more specific to match only expectedls -ldpermission strings.📝 Example of more specific pattern
- re := regexp.MustCompile(`drwxr[A-Za-z0-9\s\.\-]+(/usr/[a-z0-9/]+)`) + // Match: drwxr-xr-x. <number> <user> <group> ... /usr/... + re := regexp.MustCompile(`^drwxr-?x?r-?x?[.-]\s+\d+\s+\S+\s+\S+\s+.*?(/usr/libexec/cni|/opt/cni)`)This more precisely matches the
ls -ldoutput format and explicitly looks for the expected CNI paths.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/security/penetration.go` around lines 282 - 286, The current regex assigned to re is too permissive; narrow it to match a precise ls -ld permission line and restrict the captured path to expected CNI locations. Update the pattern used by re (the string `drwxr[A-Za-z0-9\s\.\-]+(/usr/[a-z0-9/]+)`) to explicitly match permission bits (e.g., ^d[rwx-]{9}), whitespace-separated columns (inode/owner/group/size/date), then capture only known /usr subpaths (for example /usr/bin, /usr/lib, /usr/libexec, etc.) with a conservative character class for the path; keep the code flow using matches := re.FindStringSubmatch(output) and the existing conditional that returns matches[1] when present.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/extended/security/penetration.go`:
- Around line 234-236: The testPasswords slice is empty so the YAML-password
scanning never runs; populate testPasswords with actual values (either load from
the cluster configuration or supply a sensible default list of common/test
passwords) and reuse it in the YAML check function (the slice named
testPasswords used in the YAML scan) and in checkLogsForPasswords so both checks
operate on the same non-empty password list; update the code that builds
testPasswords and ensure both the YAML scanner and checkLogsForPasswords
reference that populated slice.
- Around line 192-194: The test currently defines an empty slice testPasswords
which prevents the nested verification loops in the penetration test from
running; either populate testPasswords with representative password strings used
in the cluster (e.g., common service/account passwords or test fixtures) so the
loops in the test exercise log-scanning, or explicitly skip the test when it’s
not configured by calling e2eskipper.Skipf (or the existing skip helper) early
in the test; update the code paths around testPasswords and the test function in
test/extended/security/penetration.go (the loops that scan logs) to use the
populated slice or to return after Skipf so the test no longer silently passes.
- Around line 627-643: The current verifyEtcdUsesTLS logic inspects etcd.Object
by converting spec and status maps to strings (fmt.Sprintf("%v", spec/status))
and substring-searching for "cert"/"tls", which is fragile; replace this with
structured checks: retrieve spec and status via unstructured.NestedMap (as you
already do), then explicitly inspect known TLS-related keys (e.g., spec["tls"],
spec["clientTLS"], spec["serverTLS"], spec["peerTLS"], spec["backup"] or the
etcd-operator fields like "tls", "tlsClientConfig", "certFile", "keyFile",
"caFile") and iterate nested maps/slices to detect boolean flags or presence of
certificate file fields; update verifyEtcdUsesTLS to return true only when those
concrete fields exist/are enabled rather than on loose substring matches of
fmt.Sprintf output.
- Line 54: Before calling findCNIPath(oc, nodes.Items[0].Name) add a defensive
check that nodes.Items is non-empty; if len(nodes.Items) == 0, handle the case
(return an error from the surrounding test function or log and fail the test)
instead of indexing [0]. Locate the call to findCNIPath and the variable nodes
in the same test function in test/extended/security/penetration.go and add the
guard so the code never dereferences nodes.Items[0] when the node list is empty.
- Around line 247-253: The code builds a shell command via fmt.Sprintf("grep -nl
'%s' %s") (cmd) and then runs it through "/bin/bash -c", which allows command
injection; change the call in this block so you do not invoke a shell or
interpolate pwd into a single-quoted string. Replace the Sprintf/"/bin/bash -c"
approach by passing grep and its arguments directly to
oc.AsAdmin().Run("debug").Args (e.g., use "/bin/grep", "-nl", pwd, yamlPath) so
pwd and yamlPath are separate Args instead of being concatenated into cmd;
update the code around the cmd variable and the oc.AsAdmin().Run("debug").Args
invocation to use these explicit args.
- Around line 91-92: The shared ctx := context.Background() declared at the
Describe-level is reused across multiple It test cases (the It blocks such as
"TestEtcdBackupEncryptionAndRestriction
[apigroup:config.openshift.io][apigroup:operator.openshift.io]" and the
subsequent It blocks) which breaks test isolation; remove the top-level ctx
declaration and instead add a fresh ctx := context.Background() as the first
statement inside each It block that currently references ctx (all It blocks
between lines 92-181), ensuring each test case obtains its own context.
- Around line 211-217: The grep command is vulnerable to shell injection because
pwd is interpolated into a bash -c string; update the
oc.AsAdmin().Run("debug").Args(...).Output() invocation that builds cmd so it
does not pass an interpolated shell string. Fix by invoking grep without using
"/bin/bash -c" and pass the pattern and logPath as separate arguments (use pwd
and logPath as raw args), or if shell usage is unavoidable, escape/quote pwd
with a robust shell-escaping utility (e.g., use a library that safely quotes
shell arguments) before formatting into the command; ensure the change targets
the code building cmd and the oc.AsAdmin().Run(...).Args call that executes it.
---
Nitpick comments:
In `@test/extended/security/penetration.go`:
- Around line 282-286: The current regex assigned to re is too permissive;
narrow it to match a precise ls -ld permission line and restrict the captured
path to expected CNI locations. Update the pattern used by re (the string
`drwxr[A-Za-z0-9\s\.\-]+(/usr/[a-z0-9/]+)`) to explicitly match permission bits
(e.g., ^d[rwx-]{9}), whitespace-separated columns (inode/owner/group/size/date),
then capture only known /usr subpaths (for example /usr/bin, /usr/lib,
/usr/libexec, etc.) with a conservative character class for the path; keep the
code flow using matches := re.FindStringSubmatch(output) and the existing
conditional that returns matches[1] when present.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 7cc44443-f610-468e-a69f-5d3eae92b3dc
📒 Files selected for processing (1)
test/extended/security/penetration.go
|
/test e2e-aws-ovn-microshift |
|
/payload-job periodic-ci-openshift-hypershift-release-5.0-periodics-e2e-aws-ovn-conformance |
|
@neisw: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/39651d60-6590-11f1-8635-10ef9a19d878-0 |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
test/extended/security/penetration.go (5)
124-131:⚠️ Potential issue | 🟠 Major | ⚡ Quick winSampling only
masterNodes.Items[0]leaves master-local etcd permission issues undetected.Both specs validate a single control-plane node, but backup artifacts and
/var/lib/etcdpermissions can drift per node. If the first master is clean and another is not, these tests still pass. Iterate over all listed masters before asserting success.Also applies to: 157-164
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/security/penetration.go` around lines 124 - 131, The test only checks masterNodes.Items[0] which can miss per-node etcd permission issues; update the code that calls findWorldReadableCriticalEtcdFiles(oc, masterNodes.Items[0].Name) to iterate over all masterNodes.Items, call findWorldReadableCriticalEtcdFiles for each node (using masterNodes.Items[i].Name), collect/merge any returned critical files, and assert the aggregated result is empty (or assert per-node emptiness) using the existing o.Expect checks; apply the same iteration/fix to the duplicate check that uses findWorldReadableCriticalEtcdFiles later in the file.
328-350:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
findCNIPathcan skip valid nodes and can never correctly return/opt/cni.
ls -ld /opt/cni /usr/libexec/cniexits non-zero when either path is missing, so a node with only one valid CNI location hits Line 336 and the test skips. Even when parsing succeeds, the regex only matches/usr/..., and the fallback hardcodes/usr/libexec/cni, so/opt/cniis effectively unreachable. Probe each candidate independently and return the one that actually exists.
252-279:⚠️ Potential issue | 🟠 Major | ⚡ Quick winLiteral
*.log/*.yamlarguments mean most of these grep scans never touch real files.Passing
/var/log/containers/*.logor/etc/kubernetes/manifests/*.yamlas raw args to/bin/grepdoes not expand the glob, sogrepsees a literal filename and returns an error. The currenterr == nilgate then treats that execution failure as “nothing found”, which is another false-clean result. Enumerate matching files first and search them with literal matching (grep -F --), rather than handing wildcard patterns straight togrep.Also applies to: 298-319
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/security/penetration.go` around lines 252 - 279, The grep calls loop over logPaths (e.g., "/var/log/containers/*.log") with oc.AsAdmin().Run("debug") but pass unexpanded globs to /bin/grep, so grep sees literal filenames and errors are treated as "not found"; fix by enumerating matching files on the node before grepping (e.g., run a safe listing like ls -1 or find via oc debug for each pattern), then invoke /bin/grep -F -- <pwd> <eachFoundFile> (literal matching and explicit file args) and append to foundPasswords only when grep succeeds; apply the same change to the similar block referenced at 298-319.
466-475:⚠️ Potential issue | 🟠 Major | ⚡ Quick winA node that cannot be inspected is currently treated as “no unexpected sudoers files”.
continueon the debug-command error drops that node from the result set, so this spec can pass even when one of the nodes was never checked. Surface the error, or at least record the node as an inspection failure instead of silently skipping it. As per coding guidelines, Go code should never ignore error returns.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/security/penetration.go` around lines 466 - 475, The code currently ignores errors from oc.AsAdmin().Run("debug").Args(...).Output() (the block using node.Name and cmd) by doing if err != nil { continue }, which silently skips nodes; fix this by surfacing the error or recording the node as an inspection failure instead of continuing. Replace the continue with error handling that logs or returns the error (include node.Name and err in the message) or append node.Name to a failures slice (e.g., failedNodes) so the spec can fail or assert on failedNodes after the loop; ensure you handle the Output() error from oc.AsAdmin().Run("debug").Args(...) rather than ignoring it.Source: Coding guidelines
447-455:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPrivileged init and ephemeral containers bypass this check.
This only inspects
pod.Spec.Containers. A privilegedinitContainerorephemeralContaineris just as relevant for a penetration test and will currently be missed, so the spec can pass while a namespace still runs privileged workload.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/security/penetration.go` around lines 447 - 455, The current check only iterates pod.Spec.Containers and misses privileged init or ephemeral containers; update the logic that builds privilegedPods to also iterate pod.Spec.InitContainers and pod.Spec.EphemeralContainers and perform the same nil and *SecurityContext.Privileged checks for each element, appending fmt.Sprintf("%s/%s", pod.Namespace, pod.Name) to privilegedPods and breaking once any privileged container (regular, init, or ephemeral) is found; ensure you reference the same variable names (pod, privilegedPods) and mirror the existing condition used for pod.Spec.Containers.
♻️ Duplicate comments (1)
test/extended/security/penetration.go (1)
243-249:⚠️ Potential issue | 🟠 Major | ⚡ Quick winThese plaintext-password tests still pass without scanning anything.
Both helpers return an empty result when
testPasswordsis empty, and the callers assert that emptiness as success. That makes the suite report a clean result even though no password exposure check ran. Either load real inputs or explicitlySkipf/fail the spec until the source of passwords exists.Also applies to: 289-295
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/security/penetration.go` around lines 243 - 249, The plaintext-password check currently treats an empty testPasswords slice as a successful pass; update the logic in the penetration test to not silently succeed when no inputs exist: in the block where testPasswords is declared and where the helper returns early (the code around the testPasswords variable and the function that returns foundPasswords), if testPasswords is empty call t.Skipf (or t.Fatalf if you prefer a hard failure) with an explanatory message indicating passwords are not configured, or load real test inputs from cluster config; apply the same change to the similar block around lines 289-295 so the spec is skipped/failed instead of passing when no passwords are provided, referencing the testPasswords variable and the helper that returns foundPasswords.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@test/extended/security/penetration.go`:
- Around line 124-131: The test only checks masterNodes.Items[0] which can miss
per-node etcd permission issues; update the code that calls
findWorldReadableCriticalEtcdFiles(oc, masterNodes.Items[0].Name) to iterate
over all masterNodes.Items, call findWorldReadableCriticalEtcdFiles for each
node (using masterNodes.Items[i].Name), collect/merge any returned critical
files, and assert the aggregated result is empty (or assert per-node emptiness)
using the existing o.Expect checks; apply the same iteration/fix to the
duplicate check that uses findWorldReadableCriticalEtcdFiles later in the file.
- Around line 252-279: The grep calls loop over logPaths (e.g.,
"/var/log/containers/*.log") with oc.AsAdmin().Run("debug") but pass unexpanded
globs to /bin/grep, so grep sees literal filenames and errors are treated as
"not found"; fix by enumerating matching files on the node before grepping
(e.g., run a safe listing like ls -1 or find via oc debug for each pattern),
then invoke /bin/grep -F -- <pwd> <eachFoundFile> (literal matching and explicit
file args) and append to foundPasswords only when grep succeeds; apply the same
change to the similar block referenced at 298-319.
- Around line 466-475: The code currently ignores errors from
oc.AsAdmin().Run("debug").Args(...).Output() (the block using node.Name and cmd)
by doing if err != nil { continue }, which silently skips nodes; fix this by
surfacing the error or recording the node as an inspection failure instead of
continuing. Replace the continue with error handling that logs or returns the
error (include node.Name and err in the message) or append node.Name to a
failures slice (e.g., failedNodes) so the spec can fail or assert on failedNodes
after the loop; ensure you handle the Output() error from
oc.AsAdmin().Run("debug").Args(...) rather than ignoring it.
- Around line 447-455: The current check only iterates pod.Spec.Containers and
misses privileged init or ephemeral containers; update the logic that builds
privilegedPods to also iterate pod.Spec.InitContainers and
pod.Spec.EphemeralContainers and perform the same nil and
*SecurityContext.Privileged checks for each element, appending
fmt.Sprintf("%s/%s", pod.Namespace, pod.Name) to privilegedPods and breaking
once any privileged container (regular, init, or ephemeral) is found; ensure you
reference the same variable names (pod, privilegedPods) and mirror the existing
condition used for pod.Spec.Containers.
---
Duplicate comments:
In `@test/extended/security/penetration.go`:
- Around line 243-249: The plaintext-password check currently treats an empty
testPasswords slice as a successful pass; update the logic in the penetration
test to not silently succeed when no inputs exist: in the block where
testPasswords is declared and where the helper returns early (the code around
the testPasswords variable and the function that returns foundPasswords), if
testPasswords is empty call t.Skipf (or t.Fatalf if you prefer a hard failure)
with an explanatory message indicating passwords are not configured, or load
real test inputs from cluster config; apply the same change to the similar block
around lines 289-295 so the spec is skipped/failed instead of passing when no
passwords are provided, referencing the testPasswords variable and the helper
that returns foundPasswords.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 4881a8d4-aec9-48cc-ab99-4e0c9b6f870b
📒 Files selected for processing (1)
test/extended/security/penetration.go
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
test/extended/security/penetration.go (3)
267-272:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRestore wildcard expansion and literal matching in the password scans.
Passing
*.logand*.yamlpaths as rawgreparguments means those wildcards are never expanded, so these checks usually return exit code 2 and silently miss the files they are supposed to scan.grepis also treatingpwdas a regex, so passwords containing regex metacharacters can produce false positives or misses. Expand the file list separately and search withgrep -F --so the scans stay both safe and functional.Suggested fix
- output, err := oc.AsAdmin().Run("debug").Args( - fmt.Sprintf("node/%s", node.Name), - "--", - "/bin/grep", "-nl", pwd, logPath, - ).Output() + output, err := oc.AsAdmin().Run("debug").Args( + fmt.Sprintf("node/%s", node.Name), + "--", + "/bin/sh", "-c", + `for f in $1; do [ -e "$f" ] && /bin/grep -F -nl -- "$2" "$f"; done`, + "_", logPath, pwd, + ).Output()Apply the same pattern to the YAML scan as well.
As per coding guidelines, "Command: no shell=True, os.system, or backtick exec with user input".
Also applies to: 307-312
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/security/penetration.go` around lines 267 - 272, The grep invocation in oc.AsAdmin().Run("debug").Args (where you pass pwd and logPath) is feeding unexpanded wildcards and a regex pattern to grep; change it to first expand the target file globs (e.g., expand logPath like "*.log" and the YAML glob into an explicit list of files) on the node before invoking grep, then call grep with fixed-string mode and explicit end-of-options (use "-F" and "--") so the password string (pwd) is treated literally; update the same pattern for the YAML scan too (the other oc.AsAdmin().Run("debug").Args site mentioned) so both scans expand globs and use grep -F -- with expanded file lists.Source: Coding guidelines
725-731:⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
countDatabasePodscan count the same pod more than once.The
breakonly exits thedbImagesloop. If a pod has two matching containers,countis incremented twice even though the helper comment and log message both describe a pod count. Use a labeledcontinueor a per-pod boolean so each pod contributes at most once.Suggested fix
- for _, pod := range pods.Items { +podLoop: + for _, pod := range pods.Items { for _, container := range pod.Spec.Containers { lowerImage := strings.ToLower(container.Image) for _, dbType := range dbImages { if strings.Contains(lowerImage, dbType) { count++ - break // Count each pod only once + continue podLoop } } } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/security/penetration.go` around lines 725 - 731, The loop in countDatabasePods is incrementing count once per matching container because the break only exits the dbImages loop; change the logic so each pod increments count at most once by introducing a per-pod boolean (e.g., found) that is set when any container image matches dbImages and then break out of the container loop or use a labeled continue to skip to the next pod; update the use of pods.Items, pod.Spec.Containers, dbImages and count accordingly so the helper truly counts pods, not containers.
451-456:⚠️ Potential issue | 🟠 MajorPrivileged-pod test misses privileged init/ephemeral containers
countPrivilegedPodsInUserNamespacesonly checkspod.Spec.Containers[*].SecurityContext.Privileged(lines 451-458), so pods with privilegedinitContainersorephemeralContainerscan evadeTestNoUnexpectedPrivilegedPods. Update the helper to scan those container types as well, while still counting each pod only once.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/extended/security/penetration.go` around lines 451 - 456, The helper countPrivilegedPodsInUserNamespaces currently only inspects pod.Spec.Containers for SecurityContext.Privileged; update it to also iterate pod.Spec.InitContainers and pod.Spec.EphemeralContainers and check each container's SecurityContext != nil and SecurityContext.Privileged != nil && *SecurityContext.Privileged, incrementing count and breaking out once per pod (same behavior as for pod.Spec.Containers) so pods with privileged init or ephemeral containers are counted exactly once; reuse the existing loop/break pattern and nil checks to avoid panics.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/extended/security/penetration.go`:
- Around line 194-200: The tests named TestNoUnprotectedDatabasePods,
TestNoUnexpectedClusterAdminServiceAccounts, and TestNoNFSVolumesRisk currently
only log via g.By(...) when a non-zero risk is detected, so CI doesn't fail;
replace the informational g.By(...) calls with explicit assertions (e.g., use
g.Expect(dbPodCount).To(BeZero(), "found %d database pod(s) - verify credentials
use Secrets", dbPodCount)) so the test fails when risk is detected; update the
occurrences around TestNoUnprotectedDatabasePods (calling countDatabasePods),
TestNoUnexpectedClusterAdminServiceAccounts (calling countClusterAdminSAs or
similar), and TestNoNFSVolumesRisk (calling countNFSVolumes or similar) to
assert counts are zero (or rename tests to make them discovery-only if you
prefer a non-failing check).
---
Outside diff comments:
In `@test/extended/security/penetration.go`:
- Around line 267-272: The grep invocation in oc.AsAdmin().Run("debug").Args
(where you pass pwd and logPath) is feeding unexpanded wildcards and a regex
pattern to grep; change it to first expand the target file globs (e.g., expand
logPath like "*.log" and the YAML glob into an explicit list of files) on the
node before invoking grep, then call grep with fixed-string mode and explicit
end-of-options (use "-F" and "--") so the password string (pwd) is treated
literally; update the same pattern for the YAML scan too (the other
oc.AsAdmin().Run("debug").Args site mentioned) so both scans expand globs and
use grep -F -- with expanded file lists.
- Around line 725-731: The loop in countDatabasePods is incrementing count once
per matching container because the break only exits the dbImages loop; change
the logic so each pod increments count at most once by introducing a per-pod
boolean (e.g., found) that is set when any container image matches dbImages and
then break out of the container loop or use a labeled continue to skip to the
next pod; update the use of pods.Items, pod.Spec.Containers, dbImages and count
accordingly so the helper truly counts pods, not containers.
- Around line 451-456: The helper countPrivilegedPodsInUserNamespaces currently
only inspects pod.Spec.Containers for SecurityContext.Privileged; update it to
also iterate pod.Spec.InitContainers and pod.Spec.EphemeralContainers and check
each container's SecurityContext != nil and SecurityContext.Privileged != nil &&
*SecurityContext.Privileged, incrementing count and breaking out once per pod
(same behavior as for pod.Spec.Containers) so pods with privileged init or
ephemeral containers are counted exactly once; reuse the existing loop/break
pattern and nil checks to avoid panics.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 275ca5b4-bce4-4c29-86bf-7965200fb8e3
📒 Files selected for processing (1)
test/extended/security/penetration.go
|
I have addressed all the comments |
|
/ok-to-test |
|
Scheduling required tests: |
|
Risk analysis has seen new tests most likely introduced by this PR. New Test Risks for sha: 4549d1c
New tests seen in this PR at sha: 4549d1c
|
|
Scheduling required tests: |
1 similar comment
|
Scheduling required tests: |
|
Job Failure Risk Analysis for sha: da2b185
Risk analysis has seen new tests most likely introduced by this PR. New Test Risks for sha: da2b185
New tests seen in this PR at sha: da2b185
|
|
/approve |
|
/payload-job periodic-ci-openshift-hypershift-release-5.0-periodics-e2e-aws-ovn-conformance |
|
@neisw: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/d67d8e10-68e8-11f1-8327-82313cf86a27-0 |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/cmd/openshift-tests/images/images_command.go`:
- Line 105: The prefix handling is inconsistent between where repository
prefixes are stored and where they are used for deduplication and output. At
line 318, prefixed targets are stored from existing lines, but when building
dest and printing outputs in lines 324-333, the prefix is not being applied
consistently, causing the dedupe logic at line 325 to fail to match prefixed and
unprefixed targets. Fix this by ensuring that when building dest at lines 324+
and when printing targets at lines 329-333, you apply the same prefix handling
logic that is used when storing targets from existing lines at line 318, so that
deduplication can work correctly and prevent duplicate mappings for file:// and
s3:// repositories.
In `@test/extended/ci/job_names.go`:
- Around line 176-177: The Skipf call on line 177 runs unconditionally, causing
the os-version validation to be skipped for all job variants rather than only
those affected by the RHEL-10 switchover. Add a conditional check before the
Skipf call that evaluates whether the current job is one of the specific
variants impacted by the RHEL-10 switchover, and only execute the Skipf when
that condition is true, allowing the validation to continue for all other
unaffected job variants.
In `@test/extended/node/node_e2e/probe_termination.go`:
- Around line 202-224: The functions findLatestEventByReason and
findEarliestEventAfter use only LastTimestamp.Time for timestamp comparisons,
but the existing CalculateEventTimeDiff function already implements fallback
logic to use FirstTimestamp when LastTimestamp is zero. This inconsistency
causes a bug where zero LastTimestamp values result in epoch time boundaries
being used, allowing findEarliestEventAfter to incorrectly match events. Apply
the same unified timestamp fallback logic from CalculateEventTimeDiff to both
findLatestEventByReason and findEarliestEventAfter so that whenever
LastTimestamp.Time is zero, they fall back to FirstTimestamp.Time for comparison
operations, ensuring consistent and correct timestamp handling across all event
filtering and boundary selection.
In `@test/extended/node/node_utils.go`:
- Around line 773-782: The CalculateEventTimeDiff function does not account for
the EventTime field which is used in newer Kubernetes event objects. Modify the
function to add a fallback mechanism: after checking LastTimestamp and
FirstTimestamp for both startEvent and endEvent, if they are still zero, fall
back to using the EventTime field from the respective event objects. This
ensures the function correctly handles both legacy and newer Kubernetes event
formats and returns a valid duration instead of a zero duration when legacy
timestamp fields are not populated.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 27d79215-2299-4504-921b-3d249a62f905
⛔ Files ignored due to path filters (1)
test/extended/util/image/zz_generated.txtis excluded by!**/zz_generated*
📒 Files selected for processing (10)
pkg/cmd/openshift-tests/images/images_command.gopkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.gotest/extended/ci/job_names.gotest/extended/internalreleaseimage/helper.gotest/extended/node/README.mdtest/extended/node/node_e2e/probe_termination.gotest/extended/node/node_utils.gotest/extended/security/penetration.gotest/extended/util/image/OWNERStest/extended/util/image/image.go
✅ Files skipped from review due to trivial changes (2)
- test/extended/util/image/OWNERS
- pkg/monitortestlibrary/pathologicaleventlibrary/duplicated_event_patterns.go
🚧 Files skipped from review as they are similar to previous changes (1)
- test/extended/security/penetration.go
|
|
||
| // TODO(k8s-1.36): remove this when k8s 1.36 lands | ||
| injectedLines := injectNewImages(ref, !o.Upstream) | ||
| injectedLines := injectNewImages(ref, !o.Upstream, lines) |
There was a problem hiding this comment.
Prefix handling is inconsistent, breaking dedupe/output for file:// and s3:// repositories.
Line 105 passes only lines, but Line 324+ builds dest without the repository prefix and Lines 329-333 print unprefixed targets. With prefixed repositories, Line 318 stores prefixed targets (from existing lines), so dedupe at Line 325 won’t match and duplicate mappings can be emitted.
Proposed fix
- injectedLines := injectNewImages(ref, !o.Upstream, lines)
+ injectedLines := injectNewImages(ref, !o.Upstream, prefix, lines)-func injectNewImages(ref reference.DockerImageReference, mirrored bool, existingLines []string) []string {
+func injectNewImages(ref reference.DockerImageReference, mirrored bool, prefix string, existingLines []string) []string {
target := ref.Exact()
@@
existingTargets := sets.NewString()
for _, line := range existingLines {
parts := strings.Fields(line)
if len(parts) >= 2 {
- existingTargets.Insert(parts[1])
+ existingTargets.Insert(strings.TrimPrefix(strings.TrimPrefix(parts[1], "file://"), "s3://"))
}
}
@@
- if mirrored {
+ if mirrored {
lines = append(lines, fmt.Sprintf("%s:%s %s",
- imagesetup.DefaultTestImageMirrorLocation, mirrorTag, dest))
+ imagesetup.DefaultTestImageMirrorLocation, mirrorTag, prefix+dest))
} else {
- lines = append(lines, fmt.Sprintf("%s %s", originalImage, dest))
+ lines = append(lines, fmt.Sprintf("%s %s", originalImage, prefix+dest))
}
}Also applies to: 304-333
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/cmd/openshift-tests/images/images_command.go` at line 105, The prefix
handling is inconsistent between where repository prefixes are stored and where
they are used for deduplication and output. At line 318, prefixed targets are
stored from existing lines, but when building dest and printing outputs in lines
324-333, the prefix is not being applied consistently, causing the dedupe logic
at line 325 to fail to match prefixed and unprefixed targets. Fix this by
ensuring that when building dest at lines 324+ and when printing targets at
lines 329-333, you apply the same prefix handling logic that is used when
storing targets from existing lines at line 318, so that deduplication can work
correctly and prevent duplicate mappings for file:// and s3:// repositories.
| // TODO: @pablintino https://redhat.atlassian.net/browse/MCO-2371 | ||
| e2eskipper.Skipf("Temporarily disabled until RHEL-10 switchover, see MCO-2371") |
There was a problem hiding this comment.
Unconditional skip removes this CI guardrail for all jobs.
On Line 177, Skipf(...) runs unconditionally, so the os-version validation never executes (including cases unrelated to RHEL-10 switchover). Please scope the skip to only the affected job variants so existing coverage remains active.
Suggested adjustment
g.It("should match os version", func() {
- // TODO: `@pablintino` https://redhat.atlassian.net/browse/MCO-2371
- e2eskipper.Skipf("Temporarily disabled until RHEL-10 switchover, see MCO-2371")
if jobName == "" {
e2eskipper.Skipf("JOB_NAME env var not set, skipping")
}
+ // TODO: `@pablintino` https://redhat.atlassian.net/browse/MCO-2371
+ if strings.Contains(jobName, "rhcos10") || strings.Contains(jobName, "rhcos9-10") {
+ e2eskipper.Skipf("Temporarily disabled for RHEL-10 switchover paths, see MCO-2371")
+ }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/extended/ci/job_names.go` around lines 176 - 177, The Skipf call on line
177 runs unconditionally, causing the os-version validation to be skipped for
all job variants rather than only those affected by the RHEL-10 switchover. Add
a conditional check before the Skipf call that evaluates whether the current job
is one of the specific variants impacted by the RHEL-10 switchover, and only
execute the Skipf when that condition is true, allowing the validation to
continue for all other unaffected job variants.
| func CalculateEventTimeDiff(startEvent, endEvent *corev1.Event) time.Duration { | ||
| startTime := startEvent.LastTimestamp.Time | ||
| if startTime.IsZero() { | ||
| startTime = startEvent.FirstTimestamp.Time | ||
| } | ||
| endTime := endEvent.LastTimestamp.Time | ||
| if endTime.IsZero() { | ||
| endTime = endEvent.FirstTimestamp.Time | ||
| } | ||
| return endTime.Sub(startTime) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's find the file and check its current state
find . -name "node_utils.go" -type f | head -20Repository: openshift/origin
Length of output: 95
🏁 Script executed:
# Search for the CalculateEventTimeDiff function to understand the implementation
rg "CalculateEventTimeDiff" -A 15 --type goRepository: openshift/origin
Length of output: 2089
🏁 Script executed:
# Check what corev1.Event looks like and if it has EventTime field
rg "type Event struct" --type go -A 30Repository: openshift/origin
Length of output: 50372
🏁 Script executed:
# Search for EventTime field specifically in the corev1.Event definition
rg "EventTime" vendor/k8s.io/api/core/v1/types.goRepository: openshift/origin
Length of output: 156
🏁 Script executed:
# Get the full corev1.Event definition more carefully
sed -n '/^type Event struct/,/^}/p' vendor/k8s.io/api/core/v1/types.go | head -60Repository: openshift/origin
Length of output: 2703
🏁 Script executed:
# Check the actual implementation at the specified line range
sed -n '765,790p' test/extended/node/node_utils.goRepository: openshift/origin
Length of output: 841
Include EventTime fallback in CalculateEventTimeDiff to handle newer Kubernetes event objects.
The helper currently ignores EventTime, which is present in corev1.Event (as metav1.MicroTime). When both LastTimestamp and FirstTimestamp are zero, the function returns an invalid zero duration, breaking event-timing assertions that rely on this helper.
Suggested fix
func CalculateEventTimeDiff(startEvent, endEvent *corev1.Event) time.Duration {
startTime := startEvent.LastTimestamp.Time
if startTime.IsZero() {
startTime = startEvent.FirstTimestamp.Time
}
+ if startTime.IsZero() {
+ startTime = startEvent.EventTime.Time
+ }
endTime := endEvent.LastTimestamp.Time
if endTime.IsZero() {
endTime = endEvent.FirstTimestamp.Time
}
+ if endTime.IsZero() {
+ endTime = endEvent.EventTime.Time
+ }
return endTime.Sub(startTime)
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/extended/node/node_utils.go` around lines 773 - 782, The
CalculateEventTimeDiff function does not account for the EventTime field which
is used in newer Kubernetes event objects. Modify the function to add a fallback
mechanism: after checking LastTimestamp and FirstTimestamp for both startEvent
and endEvent, if they are still zero, fall back to using the EventTime field
from the respective event objects. This ensures the function correctly handles
both legacy and newer Kubernetes event formats and returns a valid duration
instead of a zero duration when legacy timestamp fields are not populated.
d52b732 to
4761694
Compare
4761694 to
ae2362a
Compare
|
Scheduling required tests: |
|
/lgtm |
|
@laurayang842: changing LGTM is restricted to collaborators DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: laurayang842, neisw, yogeshahiray The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@yogeshahiray: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
Job Failure Risk Analysis for sha: ae2362a
Risk analysis has seen new tests most likely introduced by this PR. New Test Risks for sha: ae2362a
New tests seen in this PR at sha: ae2362a
|
Added penetration tests which are very important to Telco partners/customers.
Summary by CodeRabbit
terminationGracePeriodSecondsbehavior (with pod-level fallback), with MicroShift test skipping.