Locate and read cgroup files for cgroup v2 #6432

Merged
merged 1 commit into from
Apr 19, 2022

@EricYangIBM EricYangIBM commented Mar 24, 2022

  • Add isCgroupV2Available to detect if cgroup v2 is available on the system.
  • Split readCgroupFile into populateCgroupEntryListV1 and populateCgroupEntryListV2,
    where the latter is added for cgroup v2 to fetch enabled subsystems from
    $MOUNT_POINT/cgroupName/cgroup.controllers.
  • Update getCgroupMemoryLimit (and its helpers) for cgroup v2: read the correct
    controller files at MOUNT_POINT/cgroupName/ and account for the possible "max"
    values in these files.
  • Add PPG_sysinfoControlFlags global to cache if cgroup v1 or v2 is available,
    or if the process is running in a container.
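The "max" handling mentioned above can be sketched roughly as follows. This is not the code from the PR: `parseCgroupV2Limit` is a hypothetical helper, and reporting "no limit" as `UINT64_MAX` is an assumption made for the example. It only shows how the text of a cgroup v2 limit file such as memory.max might be interpreted when the file can hold either a number or the literal `max`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the OMR implementation: interpret the text read
 * from a cgroup v2 limit file such as memory.max.
 * Returns 0 and stores the limit on success, -1 on malformed input.
 * The literal "max" (no limit) is reported as UINT64_MAX by assumption. */
static int
parseCgroupV2Limit(const char *text, uint64_t *limit)
{
	char *end = NULL;
	unsigned long long value = 0;

	assert((NULL != text) && (NULL != limit));

	/* "max" followed by a newline or end of string means no limit is set. */
	if ((0 == strncmp(text, "max", 3)) && (('\n' == text[3]) || ('\0' == text[3]))) {
		*limit = UINT64_MAX;
		return 0;
	}
	/* Require a leading digit so signs and leading whitespace are rejected. */
	if ((text[0] < '0') || (text[0] > '9')) {
		return -1;
	}
	errno = 0;
	value = strtoull(text, &end, 10);
	/* Fail on overflow or on trailing characters other than a newline. */
	if ((0 != errno) || (('\n' != *end) && ('\0' != *end))) {
		return -1;
	}
	*limit = (uint64_t)value;
	return 0;
}
```

A caller would typically treat the `UINT64_MAX` result as "fall back to the physical memory of the host" rather than as a real limit.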

Issue: #1281
Signed-off-by: Eric Yang eric.yang@ibm.com

@babsingh babsingh self-assigned this Mar 24, 2022
@babsingh

In this PR, I see changes from #6422. All changes are in one commit. First, it won't rebase automatically; there will be conflicts. Second, the new changes cannot be distinguished from older ones. To avoid these issues, you need to add a new commit (with only new changes) on top of the commit in #6422.

@babsingh

Some feedback

  • readCgroupFile is invoked only once, from omrsysinfo_cgroup_is_system_available, to populate PPG_cgroupEntryList. Currently, it is explicitly tailored for cgroup v1.
  • Instead of modifying readCgroupFile to work with both cgroup v1 and v2, we should add a new function: a cgroup v2 variant of readCgroupFile.
  • This keeps the v1 and v2 logic separate, which reduces the chance of breaking the existing v1 code.
readCgroupFile -> populateCgroupV1EntryList() { ... old cgroup v1 logic ... }

populateCgroupV2EntryList() { ... new cgroup v2 logic ... }

omrsysinfo_cgroup_is_system_available() {
     ...
     if (isCgroupV1Available) populateCgroupV1EntryList();
     else if (isCgroupV2Available) populateCgroupV2EntryList();
     else /* trace point to indicate PPG_cgroupEntryList was not populated */
}
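As a rough illustration of what a v2-only variant might contain, the sketch below turns the single line found in a cgroup.controllers file (space-separated controller names) into a bitmask of enabled subsystems. The function name `parseControllersV2` and the `SUBSYSTEM_*` flag values are invented for this example and are not the identifiers used in OMR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag values, for illustration only. */
#define SUBSYSTEM_CPU    ((uint64_t)0x1)
#define SUBSYSTEM_MEMORY ((uint64_t)0x2)
#define SUBSYSTEM_CPUSET ((uint64_t)0x4)

/* Parse one line of space-separated controller names, for example
 * "cpuset cpu io memory pids\n", into a bitmask of the subsystems this
 * sketch knows about. Unknown names are ignored. */
static uint64_t
parseControllersV2(const char *line)
{
	static const struct {
		const char *name;
		uint64_t flag;
	} known[] = {
		{ "cpu", SUBSYSTEM_CPU },
		{ "memory", SUBSYSTEM_MEMORY },
		{ "cpuset", SUBSYSTEM_CPUSET },
	};
	uint64_t enabled = 0;
	const char *cursor = line;

	assert(NULL != line);
	while ('\0' != *cursor) {
		/* Length of the current token, up to the next separator. */
		size_t length = strcspn(cursor, " \n");
		size_t i = 0;

		for (i = 0; i < (sizeof(known) / sizeof(known[0])); i++) {
			/* Compare whole tokens so "cpu" does not match "cpuset". */
			if ((strlen(known[i].name) == length)
				&& (0 == strncmp(cursor, known[i].name, length))
			) {
				enabled |= known[i].flag;
			}
		}
		cursor += length;
		/* Skip the separators before the next token. */
		cursor += strspn(cursor, " \n");
	}
	return enabled;
}
```

Keeping this parsing in its own function means the v1 path, which reads a differently formatted file, does not need to change.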

@EricYangIBM EricYangIBM force-pushed the cgroup2 branch 2 times, most recently from 1332c4e to 3e27238 Compare March 25, 2022 16:42
@EricYangIBM

With these changes, the heap size for cgroup v2 matches that of v1 for a 4 GB Docker container:

root@63d105473c38:~/hostdir/openj9-openjdk-jdk8# build/linux-x86_64-normal-server-release/images/j2sdk-image/bin/java -XshowSettings:vm -XX:+OriginalJDK8HeapSizeCompatibilityMode -version
MM_GCExtensions::computeDefaultMaxHeapForJava usablePhysicalMemory: 4294967296
Subsystems enabled
memlimit set
memoryMax: 3221225472
VM settings:
    Max. Heap Size (Estimated): 3.00G
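
The numbers in this output are self-consistent: 3221225472 is exactly three quarters of 4294967296, and 3221225472 bytes is 3.00 GiB. The small check below reproduces that arithmetic. The 3/4 ratio is used only because it is the ratio visible in this log; the actual heap-sizing policy is not shown in this thread, and `threeQuartersOf` is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: the estimate is 3/4 of the usable memory,
 * which is the ratio between memoryMax and usablePhysicalMemory above. */
static uint64_t
threeQuartersOf(uint64_t usablePhysicalMemory)
{
	/* Divide first so the multiplication cannot overflow. */
	uint64_t result = (usablePhysicalMemory / 4) * 3;

	/* Sanity check: the estimate never exceeds the input. */
	assert(result <= usablePhysicalMemory);
	return result;
}
```

With the 4 GB container limit reported as usablePhysicalMemory, `threeQuartersOf(4294967296)` yields 3221225472, the memoryMax value printed above.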

@EricYangIBM EricYangIBM marked this pull request as ready for review March 25, 2022 16:48
@babsingh

@pshipton This should fix ibmruntimes/ci.docker#124 which is linked to eclipse-openj9/openj9#14190.

Also, can you confirm if the below trace point rules still apply?

  1. New trace points should be defined at the end of the file.
  2. Old trace points should not be modified or removed.
  3. Instead, a modified version of the old trace point should be defined at the end of the file, and the usage of the old trace point should be stopped.

@EricYangIBM

I forgot about the rules for adding tracepoints. But according to https://github.com/eclipse-openj9/openj9/blob/master/doc/diagnostics/AddingTracepoints.md, my modifications of old trace points are fine, since they don't change the signature of the format specifiers.

@pshipton

pshipton commented Mar 28, 2022

The older tracepoint names can't be modified. The original names need to be preserved so the latest build can still process older tracepoint files. It's ok to change some text (without changing signatures), as long as the new text still makes sense in the context of processing older tracepoint files.

If a tracepoint is no longer used, you can add the Obsolete keyword to it.

@EricYangIBM

I thought the formatter matches tracepoints by their order in the file rather than by name, so a tracepoint's position in the file determines which old and new tracepoints correspond. If this isn't the case, then maybe we should mention in the documentation that names can't be changed.

@pshipton

pshipton commented Mar 28, 2022

Maybe @keithc-ca knows for sure?

@pshipton

The doc at https://github.com/eclipse-openj9/openj9/blob/master/doc/diagnostics/AddingTracepoints.md
shows
2. If a tracepoint's signature changes, obsolete it and create a new one. By signature we mean the types, order and total number of format specifiers in the Template. Changes to the Tracepoint Type, Overhead, Level and NoEnv parameters, and cosmetic changes to the text in the Template can be made without adding a new tracepoint.

@keithc-ca

@EricYangIBM is correct: the name of a tracepoint is not relevant; only its position within the *.tdf file matters.
On the other hand, changing the name of a tracepoint will make understanding its history more difficult: I'm not sure it's warranted here. If we want the tracepoint names to match the function name, either don't rename the function or add new tracepoints (and mark the existing ones as obsolete). There is a small (but IMO negligible) cost to having obsolete tracepoints, and I don't think avoiding that cost justifies the potential for confusion down the road.
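
The point about position can be pictured with a deliberately simplified model. This is not the real *.tdf format or the real formatter: `TracepointEntry` and `formatTextForRecord` are hypothetical, and the sketch only assumes that a trace record stores the index of its tracepoint and that the formatter looks up the text by that index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a tracepoint table. */
typedef struct {
	const char *name;       /* informational only, never used for lookup */
	const char *formatText; /* text used when a record is formatted */
} TracepointEntry;

/* Look up the format text for a record by the position it stored.
 * Returns NULL when the index is outside the table. */
static const char *
formatTextForRecord(const TracepointEntry *table, size_t count, size_t index)
{
	assert(NULL != table);
	return (index < count) ? table[index].formatText : NULL;
}
```

In this model, renaming an entry leaves every lookup unchanged, while inserting or removing an entry in the middle shifts the indexes and makes older records resolve to the wrong text. That is consistent with the rule that new tracepoints go at the end of the file and unused ones are marked obsolete instead of being deleted.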


@babsingh babsingh left a comment


First review pass. Please notify me once all the comments have been addressed; I will then do a second review pass.

@babsingh

jenkins build all

@babsingh

Downstream (OpenJ9) testing should be good based upon the builds listed in #6432 (comment).

@EricYangIBM Can you locally verify and confirm if ibmruntimes/ci.docker#124 is fixed with the latest changes?

@EricYangIBM

EricYangIBM commented Apr 12, 2022

These changes seem to fix the Docker issue (-XX:+OriginalJDK8HeapSizeCompatibilityMode is the default, but the heap size is the same with or without it):

root@285a1ceaa59d:~/hostdir/openj9-openjdk-jdk8# build/linux-x86_64-normal-server-release/images/j2sdk-image/bin/java -XshowSettings:vm -XX:+OriginalJDK8HeapSizeCompatibilityMode -version
VM settings:
    Max. Heap Size (Estimated): 3.00G
    Ergonomics Machine Class: server
    Using VM: Eclipse OpenJ9 VM


@babsingh babsingh left a comment


lgtm. @keithc-ca @0xdaryl, for final review and merge.

@EricYangIBM

Is this ready to merge?

@babsingh

> Is this ready to merge?

Yes, lgtm. Waiting for @keithc-ca and @0xdaryl to approve.

Comment on lines +5742 to +5744
requiredSize = portLibrary->str_printf(portLibrary, NULL, 0, "/proc/%d/cgroup", pid);
Assert_PRT_true(requiredSize <= PATH_MAX);
portLibrary->str_printf(portLibrary, cgroupFilePath, sizeof(cgroupFilePath), "/proc/%d/cgroup", pid);

This could be improved (in a future pull request): there's no reason to call str_printf() twice if we are going to insist that the result fit the buffer we already have.
The assertion (which would then go away) should use sizeof(cgroupFilePath) instead of PATH_MAX.
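
A minimal sketch of the single-call approach suggested here, using the standard `snprintf()` as a stand-in for the port library's `str_printf()` (whose exact return-value semantics are not shown in this thread). `buildCgroupPath` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format "/proc/<pid>/cgroup" into the caller's buffer with one call.
 * Returns the path length on success, or -1 if formatting failed or the
 * path did not fit. Standard snprintf() stands in for str_printf(). */
static int
buildCgroupPath(char *buffer, size_t bufferSize, long pid)
{
	int written = 0;

	assert((NULL != buffer) && (bufferSize > 0));

	/* snprintf reports the length it needed, so truncation is detected by
	 * comparing that length against the size of the buffer actually used,
	 * not against an unrelated constant such as PATH_MAX. */
	written = snprintf(buffer, bufferSize, "/proc/%ld/cgroup", pid);
	if ((written < 0) || ((size_t)written >= bufferSize)) {
		return -1;
	}
	return written;
}
```

A caller would pass `sizeof(cgroupFilePath)` as `bufferSize`, which keeps the size check tied to the buffer that is really being written.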

@tajila

tajila commented Apr 19, 2022

@0xdaryl Please review and merge these changes

@0xdaryl 0xdaryl merged commit 04d7395 into eclipse-omr:master Apr 19, 2022
@EricYangIBM EricYangIBM deleted the cgroup2 branch April 20, 2022 12:56
babsingh added a commit to babsingh/omr that referenced this pull request Jun 8, 2023
In eclipse-omr#6432, the OMR port library started throwing an error if
isRunningInContainer failed. isRunningInContainer can fail if /proc
is mounted with the hidepid=2 setting on Linux (eclipse-omr#7021). This prevents
a user from starting a JVM. Before eclipse-omr#6432, no error was returned if
isRunningInContainer failed, so a user was completely unaware of the
failure. That behaviour can lead to performance issues if the process
is running in a container, but no functional issues will be seen. The
new behaviour will not throw an error if isRunningInContainer fails,
but will issue a warning message to highlight the potential performance
impact.

Currently, isRunningInContainer is run from omrsysinfo_startup.
Neither the trace engine nor NLS messages are enabled at this
point. If there is an error in isRunningInContainer, no tracepoint
or NLS message will work inside isRunningInContainer.

Invocation of isRunningInContainer is delayed to first-use. In OpenJ9,
the first-use still happens before the trace engine is initialized,
but it happens after the NLS messages are enabled. A new NLS message
has been added in eclipse-openj9/openj9#17560, which will show up as
a warning when isRunningInContainer fails and highlight the potential
performance impact.

The result of isRunningInContainer is cached and updated via an atomic
operation to enforce data consistency. The caching helps to improve
performance when isRunningInContainer is repeatedly invoked.

Four new states are introduced for PPG_isRunningInContainer to support
the new changes:
- OMRPORT_RUNNING_IN_CONTAINER_UNINITIALIZED: evaluate the result of
  isRunningInContainer.
- OMRPORT_RUNNING_IN_CONTAINER_TRUE: inside a container.
- OMRPORT_RUNNING_IN_CONTAINER_FALSE: not in a container.
- OMRPORT_RUNNING_IN_CONTAINER_ERROR: an error was encountered while
  evaluating isRunningInContainer.

Related: eclipse-omr#7021

Signed-off-by: Babneet Singh <sbabneet@ca.ibm.com>
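
The first-use evaluation and atomic caching described in this commit message could look roughly like the sketch below. The four state names come from the commit message, but their numeric values, the `cachedState` variable (standing in for PPG_isRunningInContainer), the probe callback, and the function names are all assumptions made for illustration, and C11 atomics are used in place of whatever atomic primitive OMR actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* State names mirror the commit message; the values are assumptions. */
enum {
	OMRPORT_RUNNING_IN_CONTAINER_UNINITIALIZED = 0,
	OMRPORT_RUNNING_IN_CONTAINER_TRUE = 1,
	OMRPORT_RUNNING_IN_CONTAINER_FALSE = 2,
	OMRPORT_RUNNING_IN_CONTAINER_ERROR = 3
};

/* Hypothetical stand-in for PPG_isRunningInContainer. */
static atomic_int cachedState = OMRPORT_RUNNING_IN_CONTAINER_UNINITIALIZED;

/* A probe returns 0 on success and stores the answer in *inContainer. */
typedef int (*ContainerProbe)(bool *inContainer);

/* Example probe used only to exercise the sketch; it counts its calls. */
static int probeCallCount = 0;

static int
sampleProbe(bool *inContainer)
{
	probeCallCount += 1;
	*inContainer = true;
	return 0;
}

/* Evaluate the probe on first use and cache the result atomically. */
static int
getRunningInContainerState(ContainerProbe probe)
{
	int state = atomic_load(&cachedState);

	assert(NULL != probe);
	if (OMRPORT_RUNNING_IN_CONTAINER_UNINITIALIZED == state) {
		bool inContainer = false;
		int expected = OMRPORT_RUNNING_IN_CONTAINER_UNINITIALIZED;
		int computed = OMRPORT_RUNNING_IN_CONTAINER_ERROR;

		if (0 == probe(&inContainer)) {
			computed = inContainer
				? OMRPORT_RUNNING_IN_CONTAINER_TRUE
				: OMRPORT_RUNNING_IN_CONTAINER_FALSE;
		}
		/* Publish only if no other thread did; otherwise adopt the value
		 * that the winning thread stored (left in `expected`). */
		if (atomic_compare_exchange_strong(&cachedState, &expected, computed)) {
			state = computed;
		} else {
			state = expected;
		}
	}
	return state;
}
```

A caller that sees the ERROR state can issue a warning and continue, which matches the behaviour the commit message describes. Because an error result is cached too, the probe is not retried on later calls in this sketch.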
babsingh added a commit to babsingh/omr that referenced this pull request Jun 9, 2023
babsingh added a commit to babsingh/omr that referenced this pull request Jun 9, 2023
babsingh added a commit to babsingh/omr that referenced this pull request Jun 9, 2023
babsingh added a commit to babsingh/omr that referenced this pull request Jun 13, 2023
babsingh added a commit to babsingh/omr that referenced this pull request Jun 13, 2023
babsingh added a commit to babsingh/omr that referenced this pull request Jun 13, 2023
babsingh added a commit to babsingh/omr that referenced this pull request Jun 13, 2023
RSalman pushed a commit to RSalman/omr that referenced this pull request Jun 22, 2023
rmnattas pushed a commit to rmnattas/omr that referenced this pull request Nov 7, 2023