Debugging
This chapter provides an information on how to exam running dCache system, identify problems and provide debug information to developers.
Java Flight recorder
When debugging an issue on a running system often we need to collect jvm performance stats with Java flight recorder
. Starting from release 7.2 the Java flight recorder attach listener is enabled by default. Site admins can collect and provide developers with additional information when high CPU load, memory consumption or file descriptor leaks are observed. To enable the flight recorder
jcmd command is used, which is typically provided as a part of java-11-openjdk-devel
(on RHEL and clones).
To control recoding the following subcommands of jcmd available:
- JFR.start
- Start a recording.
-
JFR.stop
- Stop a recording with a specific identification number.
-
JFR.dump
- Dump the data collected so far by the recording with a specific identification number.
Example:
jcmd <pid> JFR.start filename=/tmp/dcache.jfr
The java process pid can be obtained with systemctl or jps command. There are several to limit the recording:
- time duration based
- Limiting by time window
- Limiting by dump file size
Typically, a duration or time window based recordings should be used.
Duration based recording
Useful to record for a fixed period of time. The duration can be specified in in (s)econds, (m)inutes, (h)ours, or (d)ays:
jcmd <pid> JFR.start duration=60s filename=/tmp/dcache.jfr
The recording file will be written into defined file after specified time duration. The flight recorder will be automatically switched off.
Time window based limit
The recording is collected in a ring-buffer like state with a fixed time-based limit. Such recording is useful in situations when we can’t predict the point in time of the interested event.
jcmd <pid> JFR.start maxage=10m name=my-10m-records
The name=my-10m-records
option allows to give a human readable identifier to the recording.
The recording can be collected as:
jcmd <pid> JFR.dump name=my-10m-records filename=/tmp/dcache.jfr
The flight recorder will stay active and new recording can be collected, if needed.
To stop the recording the following command can be used:
jcmd <pid> JFR.stop name=my-10m-records
Size based limit
The recording can be started with an explicit maximum recorded data size. The size can be specified in in (k)B, (M)B or (G)B.
jcmd <pid> JFR.start maxsize=100K name=my-100k-records
Inspecting flight recordings
Typically, the collected data should be provided to dCache development team. However, curious sysadmins can inspect the recordings with Oracle VisualVM or Oracle JMC
Perf-Tools
Linux perf
tools is often used by admins to inspect CPU performance counters, kprobes, tracepoints and uprobes, and included into many distros by default. The perf tools expect symbols for code executed from unknown memory regions at /tmp/perf-
jcmd <pid> Compiler.perfmap
After the mapping file is generated perf tools are able to correctly display Java classes and method names, for example, the output of perf top -p 527973
:
Samples: 18K of event 'cycles:P', 4000 Hz, Event count (approx.): 6797468610 lost: 0/0 drop: 0/0
Overhead Shared Object Symbol
4.64% [JIT] tid 527973 [.] vtable chunks
2.46% libjvm.so [.] _ZN8ZBarrier29mark_barrier_on_oop_slow_pathEm
2.46% libjvm.so [.] _ZN5ZMark15mark_and_followEP10ZMarkCache15ZMarkStackEntry
2.09% [JIT] tid 527973 [.] boolean javax.security.auth.Subject$SecureSet.contains(java.lang.Object)
1.58% [JIT] tid 527973 [.] java.security.ProtectionDomain[] javax.security.auth.SubjectDomainCombiner.combine(java.security.ProtectionDomain[], java.security.ProtectionDomain[])
1.54% [JIT] tid 527973 [.] ch.qos.logback.core.spi.FilterReply ch.qos.logback.classic.spi.TurboFilterList.getTurboFilterChainDecision(org.slf4j.Marker, ch.qos.logback.classic.Logge
1.45% [JIT] tid 527973 [.] boolean com.sun.security.auth.UnixNumericGroupPrincipal.equals(java.lang.Object)
1.30% libjvm.so [.] JVM_GetStackAccessControlContext
1.08% [JIT] tid 527973 [.] StubRoutines (2)
1.07% [JIT] tid 527973 [.] vtable chunks
1.05% [JIT] tid 527973 [.] java.lang.Object java.util.concurrent.ConcurrentHashMap.computeIfAbsent(java.lang.Object, java.util.function.Function)
0.93% libjvm.so [.] _ZNK9ZRelocate14forward_objectEP11ZForwardingm
0.89% libjvm.so [.] _ZN21OopOopIterateDispatchI22ZMarkBarrierOopClosureILb0EEE5Table15oop_oop_iterateI13InstanceKlassP7oopDescEEvPS1_S7_P5Klass
0.77% [JIT] tid 527973 [.] void org.postgresql.core.v3.QueryExecutorImpl.processResults(org.postgresql.core.ResultHandler, int, boolean)
0.74% [kernel] [k] entry_SYSCALL_64
More info on usage perf usage at perf Examples by Brendan Gregg.