Profiling OCaml programs the quick and dirty way
CPU profiling
Just use perf! Dune by default includes debug symbols in the compiled executable, so profiling it is as easy as perf record --call-graph dwarf program.exe.
From there, use your favorite way to view the perf report; FlameGraph is great. This discussion thread on ocaml.org is a good place to start.
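If you just want something to try the command on, here is a minimal sketch of a CPU-bound program. The fib function is only a stand-in workload, and the file name main.ml with an (executable (name main)) stanza is an assumption for illustration, not something from a real project.

```ocaml
(* main.ml: a deliberately slow stand-in workload for trying out perf.
   The naive recursion burns CPU in a single, easy-to-spot function. *)
let rec fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)

let () =
  (* 35 is an arbitrary size, large enough that perf collects some samples *)
  Printf.printf "fib 35 = %d\n" (fib 35)
```

With that layout, dune build ./main.exe followed by perf record --call-graph dwarf ./_build/default/main.exe and then perf report should show nearly all samples under a symbol containing fib.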
If you are developing on macOS, where perf is not available, you can try Instruments.app (from Apple). First compile a native executable, then choose it in the “Choose Target” dialog in the top-left corner and press the Record button to start profiling. The function names are mangled but still useful!
Memory profiling
My go-to now is janestreet/memtrace. This blog article, among many other good resources, explains it better.
NOTE: Below is the original article posted in Aug 2021. There are better ways to do profiling, listed above :)
Recently I followed the Ray Tracing in One Weekend tutorial to implement a path tracer in OCaml. I’ve always had a soft spot for ML languages, so I was pretty excited to finally try one out. However, rendering the final scene in the tutorial took a very long time, so I searched for profiling solutions in OCaml. As it turned out, profiling in OCaml wasn’t as easy as I thought.
What Worked: Landmarks
I’m using ocaml-base-compiler version 4.12.0 with dune as my build system on macOS.
The simplest way to profile that I’ve found is to use landmarks.
Simply install landmarks with
opam install landmarks
Then, include the landmarks preprocessor in the dune file:
(executable
 (name test)
 (libraries landmarks)
 (preprocess (pps landmarks.ppx --auto))
)
If your project contains multiple dune files (e.g. one for a library and one for an executable), you need to apply the changes to all of them. Otherwise, the profiling result won’t include functions from the parts of the project where the preprocessor isn’t added.
To generate the profiling result, set the OCAML_LANDMARKS environment variable, then compile and run the program:
OCAML_LANDMARKS=format=json,output=profile.json dune exec bin/main.exe
This writes the output to profile.json. Then you can view the results with the online viewer on github.io. Note that although the README suggests installing landmarks-viewer, the landmarks-viewer package doesn’t exist on OPAM (as of 4 Aug 2021). There’s an open GitHub issue tracking this: LexiFi/landmarks#17.
Here’s the profiling output:
Name | Location | Calls | Cycles |
---|---|---|---|
ROOT | src/landmark.ml | 0 | 158 060 390 979 |
load(bin/ray_tracing) | bin/ray_tracing.ml:1 | 1 | 158 055 051 892 |
Bin/ray_tracing.ray_color | bin/ray_tracing.ml:16 | 82 500 | 157 485 489 898 |
Lib/hittable.hit | lib/hittable.ml:6 | 219 328 | 155 697 662 560 |
The reason for calling it “quick and dirty” is that the profiling result only shows how many times each user-defined function was called and how many cycles it took. I couldn’t find any allocation information, nor the time spent in system/runtime functions. I tried running with sudo (which worked for Rust’s cargo flamegraph), but it didn’t help.
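For a rough idea of allocation in one suspicious function, a stdlib-only fallback can be sketched like this. It is not a landmarks feature: measure and build_list are hypothetical helpers made up for illustration, built on Sys.time and Gc.allocated_bytes from the standard library.

```ocaml
(* Quick-and-dirty measurement of a single call, standard library only.
   [measure] is a hypothetical helper, not part of landmarks. *)

type sample = {
  cpu_seconds : float;     (* processor time used, from Sys.time *)
  allocated_bytes : float; (* bytes allocated, from Gc.allocated_bytes *)
}

(* Run [f ()] and report how much CPU time and allocation it cost. *)
let measure f =
  let bytes_before = Gc.allocated_bytes () in
  let time_before = Sys.time () in
  let result = f () in
  let time_after = Sys.time () in
  let bytes_after = Gc.allocated_bytes () in
  ( result,
    { cpu_seconds = time_after -. time_before;
      allocated_bytes = bytes_after -. bytes_before } )

(* Stand-in workload: builds a list of boxed floats, so it must allocate. *)
let build_list n = List.init n (fun i -> float_of_int i)

let () =
  let xs, s = measure (fun () -> build_list 100_000) in
  Printf.printf "%d elements: %.4f s CPU, %.0f bytes allocated\n"
    (List.length xs) s.cpu_seconds s.allocated_bytes
```

The numbers include the small overhead of the measurement calls themselves, so treat them as estimates. Wrapping a hot function such as ray_color this way would at least show whether it allocates heavily.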
ocamloptp and ocamlprof didn’t work
Many online resources (e.g. Chapter 17 Profiling (ocamlprof)) suggest using ocamloptp in place of ocamlopt to compile a binary that outputs profiling information, then using ocamlprof to read it.
This didn’t work for me because I’m using Dune instead of invoking ocamlopt directly, and it’s not possible to configure Dune to use ocamloptp as the compiler (see ocaml/dune#398).
I could compile the program without Dune, but I didn’t want to leave the comfort of a build system.
Spacetime didn’t work either
There’s an article written in 2017 by Jane Street introducing the Spacetime profiler. It looked promising at first, because all you need to do to profile the program is switch to the modified Spacetime compiler.
But I quickly realized that the modified compiler version is too old: it’s version 4.04.0. It doesn’t even have the Float module, so I decided it would be too much work to make the program compile with the older compiler.
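To give a sense of the porting work: the Float module was added to the standard library in OCaml 4.07, so on 4.04.0 every use of it has to fall back on the older top-level float functions. Here is a small sketch of that rewrite. The two functions are hypothetical examples in the spirit of a ray tracer, not code taken from the actual project.

```ocaml
(* Written with the Float module (needs OCaml 4.07 or newer). *)
let length_new (x, y, z) = Float.sqrt ((x *. x) +. (y *. y) +. (z *. z))
let degrees_to_radians_new d = d *. Float.pi /. 180.0

(* The same functions using only what an older compiler provides:
   sqrt and atan are top-level functions, and pi has to be defined by hand. *)
let pi = 4.0 *. atan 1.0
let length_old (x, y, z) = sqrt ((x *. x) +. (y *. y) +. (z *. z))
let degrees_to_radians_old d = d *. pi /. 180.0
```

Each change is mechanical, but it has to be repeated across the whole code base and for any other newer standard-library function in use.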