Tianyi Song

Profiling OCaml programs the quick and dirty way

CPU profiling

Just use perf! Dune includes debug symbols in the compiled executable by default, so profiling is as easy as perf record --call-graph dwarf program.exe. From there, use your favorite way to view the perf report - FlameGraph is great. This discussion thread on ocaml.org is a good place to start.

If you are developing on macOS, where perf is not available, you can try Instruments.app (from Apple). First compile a native executable, then choose it in the “Choose Target” dialog in the top-left corner and press the Record button to start profiling. The function names are mangled but still useful!

A screenshot of Instruments.app

Memory profiling

My go-to now is janestreet/memtrace. This blog article, among many other good resources, explains it better than I can.
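For a rough idea of what wiring it up looks like, here is a minimal sketch. It assumes the memtrace package is installed from opam and listed under (libraries memtrace) in the executable's dune stanza; the build function is a made-up workload standing in for the real program.

```ocaml
(* Minimal sketch, assuming the memtrace library is installed and linked.
   [build] is a hypothetical allocating workload, not part of memtrace. *)
let build n = List.init n (fun i -> string_of_int i)

let () =
  (* Starts tracing only if the MEMTRACE environment variable is set;
     otherwise this call does nothing. *)
  Memtrace.trace_if_requested ();
  let xs = build 100_000 in
  Printf.printf "built %d strings\n" (List.length xs)
```

Running the program with MEMTRACE set to a file name (for example MEMTRACE=trace.ctf) should write a trace to that file, which can then be opened with the separate memtrace_viewer package.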


NOTE: Below is the original article posted in Aug 2021. There are better ways to do profiling, listed above :)

Recently I followed the Ray Tracing in One Weekend tutorial to implement a path tracer in OCaml. I’ve always had a soft spot for ML languages, so I was pretty excited to finally try it out. However, rendering the final scene in the tutorial took a very long time, so I searched for profiling solutions in OCaml. As it turned out, profiling in OCaml wasn’t as easy as I thought.

What Worked: Landmarks

I’m using the ocaml-base-compiler version 4.12.0 with dune as my build system on macOS. The simplest way to profile that I’ve found is to use landmarks.

Simply install landmarks with

opam install landmarks

Then, include the landmarks preprocessor in the dune file:

(executable
 (name test)
 (libraries landmarks)
 (preprocess (pps landmarks.ppx --auto))
)

If your project contains multiple dune files (e.g. one for the library and one for the executable), you need to apply the change to all of them. Otherwise, the profiling result won’t include functions from the directories where the preprocessor isn’t enabled.

To generate the profiling result, set the OCAML_LANDMARKS environment variable, then compile and run the program:

OCAML_LANDMARKS=format=json,output=profile.json dune exec bin/main.exe

This writes the output to profile.json. You can then view the results with the online viewer on github.io. Note that although the README suggests installing a landmarks-viewer, the landmarks-viewer package doesn’t exist on OPAM (as of 4 Aug 2021). There’s an open GitHub issue tracking this: LexiFi/landmarks#17.

Here’s the profiling output:

Name                       Location               Calls    Cycles
ROOT                       src/landmark.ml        0        158 060 390 979
load(bin/ray_tracing)      bin/ray_tracing.ml:1   1        158 055 051 892
Bin/ray_tracing.ray_color  bin/ray_tracing.ml:16  82500    157 485 489 898
Lib/hittable.hit           lib/hittable.ml:6      219 328  155 697 662 560

The reason for calling it “quick and dirty” is that the profiling result only shows how many times each user-defined function was called and how many cycles it took. I couldn’t find any allocation information, nor the time spent in system/runtime functions. I tried running with sudo (which worked for Rust’s cargo flamegraph), but it didn’t help.
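A crude way to get at least some allocation numbers is to read the GC counters around a suspect call by hand. The sketch below uses only the standard library (Gc.minor_words and Sys.time); the measure helper and the sample record are names I made up for illustration and have nothing to do with landmarks. It counts words allocated on the minor heap only, and the measurement itself allocates a few words, so treat the numbers as rough.

```ocaml
(* Hand-rolled measurement using only the standard library.
   [sample] and [measure] are hypothetical names, not a landmarks API. *)
type sample = {
  seconds : float;      (* CPU time spent inside the call *)
  minor_words : float;  (* words allocated on the minor heap during the call *)
}

let measure f =
  let w0 = Gc.minor_words () in
  let t0 = Sys.time () in
  let result = f () in
  let t1 = Sys.time () in
  let w1 = Gc.minor_words () in
  (result, { seconds = t1 -. t0; minor_words = w1 -. w0 })

let () =
  (* Stand-in workload: build a list of boxed floats. *)
  let xs, s = measure (fun () -> List.init 100_000 float_of_int) in
  Printf.printf "built %d items: %.3f s CPU, %.0f minor words\n"
    (List.length xs) s.seconds s.minor_words
```

Wrapping a hot function such as ray_color this way gives a per-call allocation figure to compare before and after a change, but it is no substitute for a real memory profiler.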

ocamloptp and ocamlprof didn’t work

Many online resources (e.g. Chapter 17 Profiling (ocamlprof)) suggest using ocamloptp in place of ocamlopt to compile a binary that outputs profiling information, then using ocamlprof to read it.

This didn’t work for me because I build with Dune instead of invoking ocamlopt directly, and it’s not possible to configure Dune to use ocamloptp as the compiler (see ocaml/dune#398).

I could compile the program without Dune, but I didn’t want to leave the comfort of a build system.

Spacetime didn’t work either

There’s an article written in 2017 by Jane Street introducing the Spacetime profiler. It looked promising at first, because all we need to do is switch to the modified Spacetime compiler to profile the program.

But I quickly realized that the modified compiler version is too old: it’s version 4.04.0. It doesn’t even have the Float module, so I decided it would be too much work to make the program compile with the older compiler.