17Jan2022

Profiling OCaml programs

The ocamlprof command produces a source listing of the program modules in which execution counts have been inserted as comments. Naturally, this information is accurate only if the source file has not been modified after it was compiled. Profiling with ocamlprof records only execution counts, not the actual time spent within each function.

There is currently no way to perform time profiling on bytecode programs generated by ocamlc. Native-code programs generated by ocamlopt can be profiled for time and execution counts using the -p option and the standard Unix profiler gprof. Just add the -p option when compiling and linking the program.

OCaml function names in the output of gprof follow a fixed naming format. Other functions shown are either parts of the OCaml run-time system or external C functions linked with the program.

If the libunwind library is not available on the system, then it will not be possible for Spacetime to profile allocations occurring within C stubs. If the libunwind library is available but in an unusual location, then that location may be specified to the configure script using the -libunwinddir option (or alternatively, using separate -libunwindinclude and -libunwindlib options).

Once the appropriate compiler has been selected, the program should be built as normal, ensuring that all files are built with the Spacetime compiler. There is currently no protection to ensure this is the case, but it is essential. For many uses it will not be necessary to change the code of the program to use the profiler. Spacetime-configured compilers run slower and occupy more memory than their counterparts.

It is hoped this will be fixed in the future as part of improved cross compilation support.

Programs built with Spacetime instrumentation have a dependency on the libunwind library unless that was unavailable at configure time or the -disable-libunwind option was specified.

The contents of the OCaml heap will be sampled each time the number of milliseconds that the program has spent executing since the last sample exceeds the given number. Note that the time base is combined user plus system time, not wall clock time.
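The sampling rule can be pictured with a small sketch. This is not Spacetime's implementation: sample_due and interval_ms are hypothetical names, and Sys.time (processor time in seconds) stands in for the CPU-time clock described above.

```ocaml
(* Hypothetical helper: a sample is due once more than [interval_ms]
   milliseconds of processor time have passed since the last sample.
   [last_sample] and [now] are in seconds, as returned by Sys.time. *)
let sample_due ~interval_ms ~last_sample ~now =
  (now -. last_sample) *. 1000.0 > float_of_int interval_ms

let () =
  let interval_ms = 50 in
  let last = ref (Sys.time ()) in
  let samples = ref 0 in
  let acc = ref 0 in
  for i = 1 to 20_000_000 do
    (* Some busy work, so that processor time actually advances. *)
    acc := !acc + (i land 7);
    (* Check the clock only occasionally to keep the overhead low. *)
    if i land 0xFFFF = 0 then begin
      let now = Sys.time () in
      if sample_due ~interval_ms ~last_sample:!last ~now then begin
        incr samples;
        last := now
      end
    end
  done;
  Printf.printf "work result %d, samples taken: %d\n" !acc !samples
```

Because the clock is processor time, a program that sleeps or waits on I/O would not trigger samples in this sketch, which mirrors the point that wall clock time is not the time base.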

Is OCaml going to be smart enough to inline the max function and specialise it to work on integers? Disappointingly the answer is no. OCaml still has to generate the external Max. Disappointingly, although the definition of max in this code is local (it can't be called from outside the module), OCaml still doesn't specialise the function.
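As a minimal sketch of the kind of function under discussion, here is a polymorphic max next to a variant whose arguments carry type annotations. The names max_poly and max_int_only are made up for illustration and are not taken from the original listing.

```ocaml
(* Inferred type: 'a -> 'a -> 'a. Because the argument type is unknown,
   the comparison (>) has to work for any type of value. *)
let max_poly a b = if a > b then a else b

(* Type: int -> int -> int. The annotations tell the compiler that
   only integers are ever compared here. *)
let max_int_only (a : int) (b : int) = if a > b then a else b

let () =
  Printf.printf "%d\n" (max_poly 2 3);
  Printf.printf "%s\n" (max_poly "abc" "abd");
  Printf.printf "%d\n" (max_int_only 2 3)
```

Both functions return the same results on integers; the difference is only in how much the compiler knows about the arguments when it generates the comparison.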

Lesson: if you have a function which is unintentionally polymorphic then you can help the compiler by specifying types for one or more of the arguments.

There are a number of peculiarities about integers in OCaml. One of these is that integers are 31 bit entities, not 32 bit entities (on 64 bit platforms they are 63 bit entities). What happens to the "missing" bit?

The important code is shown in red. This is fast. But secondly we see that the number being passed is 7, not 3. This is a consequence of the representation of integers in OCaml.

The bottom bit of the integer is used as a tag - we'll see what for next. The top 31 bits are the actual integer. To get from the OCaml representation to the integer, divide by two and round down.
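The arithmetic can be modelled in a few lines. This is only a model of the representation written in ordinary OCaml; to_tagged and of_tagged are illustrative names, not functions from the runtime.

```ocaml
(* Model of the tagged representation: the machine word for the
   integer n is 2n + 1 (shift left by one, set the bottom bit). *)
let to_tagged n = (n lsl 1) lor 1

(* Going back: an arithmetic shift right by one divides by two and
   rounds down, which discards the tag bit. *)
let of_tagged w = w asr 1

let () =
  Printf.printf "%d is stored as %d\n" 3 (to_tagged 3);
  Printf.printf "%d decodes to %d\n" 7 (of_tagged 7)
```

This is why the number 3 shows up as 7 in the generated code: 3 shifted left by one is 6, and setting the tag bit gives 7.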

Why the tag bit at all? This bit is used to distinguish between integers and pointers to structures on the heap, and the distinction is only necessary if we are calling a polymorphic function. Nevertheless, to avoid having two internal representations for integers, all integers in OCaml carry around the tag bit.

A bit of background about pointers is required to understand why the tag bit is really necessary, and why it is where it is. On some processors, pointers must be aligned. On the older 32 bit Sparc, for example, it's not possible to create and use a pointer which isn't aligned to a multiple of 4 bytes.

Trying to use one generates a processor exception, which means basically your program segfaults. The reason for this is just to simplify memory access. It's just a lot simpler to design the memory subsystem of a CPU if you only need to worry about word-aligned access.

For historical reasons (because the x86 is derived from an 8 bit chip), the x86 has supported unaligned memory access, although if you align all memory accesses to multiples of 4, then things go faster. Nevertheless, all pointers in OCaml are aligned. This means that the bottom bit of any pointer in OCaml will always be zero. So you can see that by looking at the bottom bit of a register, you can immediately tell if it stores a pointer ("tag" bit is zero), or an integer (tag bit set to one).
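The same distinction can be observed from inside a program with the standard library's Obj module, which exposes the runtime representation. Obj is an unsafe, internal-facing module, so this is meant purely as an illustration; describe is a made-up helper name.

```ocaml
(* Obj.is_int reports whether a value is an immediate (tagged) integer
   rather than a pointer to a block. *)
let describe (v : Obj.t) =
  if Obj.is_int v then "immediate integer" else "pointer to a block"

let () =
  print_endline (describe (Obj.repr 42));
  print_endline (describe (Obj.repr "hello"));
  print_endline (describe (Obj.repr [1; 2; 3]))
```

Only the first value is reported as an immediate integer; the string and the list are both represented by pointers.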

But greaterthan can be called with integers, floats, strings, opaque objects, and so on. The compiler should enforce this at compile time. I would assume that greaterthan probably contains code to sanity-check this at run time, however.

Floats are, by default, boxed (allocated on the heap). A constant float, however, need not be allocated at run time; instead, it is created statically in the data segment. Note the structure of the floating point number: it has a header, followed by the 8 byte (2 word) representation of the number itself. The header can be decoded by writing it in binary.

I mentioned earlier that one of OCaml's targets was numerical computing.
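To make the header decoding concrete, here is a sketch that assumes the classic layout of a header word: the tag in the low 8 bits, two colour bits for the garbage collector above it, and the size in words in the remaining upper bits. The helper names are illustrative, and the value 2301 is computed here (2 words, tag 253) rather than quoted from the original listing.

```ocaml
(* Decode a header word, assuming the classic layout:
   bits 0-7 = tag, bits 8-9 = GC colour, bits 10 and up = size in words. *)
let header_size h = h lsr 10
let header_color h = (h lsr 8) land 3
let header_tag h = h land 0xff

(* A boxed float as the runtime sees it. Sys.opaque_identity keeps the
   compiler from optimising the value away. *)
let boxed = Obj.repr (Sys.opaque_identity 3.14)

let () =
  (* A 2 word block with the tag used for boxed floats (253). *)
  let h = (2 lsl 10) lor Obj.double_tag in
  Printf.printf "header %d: size = %d, colour = %d, tag = %d\n"
    h (header_size h) (header_color h) (header_tag h);
  (* The same facts, read back from a real float: the size is 2 words
     on a 32 bit platform and 1 word on a 64 bit platform. *)
  Printf.printf "float block: size = %d, tag = %d\n"
    (Obj.size boxed) (Obj.tag boxed)
```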

Numerical computing does a lot of work on vectors and matrices, which are essentially arrays of floats. As a special hack to make this go faster, OCaml implements arrays of unboxed floats.
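The unboxed representation can be checked from within OCaml using the Obj module. This sketch assumes a compiler built with the default flat float array setting; the array contents are arbitrary.

```ocaml
(* An ordinary float array literal. *)
let floats = [| 1.0; 2.0; 3.0 |]

let () =
  let v = Obj.repr floats in
  (* A flat float array is a single block carrying a dedicated tag,
     with the raw 8 byte floats stored directly inside it. *)
  Printf.printf "flat float array: %b\n" (Obj.tag v = Obj.double_array_tag);
  (* Size in words: 3 floats of 8 bytes each, so 3 words on a 64 bit
     platform and 6 words on a 32 bit platform. *)
  Printf.printf "size in words: %d\n" (Obj.size v)
```

There is one block for the whole array and no per-element header, which is what makes the layout comparable to a C array of doubles.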

This means that in the special case where we have an object of type float array (array of floats), OCaml stores them the same way as in C. I'm going to compile this code with the -unsafe option to remove bounds checking (simplifying the code for our exposition here). The first line, which creates the array, is compiled to a simple C call.

Compiling this code reveals some interesting new features. Firstly, consider the code which allocates the array. We update the pointer to point at the first data word. The header word, written in binary, denotes a block containing 5 words, with tag zero.
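A block of this shape can be inspected with the Obj module. The five element integer array below is a stand-in chosen for illustration, not the original listing, and make_header assumes the classic header layout (size in the bits above bit 10, tag in the low 8 bits).

```ocaml
(* Compose a header word under the assumed layout (GC colour bits zero). *)
let make_header ~size ~tag = (size lsl 10) lor tag

(* A stand-in array with five elements. *)
let arr = [| 1; 2; 3; 4; 5 |]

let () =
  let v = Obj.repr arr in
  Printf.printf "size = %d words, tag = %d\n" (Obj.size v) (Obj.tag v);
  Printf.printf "header for 5 words with tag 0 = %d\n"
    (make_header ~size:5 ~tag:0)
```

Under that layout, a block of 5 words with tag zero has the header value 5 shifted left by 10 bits, which is 5120.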

The tag of zero means it's a "structured block".

The same problem exists with ocamloptp. The amount of profiling information can be controlled through the -P option to ocamlcp or ocamloptp, followed by one or several letters indicating which parts of the program should be profiled.

For instance, compiling with ocamlcp -P film profiles function calls, if…then…else…, loops and pattern matching. Calling ocamlcp or ocamloptp without the -P option defaults to -P fm , meaning that only function calls and pattern matching are profiled.
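As a small sketch, the program below contains one instance of each construct that -P film covers. The file name prog.ml and the function classify are hypothetical; the commands in the comment follow the workflow described in the text.

```ocaml
(* prog.ml (hypothetical file name).
   Build with profiling, run, then view the annotated source:
     ocamlcp -P film -o prog prog.ml
     ./prog
     ocamlprof prog.ml *)

let classify n =                 (* f: function call *)
  if n < 0 then "negative"       (* i: if...then...else *)
  else
    match n with                 (* m: pattern matching *)
    | 0 -> "zero"
    | _ -> "positive"

let () =
  for i = -1 to 1 do             (* l: loop *)
    print_endline (classify i)
  done
```

Compiling the same file with plain ocamlcp, without -P, would count only the function call and the match branches, since the default is -P fm.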

For compatibility with previous releases, ocamlcp also accepts the -p option, with the same arguments and behaviour as -P. The ocamlcp and ocamloptp commands also accept all the options of the corresponding ocamlc or ocamlopt compiler, except the -pp preprocessing option.

sermidulrang1989's Ownd
