FPSpy Tool 
==========

Copyright (c) 2017 Peter A. Dinda  Please see LICENSE file.

This is a tool for floating point exception interception and
statistics gathering that can run underneath existing, unmodified
binaries.

FPSpy is documented in

P. Dinda, A. Bernat, C. Hetland, Spying on the Floating Point Behavior
of Existing, Unmodified Scientific Applications, Proceedings of the
29th ACM Symposium on High-performance Parallel and Distributed
Computing (HPDC 2020), June, 2020.


You can also see the comment in fpe_preload.c for some details of how
this works and what it illustrates.


Building and Testing
--------------------

To build:

   make


To test:

   make test


Running
-------

The code has two modes of operation:

- Aggregate mode simply captures the floating point exception state
  at the beginning and end of the program.  Since the exception state
  is sticky, this tells us whether the program had one or more
  occurrences of each of the possible exceptions.
- Individual mode captures individual floating point exceptions,
  emulating the instructions that cause them.

The code can be run against any dynamically linked binary that crosses
the shared library boundary for the fe* library calls (which
manipulate the FPU behavior) and for the signal and sigaction system
calls.

To run against a binary:

   LD_PRELOAD=fpe_preload.so [FPE_MODE=<mode>] [FPE_AGGRESSIVE=<yes|no>] exec.exe

The modes are "aggregate" and "individual" as noted above.   If no
mode is given, aggregate mode is assumed.   

Generally, fpe_preload gets out of the way if the executable itself
attempts to manipulate the FPU signaling state via the fe* and
signal/sigaction system calls.  By default, it is very sensitive to
this. If FPE_AGGRESSIVE is set, then it is less sensitive, which means
that more can be captured, but the execution is more likely to be
broken.

Additional environment variables

FPE_DISABLE_PTHREADS=yes    (or DISABLE_PTHREADS=yes)
   Do not trace newly created pthreads
   You will also want to set this for any application which
   does not dynamically link the pthread library.  Otherwise startup
   will fail when attempting to shim non-existent pthread functions.

FPE_MAXCOUNT=k
   Only the first k exceptions will be recorded.
   This only affects individual mode.
   k=-1 means that there is no limit on how many exceptions
   will be recorded.  By default, k is about 64,000.

FPE_SAMPLE=k
   Only every kth exception will be recorded.
   This only affects individual mode.

FPE_EXCEPT_LIST=list
   Only the listed exceptions will be intercepted.
   This only affects individual mode.
   The comma-delimited list can include:
        invalid (NAN)
        denorm
        divide (divide by zero)
        overflow
        underflow
        precision (rounding)
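
To illustrate how such a comma-delimited list can be turned into a set
of exceptions, here is a small sketch.  It is not FPSpy's parser: the
function name parse_except_list and the EX_* bit assignments are
hypothetical and chosen only for this example.

```c
#include <string.h>

/* Hypothetical bit assignments, for illustration only. */
enum {
    EX_INVALID   = 1 << 0,
    EX_DENORM    = 1 << 1,
    EX_DIVIDE    = 1 << 2,
    EX_OVERFLOW  = 1 << 3,
    EX_UNDERFLOW = 1 << 4,
    EX_PRECISION = 1 << 5
};

/* Convert a list such as "invalid,divide" into a bitmask.
   Returns -1 if any name is empty or unrecognized. */
int parse_except_list(const char *list)
{
    static const struct { const char *name; int bit; } table[] = {
        { "invalid",   EX_INVALID   },
        { "denorm",    EX_DENORM    },
        { "divide",    EX_DIVIDE    },
        { "overflow",  EX_OVERFLOW  },
        { "underflow", EX_UNDERFLOW },
        { "precision", EX_PRECISION }
    };
    const char *p = list;
    int mask = 0;

    while (*p) {
        size_t len = strcspn(p, ",");   /* length of the current name */
        int found = 0;

        for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
            if (strlen(table[i].name) == len &&
                strncmp(p, table[i].name, len) == 0) {
                mask |= table[i].bit;
                found = 1;
                break;
            }
        }
        if (!found)
            return -1;
        p += len;
        if (*p == ',')
            p++;                        /* skip the delimiter */
    }
    return mask;
}
```

For example, parse_except_list("divide,overflow") yields
EX_DIVIDE | EX_OVERFLOW, while an unknown name yields -1.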

FPE_POISSON=A:B
   Poisson sampling will be used, with the ON period chosen from
   an exponential random distribution with mean A usec and the
   OFF period chosen from an exponential distribution with mean
   B seconds.

Time-based sampling and Poisson sampling model

FPE_SEED=n
   The internal random number generator used for sampling
   is seeded with the value n.

FPE_TIMER=real|virtual|prof  (default real)

A virtual timer counts time only while the process is executing
instructions; a real timer counts elapsed real time.  That is, with
FPE_POISSON=A:B and FPE_TIMER=virtual, A and B are interpreted as
time spent awake instead of elapsed time.  A prof timer counts
virtual time in both kernel and user space, and uses a signal that
the application is unlikely to be using.
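
The three timer names line up naturally with the POSIX interval timers.
The sketch below shows that correspondence; it is an assumption about
how the names map, not a description of FPSpy's source, and the
function names are hypothetical.

```c
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Map a timer name to the interval timer it most plausibly denotes.
   Returns -1 for an unknown name.  The mapping is an assumption. */
int timer_from_name(const char *name)
{
    if (strcmp(name, "real") == 0)    return ITIMER_REAL;    /* elapsed real time */
    if (strcmp(name, "virtual") == 0) return ITIMER_VIRTUAL; /* user-mode CPU time */
    if (strcmp(name, "prof") == 0)    return ITIMER_PROF;    /* user + kernel CPU time */
    return -1;
}

/* Signal delivered when each kind of interval timer expires. */
int signal_for_timer(int which)
{
    switch (which) {
    case ITIMER_REAL:    return SIGALRM;
    case ITIMER_VIRTUAL: return SIGVTALRM;
    case ITIMER_PROF:    return SIGPROF;
    default:             return -1;
    }
}

/* Arm a one-shot timer of the given kind to fire after `usec`
   microseconds.  Passing usec == 0 disarms the timer. */
int arm_one_shot(int which, long usec)
{
    struct itimerval it;

    memset(&it, 0, sizeof it);          /* it_interval = 0: do not repeat */
    it.it_value.tv_sec  = usec / 1000000;
    it.it_value.tv_usec = usec % 1000000;
    return setitimer(which, &it, NULL);
}
```

An application that already uses alarm() or its own SIGALRM handler
would conflict with a real timer, which is one reason the choice of
timer, and therefore of signal, matters.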

For getting a sense of how FPE_POISSON operates, you can
also run:

make test_sleepy  (real time)

or

make test_dopey (virtual or profile time)

Forced changes in floating point execution environment

FPE_FORCE_ROUNDING=positive|negative|zero|nearest[;daz][;ftz]

This forces rounding to operate in the noted way (the IEEE default is nearest).
If daz is included, all denormal inputs are treated as zeros [Intel specific].
If ftz is included, all denormal results are flushed to zeros [Intel specific].


Output and Analysis Scripts
---------------------------

FPSpy produces a trace for each thread.

In aggregate mode, a trace is a short, simple, human-readable file
that is self-explanatory.

In individual mode, a trace is a binary format file which may be huge.
We provide tools to display and analyze such traces.

libtrace.h  /  libtrace.a -> trace access from C via memory mapping
                             (trace shows up as a giant array of structs)

trace_print.c             -> example use (just prints file in human-readable format)


parse_individual.pl       -> trace_print in perl

analyze_individual.pl     -> create report from trace

extrace_fp_event_timestamps.pl -> create time series from trace
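
The "giant array of structs" view of a binary trace can be sketched
with plain mmap.  This is not the libtrace API: struct demo_record and
the functions map_records and unmap_records are hypothetical, and the
real record layout is whatever libtrace.h declares.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical record layout, for illustration only. */
struct demo_record {
    uint64_t timestamp;
    uint64_t instruction_addr;
    uint32_t exception_code;
    uint32_t reserved;
};

/* Map a file of fixed-size records read-only and return it as an array.
   On success, stores the number of whole records in *count.
   Returns NULL on failure or if the file is empty. */
const struct demo_record *map_records(const char *path, size_t *count)
{
    struct stat st;
    void *base;
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        perror("open");
        return NULL;
    }
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                          /* the mapping outlives the descriptor */
    if (base == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }
    *count = (size_t)st.st_size / sizeof(struct demo_record);
    return base;
}

/* Release a mapping obtained from map_records(). */
void unmap_records(const struct demo_record *recs, size_t count)
{
    munmap((void *)recs, count * sizeof *recs);
}
```

Mapping the file avoids reading a potentially huge trace into memory at
once; pages are faulted in only as records are touched, so a scan that
inspects a few fields per record stays cheap.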








