Discussion:
[gambit-list] full program optimization
Dimitris Vyzovitis
2018-03-03 09:50:27 UTC
Permalink
Hi Marc,

Does gsc perform any full program optimization and cross module
inlining when compiling executables?

I am particularly interested in the case of linking multiple scheme
modules together to compile executables. The Gerbil compiler uses this
mode for compiling static executables, which was done with full
program optimization in mind. At a very basic level, it allows global
prelude declarations like `(not safe)` to be applied to the whole
program, but it could do much more.

It doesn't have to be too complicated:
- the partial evaluator should look across module boundaries
- small procedures should be inlined across module boundaries
- (block) declarations should be taken into account across modules,
so that mutability information can be propagated. Tthis would
allow the compiler to elide procedure call checks for procedures
defined in another module when the binding is not mutated.

-- vyzo
Marc Feeley
2018-03-04 02:14:20 UTC
Permalink
Currently in Gambit there are two ways to achieve whole program optimization.

The first is to conceptually concatenate all modules by using a file that contains “include”s of all the modules. This has the advantage of exposing all definitions to the inliner, and to allow the tree-shaker to eliminate the useless code. This obviously gives a relatively slow compilation, but the best code specialization because the compiler “sees the whole code base”.

The second method is to use the “core” declaration (which is undocumented). When a toplevel definition is in the scope of a (not core) declaration, it will not generate any code per-se, but the definition is remembered so that the definition can be inlined in the rest of the program. For example:


% cat mod.scm
;; File: mod.scm

(define (twice x)
(declare (standard-bindings) (fixnum) (not safe))
(+ x x))

(define (cube y)
(declare (standard-bindings) (fixnum) (not safe))
(* y y y))

% cat prog.scm
;; File: prog.scm

(declare (block) (not core))
(include "mod.scm")
(declare (core))

(println (cube (twice (read))))

% gsc -c -expansion prog.scm
Expansion:

(println (let ((y (let ((x (read)))
('#<procedure #2 ##fx+> x x))))
('#<procedure #3 ##fx*> y y y)))


Marc
Dimitris Vyzovitis
2018-03-04 08:54:22 UTC
Permalink
Post by Marc Feeley
Currently in Gambit there are two ways to achieve whole program optimization.
The first is to conceptually concatenate all modules by using a file that
contains “include”s of all the modules. This has the advantage of exposing
all definitions to the inliner, and to allow the tree-shaker to eliminate
the useless code. This obviously gives a relatively slow compilation, but
the best code specialization because the compiler “sees the whole code
base”.
There are a few of problems with the include approach:
- the main problem is C compilation performance or lack thereof. gcc just
takes forever and usually runs out of memory if you have more than a
handful of (non trivial) modules linked in. Compiling an 11M/230kloc
single-host function is not a simple matter.
- the second problem is leakage of module declarations; if for example a
module declares (fixnum), that declaration will leak to the rest of the
program after the include. I guess this could be avoided by carefully
resetting the declarations after each include, so it's not insurmountable.
- the third problem is leakage of ffi symbols -- you suddenly have all the
ffi code linked in the same C module, which means that you lose the ability
to locally name things. More of an annoyance, but I didn't like having to
rename all the `ffi_free` functions in the various ffi modules.

The first problem is truly insurmountable. Linking more than a handful of
modules results in gcc blowing up...
I haven't managed to compile any non-trivial program that uses a decent
part of the gerbil stdlib, they all end with the compiler getting OOM
killed.
Post by Marc Feeley
The second method is to use the “core” declaration (which is
undocumented). When a toplevel definition is in the scope of a (not core)
declaration, it will not generate any code per-se, but the definition is
remembered so that the definition can be inlined in the rest of the
thanks, I didn't know about it; this sounds useful to force inlining of a
module in conjunction with the include method.
How does it behave with complex recursive functions or functions that are
used more than once? Will it emit code for the function then?

-- vyzo
Marc Feeley
2018-03-04 11:19:21 UTC
Permalink
The first problem is truly insurmountable. Linking more than a handful of modules results in gcc blowing up...
I haven't managed to compile any non-trivial program that uses a decent part of the gerbil stdlib, they all end with the compiler getting OOM killed.
I believe this problem would vanish if the tree-shaker is enabled and the runtime library is refactored to avoid gratuitous dependencies. Then your final executable would only contain the parts of the runtime library (and the program) that are useful for this specific program.

This strategy is what I envision for the universal backend where the bloat is more severe than in C.
thanks, I didn't know about it; this sounds useful to force inlining of a module in conjunction with the include method.
How does it behave with complex recursive functions or functions that are used more than once? Will it emit code for the function then?
No. You still have to separately compile and link the modules. These modules will be compiled to normal function definitions. The modules included in the scope of a (not core) declaration produce no code, but the definitions are available to the inliner.

I think a new form to limit the scope of declarations would be useful for all of this. For example:

(declare (core))

(begin-with-declaration-scope
(declare (not core))
(include "mod.scm"))

;; at this point we are back in the scope of the (declare (core))

Maybe begin-with-declaration-scope could have a better name…

Marc
Dimitris Vyzovitis
2018-03-04 12:21:48 UTC
Permalink
Post by Dimitris Vyzovitis
The first problem is truly insurmountable. Linking more than a handful
of modules results in gcc blowing up...
Post by Dimitris Vyzovitis
I haven't managed to compile any non-trivial program that uses a decent
part of the gerbil stdlib, they all end with the compiler getting OOM
killed.
I believe this problem would vanish if the tree-shaker is enabled and the
runtime library is refactored to avoid gratuitous dependencies. Then your
final executable would only contain the parts of the runtime library (and
the program) that are useful for this specific program.
This strategy is what I envision for the universal backend where the bloat
is more severe than in C.
The tree-shaker makes a difference indeed. I did manage to compile a
program that was OOMing with the tree shaker enabled with (declare
(optimize-dead-definitions)).
Unfortunately the resulting program crashes with Bus Error when poked (it
starts up fine, but it crashes once it starts communicating) and it's quite
hard to tell what went wrong; program works fine with separate linkage
obviously.
I think a new form to limit the scope of declarations would be useful for
(declare (core))
(begin-with-declaration-scope
(declare (not core))
(include "mod.scm"))
;; at this point we are back in the scope of the (declare (core))
Maybe begin-with-declaration-scope could have a better name

Such a form would make life much simpler indeed, no need to reset
declarations after including.

-- vyzo
Dimitris Vyzovitis
2018-03-04 17:10:14 UTC
Permalink
Post by Dimitris Vyzovitis
Post by Dimitris Vyzovitis
The first problem is truly insurmountable. Linking more than a handful
of modules results in gcc blowing up...
Post by Dimitris Vyzovitis
I haven't managed to compile any non-trivial program that uses a decent
part of the gerbil stdlib, they all end with the compiler getting OOM
killed.
I believe this problem would vanish if the tree-shaker is enabled and the
runtime library is refactored to avoid gratuitous dependencies. Then your
final executable would only contain the parts of the runtime library (and
the program) that are useful for this specific program.
This strategy is what I envision for the universal backend where the
bloat is more severe than in C.
The tree-shaker makes a difference indeed. I did manage to compile a
program that was OOMing with the tree shaker enabled with (declare
(optimize-dead-definitions)).
Unfortunately the resulting program crashes with Bus Error when poked (it
starts up fine, but it crashes once it starts communicating) and it's quite
hard to tell what went wrong; program works fine with separate linkage
obviously.
So the problem was indeed declaration leakage -- there was a (declare
(fixnum)) leaking to the rest of the program and crashing inside _num.scm,
just as I predicted :)
I have fixed this by having the compiler issue a reset declaration after
each include, followed by the user's prelude declarations if any.
I am also happy to report that the program runs a little faster (~3.2s from
~3.6s), so there is indeed benefit from FPO.

-- vyzo
Bradley Lucier
2018-03-05 01:55:15 UTC
Permalink
I’d be curious to know where in _num.scm that problem occurred.

Brad
So the problem was indeed declaration leakage -- there was a (declare (fixnum)) leaking to the rest of the program and crashing inside _num.scm,
Dimitris Vyzovitis
2018-03-05 07:02:24 UTC
Permalink
I didn't pinpoint it precisely, but it's clearly not _num.scm's fault if
you are accidentally doing unsafe fx ops with arbitrary numbers :)

-- vyzo
Post by Bradley Lucier
I’d be curious to know where in _num.scm that problem occurred.
Brad
Post by Dimitris Vyzovitis
So the problem was indeed declaration leakage -- there was a (declare
(fixnum)) leaking to the rest of the program and crashing inside _num.scm,
Bradley Lucier
2018-03-05 01:56:21 UTC
Permalink
I’d be curious to know where in _num.scm that problem occurred.

Brad
So the problem was indeed declaration leakage -- there was a (declare (fixnum)) leaking to the rest of the program and crashing inside _num.scm,
Loading...