Discussion:
[gambit-list] The monster that killed gcc
Dimitris Vyzovitis
2018-03-19 17:29:39 UTC
The attached file compiles to a 140kloc C monster that makes gcc die
with OOM after several minutes of effort, and I would like to understand
why.
Any ideas?

-- vyzo
Dimitris Vyzovitis
2018-03-19 18:23:53 UTC
It seems it's the inliner going haywire -- if I add a (declare
(inlining-limit 100)), then it compiles in 20s.

-- vyzo
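
A minimal sketch of applying the declaration without editing the source by hand. It assumes the declaration takes effect when placed at the top of the file; `with_inlining_limit` and the file names in the usage note are hypothetical, not part of Gambit or Gerbil.

```shell
#!/bin/sh
# with_inlining_limit LIMIT SRC DST
# Hypothetical helper: write DST as SRC with an inlining-limit declaration
# prepended, so the limit is in effect for the whole file (assumption).
with_inlining_limit() {
  limit=$1 src=$2 dst=$3
  {
    printf '(declare (inlining-limit %s))\n' "$limit"
    cat "$src"
  } > "$dst"
}
```

Usage would look like `with_inlining_limit 100 monster.scm limited.scm`, followed by compiling `limited.scm` with gsc as usual. The limit is a parameter so other values can be tried on the same input.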
Marc Feeley
2018-03-19 18:27:19 UTC
Out of curiosity, what is the number of LOC of C with and without the inlining-limit?

I’m just wondering if this should be classified as an issue, or if the inliner is just doing its work as expected.

Marc
Dimitris Vyzovitis
2018-03-19 18:31:09 UTC
It's 140kloc without the inlining declaration and just 22kloc with the
declaration.

-- vyzo
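
For anyone repeating the measurement, a small sketch of counting lines in the generated C files. `c_loc` and `report_loc` are hypothetical helper names, and the file names in the usage note are placeholders.

```shell
#!/bin/sh
# c_loc FILE: print the number of lines in one generated C file.
c_loc() {
  wc -l < "$1" | tr -d ' '
}

# report_loc FILE...: print one "count file" line per argument.
report_loc() {
  for f in "$@"; do
    printf '%s %s\n' "$(c_loc "$f")" "$f"
  done
}
```

Calling `report_loc with-limit.c without-limit.c` prints the two counts one per line, which makes the with/without comparison easy to paste into a reply.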
Marc Feeley
2018-03-19 18:41:40 UTC
The default inlining-limit is 350, so the expansion from 100 is quite possible.

But the problem here is that gcc chokes on the compilation of the C file. So… what are the compilation options passed to gcc?

- are you using --enable-single-host ?
- are you using a higher level of optimization such as -O2 or -O3 rather than the default -O1 ?

These will definitely increase the pressure on the C compiler. Also, some versions of gcc do a better job at compiling large C files. The file lib/_io.c in the Gambit distribution is about 90kloc and I have never gotten an OOM error from gcc while compiling it, even though I use a “make -j 8” (8 C compilations in parallel). I do have 16 GB of RAM on my machine… how much RAM do you have on yours?

Marc
Dimitris Vyzovitis
2018-03-19 18:56:27 UTC
My gambit is configured with --enable-single-host --enable-c-opt
--enable-gcc-opts.
I have 8G of RAM on my current laptop, but I run without swap; gcc dies
at around 6G.
It's not only the memory usage, though: it takes forever too. clang on
Travis didn't OOM, but it took 15min on the file.

I think it might be a case of really bad interaction between the various
optimizers in the 3 compilers involved. The Gerbil-emitted code is already
heavily optimized to perform match tree linearization (I have a shiny new
optimizer that optimizes match and syntax-case expansions).
That means you can't reasonably inline anything other than single-use
procedures within the optimized blocks.

-- vyzo
Marc Feeley
2018-03-19 19:02:49 UTC
I think you should figure out which of the configure options

--enable-single-host
--enable-c-opt
--enable-gcc-opts

are the most useful/beneficial for the kind of code generated by Gerbil. I suspect that --enable-single-host is the most performance-enhancing option, and --enable-c-opt only gives a marginal speed increase at the cost of a much higher C compile time.

If you do try the various combinations, please report your results here. I’d like to know if my intuition is correct.

Marc
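
One way to enumerate the combinations to try is sketched here. It only prints the configure command lines (the build and timing steps are left out), and `list_configs` is a hypothetical name; the three flags are the ones listed above.

```shell
#!/bin/sh
# list_configs: print a configure command line for every on/off combination
# of the three options under discussion (2 x 2 x 2 = 8 lines).
list_configs() {
  for host in "" "--enable-single-host"; do
    for copt in "" "--enable-c-opt"; do
      for gccopts in "" "--enable-gcc-opts"; do
        # Variables are left unquoted on purpose so that empty ones
        # disappear instead of leaving stray spaces in the output.
        echo ./configure $host $copt $gccopts
      done
    done
  done
}

list_configs
```

Each printed line can then be run in a fresh Gambit build tree before compiling the problem file, recording compile time and peak memory for each combination.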
Dimitris Vyzovitis
2018-03-19 19:05:06 UTC
Sure, I'd like to get to the bottom of this because it's such an ugly
failure mode!

For now, I changed the gerbil compiler to emit an inlining-limit
declaration in meta phases (that's where the syntax-case monsters reside).

-- vyzo
Marc Feeley
2018-03-19 19:15:29 UTC
I believe the problem is that Gerbil is doing some inlining of sorts (for example, tail duplication in the matcher) and then passing the result to gsc, which also inlines functions in the code, and gcc probably inlines again with the -O2 option.

It's understandable that this layering of languages will cause code bloat.

Marc
Dimitris Vyzovitis
2018-03-19 19:27:25 UTC
The match optimizer is very careful to avoid inlining code that doesn't
benefit from the current match tree, so there is very little tail
duplication.
I think it's the subsequent inlining and optimization from gsc/gcc that
results in the blow-up, as the code really cannot be optimized further.
But yeah, layers of language will do that to you.

Note that in general I don't do optimizations that gsc already does, but we
can't reasonably expect gsc to understand and properly optimize
match/syntax-case expansions.

-- vyzo
Bradley Lucier
2018-03-22 20:12:08 UTC
Post by Dimitris Vyzovitis
My gambit is configured with --enable-single-host --enable-c-opt
--enable-gcc-opts.
My experience is that any optimization level above -O1, and any
optimization options not included in -O1, can result in significant CPU
time and/or memory usage on Gambit-generated C files with
--enable-single-host, and the GCC developers generally just say "don't
do that" in response (and I don't really blame them).

Still, one is more likely to get a problem like this fixed if one is
willing to work with the GCC developers (give test cases, compile gcc to
report internal memory usage and report results, follow up, etc.).

So my suggestion is to keep --enable-single-host (otherwise you're going
through the trampoline for each function call), keep the inlining-limit
default (or set it to at least 134 or 150 so data accessors and setters
are inlined) and get rid of --enable-c-opt and --enable-gcc-opts.

Brad
Bradley Lucier
2018-03-22 20:02:52 UTC
Post by Dimitris Vyzovitis
It seems it's the inliner going haywire -- if I add a (declare
(inlining-limit 100)), then it compiles in 20s.
This does no inlining at all, not even of data accessors/setters/etc.
Dimitris Vyzovitis
2018-03-22 20:21:19 UTC
Yeah, that's not quite intended -- I should lift it to 150 or so.

-- vyzo
Dimitris Vyzovitis
2018-03-22 20:31:30 UTC
After a little experimentation I set it to 200; it doesn't seem to blow up on
the monster.
I'll try a full build to see if we have any other pathologies.

-- vyzo
Bradley Lucier
2018-03-22 20:44:18 UTC
Post by Dimitris Vyzovitis
Yeah, that's not quite intended -- I should lift it to 150 or so.
The declarations are

(declare (block) (standard-bindings) (extended-bindings))

Do you mean to compile this module with safety, so that each car checks
that the argument is a pair, each (fx+ x 1) checks that x and the result
are fixnums, etc?

If you do

gsc -c -expansion defparser__1.scm > expansion.scm

you'll see what gsc expands things to. With safe, you get

-rw-r--r-- 1 lucier lucier 2606303 Mar 22 16:29 expansion-safe.scm
-rw-r--r-- 1 lucier lucier 7919907 Mar 22 16:29 defparser__1-safe.c

with (declare (not safe)) you get

-rw-r--r-- 1 lucier lucier 844052 Mar 22 16:41 expansion.scm
-rw-r--r-- 1 lucier lucier 2287806 Mar 22 16:41 defparser__1.c

But I don't know what you want.

Brad
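
To put the four sizes above side by side, here is a small arithmetic sketch. `percent_of` is a hypothetical helper; the byte counts are copied from the two listings. It prints 308 and 346, so the safe build is roughly 3.1 times larger after expansion and roughly 3.5 times larger as C.

```shell
#!/bin/sh
# percent_of A B: print A as an integer percentage of B (truncated).
percent_of() {
  echo $(( $1 * 100 / $2 ))
}

# Byte sizes copied from the listings above.
percent_of 2606303 844052    # expansion-safe.scm vs expansion.scm
percent_of 7919907 2287806   # defparser__1-safe.c vs defparser__1.c
```

These are file sizes in bytes, not line counts, so they are not directly comparable with the kloc figures quoted earlier in the thread.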
Dimitris Vyzovitis
2018-03-22 20:52:05 UTC
I don't think that (declare (not safe)) is reasonable for default compiler
declaration, especially for phi code :)
But that's quite an interesting observation.

-- vyzo
Adam
2018-03-23 04:42:33 UTC
Post by Dimitris Vyzovitis
I don't think that (declare (not safe)) is reasonable for default compiler
declaration,
Agreed.
Post by Dimitris Vyzovitis
especially for phi code :)
Phi?
Dimitris Vyzovitis
2018-03-23 06:18:10 UTC
Post by Dimitris Vyzovitis
especially for phi code :)
Phi?
That's phased code; it's (user) code that runs in the expander/compiler.

-- vyzo
