Discussion:
[gambit-list] Dumping the heap
Dimitris Vyzovitis
2018-01-29 18:00:46 UTC
Permalink
Is there a reasonable way to dump the live heap?
It could help a lot with debugging memory leaks.

-- vyzo
Marc Feeley
2018-01-31 13:08:34 UTC
Permalink
Yes, I remember helping Guillaume Cartier develop a procedure to do this. Perhaps he can help you with that.

Marc
Guillaume Cartier
2018-01-31 14:03:18 UTC
Permalink
Yes, Marc wrote some very nice code to explore the Gambit heap
programmatically, which I use in my projects. I'll refresh my memory of the
code and post it in a Gambit-friendly format shortly.

Guillaume
_______________________________________________
Gambit-list mailing list
https://webmail.iro.umontreal.ca/mailman/listinfo/gambit-list
Dimitris Vyzovitis
2018-01-31 14:23:35 UTC
Permalink
awesome, thank you!

-- vyzo
Marc Feeley
2018-02-01 13:16:28 UTC
Permalink
Post by Dimitris Vyzovitis
thanks Guillaume!
this is a great start for me -- i am helping fare debug a memory leak, and it's really hard to identify
without dumping the heap to see what kind of object is leaking.

For your information, I discovered a few memory leaks in the networking functions. They were due to “sockaddr” structures being converted to “still” Scheme objects with a reference count of 1, but the reference count was never decremented (with ___release_scmobj). This has been fixed in the recent UDP commit.

I believe that this kind of situation might exist in other places in the runtime system. To debug this, it might be useful to have a function that returns a list of all the “still” Scheme objects that have a reference count != 0. This should be easy to write: the GC maintains a list of the still objects in the C variable “still_objs”.

So the idea would be to check at the end of a program if there are any still objects with non-zero ref counts.
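
A minimal sketch of that end-of-program check, in plain C. It is not Gambit's code: the `still_obj` struct, its fields, and the function names are hypothetical stand-ins for the runtime's real still-object header, and the list is built by hand rather than taken from still_objs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a "still" object header. Gambit's real
   layout lives in its runtime sources and is not reproduced here. */
typedef struct still_obj {
    struct still_obj *next;   /* link to the next still object */
    long refcount;            /* outstanding C references */
    const char *label;        /* illustrative tag used when reporting */
} still_obj;

/* Walk the list and report every object whose reference count is
   non-zero. Returns how many such objects were found. */
size_t report_nonzero_refcounts(const still_obj *head, FILE *out)
{
    size_t found = 0;
    assert(out != NULL);
    for (const still_obj *p = head; p != NULL; p = p->next) {
        if (p->refcount != 0) {
            fprintf(out, "still object %s: refcount = %ld\n",
                    p->label, p->refcount);
            found++;
        }
    }
    return found;
}

/* Example: three still objects, one of which was never released. */
size_t demo_leak_check(void)
{
    still_obj c = { NULL, 0, "c" };
    still_obj b = { &c, 1, "sockaddr" };   /* refcount never decremented */
    still_obj a = { &b, 0, "a" };
    return report_nonzero_refcounts(&a, stderr);
}
```

Calling `demo_leak_check()` at exit would flag only the object labelled "sockaddr", which is the kind of leftover reference described above.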

Marc
Dimitris Vyzovitis
2018-02-02 11:23:33 UTC
Permalink
Relevant code for accounting still objects:
https://gist.github.com/vyzo/ab4219382c0870779991d4c701921d2c

The limitation is that the still_objs_ is per processor, and not vm-wide.
Does that mean we would have to crawl all processors in SMP?

-- vyzo
Marc Feeley
2018-02-02 12:41:14 UTC
Permalink
Yes, each processor has its own still_objs list, and to account for all still objects you must iterate over the processors. To avoid modification of the still_objs lists while doing this, the best approach is to use the barrier operation mechanism. That way all processors (but one) will be idle while iterating (or you could have all processors cooperate). This is done with the “on_all_processors” function. For an example, check out ___garbage_collect or ___fdset_resize in lib/setup.c.
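
To illustrate the iteration only, here is a small C sketch under stated assumptions: `processor`, `still_obj`, `NB_PROCESSORS`, and the function names are hypothetical, and the barrier is not modeled. The caller is assumed to have already stopped the other processors, which is the part on_all_processors would provide in the real runtime.

```c
#include <assert.h>
#include <stddef.h>

#define NB_PROCESSORS 4   /* hypothetical fixed processor count */

/* Hypothetical stand-in for a still-object header. */
typedef struct still_obj {
    struct still_obj *next;
    long refcount;
} still_obj;

/* Hypothetical per-processor state: each processor owns its own list. */
typedef struct processor {
    still_obj *still_objs;
} processor;

/* Count the still objects with a non-zero refcount on one processor. */
size_t count_on_processor(const processor *p)
{
    size_t n = 0;
    for (const still_obj *o = p->still_objs; o != NULL; o = o->next)
        if (o->refcount != 0)
            n++;
    return n;
}

/* VM-wide count: crawl every processor's list in turn.
   Assumption: all other processors are idle (held at a barrier), so
   no list can change during the walk. That is not enforced here. */
size_t count_on_all_processors(const processor *procs, size_t nb)
{
    size_t total = 0;
    assert(procs != NULL || nb == 0);
    for (size_t i = 0; i < nb; i++)
        total += count_on_processor(&procs[i]);
    return total;
}

/* Example: processors 0 and 2 each hold one referenced still object. */
size_t demo_vm_wide_count(void)
{
    still_obj p0a = { NULL, 1 };
    still_obj p0b = { &p0a, 0 };
    still_obj p2a = { NULL, 2 };
    processor procs[NB_PROCESSORS] = { { &p0b }, { NULL }, { &p2a }, { NULL } };
    return count_on_all_processors(procs, NB_PROCESSORS);
}
```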

Marc
Dimitris Vyzovitis
2018-02-02 12:49:14 UTC
Permalink
it would be nice to have a primitive to do this for Scheme procedures!
Something like (on-all-processors thunk) would be awesome.

-- vyzo
Marc Feeley
2018-02-02 12:57:59 UTC
Permalink
on_all_processors was designed for the lowest level of the runtime system. I don’t think it is possible for the operation to be in Scheme (I’ll have to think about what the constraints are on the operation).

Marc
Dimitris Vyzovitis
2018-02-02 13:00:12 UTC
Permalink
well, perhaps we can think about the right primitive for Scheme level operations.
the semantics could be something like "execute this thunk on all processors, and
don't do any switches until it has finished executing".

-- vyzo
Dimitris Vyzovitis
2018-02-02 13:07:17 UTC
Permalink
perhaps the "don't switch" semantics are too much.
a simpler general purpose primitive would be an `on-all-processors` that spawns
a thread on each processor to execute the thunk and completes when all thunks
have completed.

that's likely implementable without any deep support from the runtime.
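
The spawn-and-wait shape of such a primitive can be sketched with POSIX threads in C. This is only an analogy, not Gambit or Gerbil code: `on_all_workers`, `NB_WORKERS`, and the thunk signature are hypothetical, and nothing here pins a thread to a processor, so the OS decides where each worker runs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NB_WORKERS 4   /* hypothetical: stands in for the processor count */

typedef void (*thunk_fn)(size_t worker_index, void *arg);

typedef struct worker {
    pthread_t thread;
    size_t index;
    thunk_fn thunk;
    void *arg;
} worker;

/* Thread entry point: run the thunk with this worker's index. */
static void *worker_main(void *p)
{
    worker *w = p;
    w->thunk(w->index, w->arg);
    return NULL;
}

/* Run the thunk once per worker and return only when every started
   worker has finished. Returns 0, or the pthread_create error code. */
int on_all_workers(thunk_fn thunk, void *arg)
{
    worker workers[NB_WORKERS];
    size_t started = 0;
    int err = 0;

    assert(thunk != NULL);
    for (; started < NB_WORKERS; started++) {
        workers[started].index = started;
        workers[started].thunk = thunk;
        workers[started].arg = arg;
        err = pthread_create(&workers[started].thread, NULL,
                             worker_main, &workers[started]);
        if (err != 0)
            break;
    }
    /* Wait for whatever was started, even after a failed create. */
    for (size_t i = 0; i < started; i++)
        pthread_join(workers[i].thread, NULL);
    return err;
}

/* Example thunk: each worker writes only to its own slot. */
void mark_visited(size_t index, void *arg)
{
    int *visited = arg;
    visited[index] = 1;
}

/* Returns the number of workers that ran, or -1 on failure. */
int demo_on_all_workers(void)
{
    int visited[NB_WORKERS] = { 0 };
    int total = 0;
    if (on_all_workers(mark_visited, visited) != 0)
        return -1;
    for (size_t i = 0; i < NB_WORKERS; i++)
        total += visited[i];
    return total;
}
```

Each worker touches a distinct array slot, so the example needs no locking; a thunk that shares state across workers would have to synchronize itself.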

-- vyzo
Marc Feeley
2018-02-02 13:32:10 UTC
Permalink
The SMP scheduler supports “pinning” threads to processors, so perhaps this is implementable. However… why do you need this? I don’t like exposing the processor concept or pinning, which are low-level concepts.

Marc
Dimitris Vyzovitis
2018-02-02 13:42:21 UTC
Permalink
pinning, if exposed, should be sufficient to implement it purely in userland.

it would be immediately useful for my heap dumper -- i could use it to get a
vector of stills from all processors with the count-still-objects/get-still-objects
procedures before starting the walk and use that to ensure that all stills (at the
beginning of the walk) are accounted for.

it would also be useful for implementing a parallel dispatch primitive that utilizes
all cores maximally. say you have a parallel algorithm that you want to decompose
into per core tasks, that could be accomplished with on-all-processors (or a similar
primitive based on pinning). and it doesn't have to be a compute algorithm, i/o could
benefit too.

-- vyzo
Dimitris Vyzovitis
2018-02-02 13:47:31 UTC
Permalink
note: you don't have to implement on-all-processors, i can do that in gerbil stdlib;
but pinning needs to be exposed in some way (## procedures are fine).

-- vyzo
Marc Feeley
2018-02-02 13:55:02 UTC
Permalink
I agree that more low-level stuff could be implemented with this low-level mechanism (pinning). The problem is that these things might interfere with the implementation of important features, such as automatic load balancing.

This is a recurring tradeoff when designing a system: exposing low-level implementation features gives more control to the user/programmer, but later in the design cycle it may hinder or prevent implementing some other features.

Anyway, I’ll have to think about this specific case… I think thread pinning may be OK with the current model, in which each processor has its own dedicated thread run queue.

Marc
Adam
2018-02-02 14:29:48 UTC
Permalink
I also think pinning could have some utility. I do see the point that
perfect work stealing would make most pinning useless.


What about pinning as a way to get extra-high speed on messaging and
locking primitives between green threads pinned to the same Gambit
processor (= OS thread)? (The higher speed would come from the memory
coherency you get when all the involved execution is located on a single
core.)


One use of pinning would be to cap a particular computation's speed at the
total speed of one CPU core, or of a given number of cores; sometimes that
is relevant.

Also on a low level, one could want to designate a particular OS thread
priority to a particular Gambit processor to get a particular performance
characteristic that way, e.g. an OS thread with the lowest possible
execution priority setting, to do some slumbering low-priority background
task only.


Maybe there could be a point with having a designated Gambit processor (=
OS thread) for particular blocking (C) operations, not sure.
Post by Marc Feeley
I agree that more low-level stuff could be implemented with this low-level
mechanism (pinning). The problem is that these things might interfere with
the implementation of important features, such as automatic load balancing.
This is a recurring tradeoff when designing a system… exposing low level
implementation features gives more control to the user/programmer but later
in the design cycle it may hinder or prevent implementing some other
features.
Anyway, I’ll have to think about this specific case… I think thread
pinning may be OK with the current model that each processor has its
dedicated thread run queue.
Marc
Post by Dimitris Vyzovitis
pinning, if exposed, should be sufficient to implement it purely in userland.
it would be immediately useful for my heap dumper -- i could use it to get a
vector of stills from all processors with the count-still-objects/get-still-objects
procedures before starting the walk and use that to ensure that all stills (at the
beginning of the walk) are accounted for.
it would also be useful for implementing a parallel dispatch primitive that utilizes
all cores maximally. say you have a parallel algorithm that you want to decompose
into per core tasks, that could be accomplished with on-all-processors (or a similar
primitive based on pinning). and it doesn't have to be a compute algorithm, i/o could
benefit too.
-- vyzo
The SMP scheduler supports “pinning” threads to processors, so perhaps
this is implementable. However… why do you need this? I don’t like
exposing the processor concept or pinning, which are low-level concepts.
Marc
Post by Dimitris Vyzovitis
perhaps the "don't switch" semantics are too much.
a simpler general purpose primitive would be an `on-all-processors` that spawns
a thread on each processor to execute the thunk and completes when all thunks
have completed.
that's likely implementable without any deep support from the runtime.
-- vyzo
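The simpler semantics proposed here (run the thunk once per processor, return when every thunk has completed) can be sketched in C with POSIX threads. This is only an analogy under stated assumptions: plain OS threads stand in for Gambit processors, no pinning is performed, and the names `thunk_fn`, `task`, `run_task`, `mark_slot` and this `on_all_processors` are hypothetical and unrelated to the runtime's own on_all_processors function.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical thunk type: receives a processor index and a user argument. */
typedef void (*thunk_fn)(int processor_id, void *arg);

typedef struct {
    thunk_fn fn;
    int id;
    void *arg;
} task;

/* Thread entry point: run the thunk for one simulated processor. */
static void *run_task(void *p)
{
    task *t = p;
    t->fn(t->id, t->arg);
    return NULL;
}

/*
 * Start one OS thread per simulated processor, run fn on each, and return
 * only after every started thread has finished.
 * Returns 0 on success, -1 if allocation or thread creation failed.
 */
int on_all_processors(int nprocs, thunk_fn fn, void *arg)
{
    assert(nprocs > 0 && fn != NULL);
    pthread_t *threads = malloc((size_t)nprocs * sizeof *threads);
    task *tasks = malloc((size_t)nprocs * sizeof *tasks);
    int started = 0, status = 0;

    if (threads == NULL || tasks == NULL) {
        free(threads);
        free(tasks);
        return -1;
    }
    for (; started < nprocs; started++) {
        tasks[started] = (task){fn, started, arg};
        if (pthread_create(&threads[started], NULL, run_task,
                           &tasks[started]) != 0) {
            status = -1;
            break;
        }
    }
    /* Completion barrier: wait for every thread that was started. */
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    free(threads);
    free(tasks);
    return status;
}

/* Example thunk: each simulated processor records id + 1 in its own slot. */
void mark_slot(int processor_id, void *arg)
{
    ((int *)arg)[processor_id] = processor_id + 1;
}
```

With an `int slots[4]` array initialised to zero, `on_all_processors(4, mark_slot, slots)` returns once all four slots have been written. A real Scheme-level primitive would additionally need the pinning support discussed in this thread to guarantee one thunk per processor.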
well, perhaps we can think about the right primitive for Scheme level operations.
the semantics could be something like "execute this thunk on all processors, and
don't do any switches until it has finished executing".
-- vyzo
on_all_processors was designed for the lowest-level of the runtime
system, I don’t think it is possible for the operation to be in Scheme
(I’ll have to think about what the constraints are on the operation).
Marc
Post by Dimitris Vyzovitis
it would be nice to have a primitive to do this for Scheme procedures!
Something like (on-all-processors thunk) would be awesome.
-- vyzo