🎉 Celebrating 25 Years of GameDev.net! 🎉

Not many can claim 25 years on the Internet! Join us in celebrating this milestone. Learn more about our history, and thank you for being a part of our community!

Implementing a Garbage Collector for AngelScript

Started by
34 comments, last by WitchLord 19 years, 1 month ago
I need to implement a garbage collector for AngelScript, now that I'm adding script structures. I already have some knowledge on how this can be done, but if anyone here knows of any articles that explains different implementations I'd appreciate links to them. I have the following requirements for the GC: - It will only collect memory allocated for script declared objects. Application registered objects will be collected by the application. - It must be incremental so that it doesn't freeze the execution until the GC is finished. - It must be able to work without knowledge of application allocated memory, that may hold references to script objects. The external references will be reference counted. - It must be able to handle circular references. - It cannot move the memory around. If anyone can think of other requirements that would be necessary, or have any tips on what is better, I'd appreciate if you'd let me know. Regards, Andreas [Edited by - WitchLord on May 4, 2005 3:10:41 PM]

AngelCode.com - game development and more - Reference DB - game developer references
AngelScript - free scripting library - BMFont - free bitmap font generator - Tower - free puzzle game

Advertisement
The implementation of the Java HotSpot VM is quite interesting if you care to have a look. It may be overill for AS though.

On a related note, I think it would be a good idea to allow the API user to register memory allocation and deallocation functions. In C++ I have experienced memory fragmentation problems when many entities were created and deleted in an adhoc manner. I solved this problem by implementing my own 'new' and 'delete' operators to use object pooling.

For AS, providing the ability to replace memory functions, or even better, to define memory functions for specific object types would help in this regard as it would allow the more advanced performance-conscious users to plug in their own customised memory allocation scheme.

I think that this idea can work hand-in-hand with the garbage collector as, strictly speaking, the GC is mainly concerned with managing object lifetimes rather than memory allocation issues.
tIDE Tile Map Editorhttp://tide.codeplex.com
This paper gives a nice overview: uniprocessor garbage collection techniques
SharkBait:

I'll take a look at that VM. It might give me some ideas.

I'm way ahead of you ;) AngelScript already allows the applications to register their own memory allocation routines for registered object types. Take a look at the behaviours asBEHAVE_ALLOC and asBEHAVE_FREE. These were implemented in AngelScript 2.1.0.

They are also necessary if the objects are allocated in a DLL, but freed in the library, since the DLL and the main application aren't using the same memory manager by default.

Trap:

Thanks for the link. Too bad it's PostScript. I'll have to install GhostView again ;)

AngelCode.com - game development and more - Reference DB - game developer references
AngelScript - free scripting library - BMFont - free bitmap font generator - Tower - free puzzle game

I recommend looking at the incremental tracing garbage collector in GameMonkey script - it's not reference counted (although there is a RefCounted version you can use). The basic GC algorithm is simple; you define a colourset of white/black/grey. On the start of the GC cycle all objects are marked as gray, you then trace up from the roots and mark each object that is reachable as being 'black'. When the GC cycle is complete any gray objects are marked as 'white' and the cycle repeats again. At arbitrary times you will enter sweep mode (perhaps after the collection cycle) where all 'white' objects are destroyed and returned to their respective memory pools (or deleted, depending on how you do things). The GC is incremental as it performs a small amount of this work (either collect or sweep) on every frame meaning you won't get a pause when the GC kicks in. It's pretty effective and doesn't suffer from the problems of reference counting (cyclic references, even if the object is unreachable). One downside is that if you're allocating quicker than you're sweeping/collecting you may run out of memory, but you can always force a collect/sweep when you detect that your memory is running low.
It's available as .pdf at citeseer, but citeseer is down at the moment.

I would drop internal reference counting and only count external references. A well implemented GC is better than refcounting in almost any way.

Some more advanced GC implementations allow "pinning" of selected objects, which means it can use moving collection for all but the pinned objects. I don't know anything about angelscript so i don't know if it fits in there.
Quote: Original post by WitchLord
- It must be iterative so that it doesn't freeze the execution until the GC is finished.

IMO all mainstream (Java, C#) garbage collectors I'm aware of are shit in this regard. They wait until you run out of memory and then take a long freakin' time to do anything. Even if they run at some preset interval or during idle times or hints, they will still suck because they make preset statistical assumptions. It's like search engines: they can have excellent results for most common queries (which probably add up to 90% of all queries) but if they absolutely blow for those rare 10% nobody will use them. Same thing with garbage collectors: no matter how much research people do prior to implementing one and no matter how accurate the statistical assumptions are, at some points the garbage collector will inevitably suck which people will find absolutely unacceptable. Aside from tight cooperation between the CPU, the OS and the runtime in order to do realtime statistical analysis (we're not quite there yet [smile]) one is left with reference counting. Nice and simple: every time a reference count reaches zero, get rid of the object.
Quote: Original post by WitchLord
- It must be able to handle circular references.

It is a bitch, isn't it? [smile] While reference counting that doesn't manage circular references (CR) is easy (I implemented this in a couple of hours), dealing with this nasty problem is a tough nut to crack. I would recommend looking at Python (just google it, you'll see a few documents about their gc), however, since AngelScript is a static language you can aquire far superior results in this respect. Consider: at compile time you have enough information to figure out whether any given object has potential to have CRs (just walk the graph). Most objects will be CR-free (which is a huge optimization for your run-time environment, you'll just increment and decrement references and check for zero) while some will have the potential to cause problems. For these rare but painful objects you'll be able to do two things. Either A) upon every reference count modification walk the graph and free cyclic loops or B) mark them somehow (add to the list?) for future analysis and every once in a while perform the CR check on marked objects. IMO a compromise is best: at compile time estimate how long a CR check will take for a given object (distance of the loop on the graph is perfect for this) and let the user control the threshold at runtime. Anything less than some value (pick a reasonable default) will be checked immediately, while longer checks will be added to the list. Additionally let the user tweak how often the lists with "long" graphs is traversed and you'll be on your way to an excellent, general purpose GC. If AS is used for a game, only very short CR objects will be checked immediately while longer one will be added to the list and traversed either on every frame or during some downtimes (level loading, etc.) If AS is used for a web app, probably nothing will be checked immediately (why bother?) and the admin can set up a good collection interval (every 5 minutes, when allocation reaches 75% of available memory, etc.)
Quote: Original post by WitchLord
- It cannot move the memory around.

Inability to defragment your heap can cause your app to run out of memory (which sometimes happens to C++ applications that aren't careful). Ideally an OS should cooperate with the runtime to do this, but again, we aren't there. Are you sure you can't re-evaluate this requirement? "Lock" application owned objects (move them to a different heap first) and allow movement of objects that are only owned by your system.
Quote: Original post by Trap
A well implemented GC is better than refcounting in almost any way.

I strongly disagree with this statement for reasons stated in the above post.

EDIT: quick example. Generational garbage collectors assume that new objects are destroyed quickly and that old objects rarely reference new ones. What if I'm writing some sort of network routing software that stores http requuests in a hashtable for thirty seconds or until another request is sent to the same server from the same client? Suddenly the assumptions fall apart. New objects stay around for a long time and old object (hashtable) references the new objects all the time. Suddenly C# becomes "the wrong tool for the job, you have to use low level languages for this type of software". Well, why is that? The only reason I can't use C# in this case is because its garbage collector is completely inadequate for these "rare special cases" that seem to permuate almost all software applications. The only real way to get around this is to use reference counting or to compile real-time statistics (which is extremely inefficient since modern day OSs don't offer support for it). I've actually thought quite a bit about this and I've seriously considered hacking the kernel at some point to provide support for such analysis. It's a good way to get into kernel hacking too. Hmm.... *boots the linux box*

[Edited by - CoffeeMug on May 4, 2005 11:09:42 AM]
First a link to a .pdf of a more complete version of the gc paper

I didn't write "better in any way" but "better in almost any way". Of course it depends on the behaviour of the program and external requirements which way of reclaiming memory is better.
It's easy to come up with examples where either refcounting or a tracing gc is way better (by any metric).

The upper bound on memory management induced blocking time depends on the usage pattern both for refcounting and tracing gc.

Good GC implementations allow you to tweak the behaviour of the GC to fit to your memory usage pattern.
Quote: Original post by Trap
I didn't write "better in any way" but "better in almost any way".

Could you substantiate your argument? Cost of reference counting is negligent: dereference, increment/decrement, and a conditional followed by deletion if necessary (this is very cheap). You'd spend the same amount of time tracking handles in GC. When you're dealing with objects that have a cyclic reference problem, your system effectively becomes a GC. So, I'd say that reference counting is "better in almost any way", unless you're dealing with cyclic references in which case it's "at least as good".
Quote: Original post by Trap
It's easy to come up with examples where either refcounting or a tracing gc is way better (by any metric).

For any GC I'm aware of except reference counting one could easily point out a situation in which the costs of that GC would be prohibitive. I cannot think of situations in which reference counting would be prohibitive while a different form of GC wouldn't be.

This topic is closed to new replies.

Advertisement