
Implementing a Garbage Collector for AngelScript

34 comments, last by WitchLord 19 years, 1 month ago
evolutional:

Thanks for the tip. Actually, my own thoughts were moving along those lines, which is a good thing.

With GC it is always tricky to know when and how often to invoke the collector. This is another of my open questions, but one that I think will be quite easy to resolve simply by letting the application decide.

Also, thanks for the hint. Of course, I meant to say incremental GC ;)

Trap:

Thanks for the link to the pdf file. I much prefer pdf over ps, since it is much more accessible. The article is quite large, so it will take a while to read, but I'm sure it will be worth it.

Yes, I've been thinking about dropping internal reference counting. Still, keeping internal reference counting can make things easier for the GC, since it will know immediately when an object can be freed. External reference counting obviously cannot be dropped, since the GC will not have access to memory allocated outside the library.

CoffeMug:

No GC algorithm is perfect in all situations, and neither is reference counting. Actually, I would classify reference counting as just another form of GC.

I'll probably end up using reference counting to clean up non-cyclic references and an algorithm similar to what evolutional explained for the cyclic references. I believe this would give the best of both worlds. And of course, the application writer will have as much control as possible over how the GC is invoked.

I agree that the ability to move objects around in memory would be a valuable feature. But it would require an extra pointer lookup when accessing object members, and a lot of work to change how AngelScript currently handles objects. Maybe it's something I'll implement in a future version, but right now I feel it's not an issue.

---

Please don't turn this into a discussion of which is better: reference counting or garbage collection. They both have their merits and disadvantages.





AngelCode.com - game development and more - Reference DB - game developer references
AngelScript - free scripting library - BMFont - free bitmap font generator - Tower - free puzzle game

I have coded a simple test that creates 10,000 1500-element singly-linked lists, prepending a new entry each time and discarding the list right after it is created. It takes 40 s in C++ with Boost's shared_ptr, but only 2.7 s in Allegro Common Lisp, or 5.5 s in GNU CLISP.
As this is a microbenchmark I am sure I skewed the results in some way. What did I do wrong?

The profile says 1/2 the time is spent in shared_ptr dispose and 1/4 the time in operator new().

Edit: The same task using std::list takes 12s.
What happens if you pool your objects and allocate/free to the pool instead of new/delete?
Quote: Original post by Trap
The profile says 1/2 the time is spent in shared_ptr dispose and 1/4 the time in operator new().

Hmm, if it takes longer to dispose than to allocate, something has to be wrong. What exactly is the dispose method doing?

The tremendous difference has little to do with GC; it's mostly the fact that managed-memory languages have constant-time allocation while standard malloc is linear. It's absolutely necessary to use custom allocators for containers (the STL already does this) if you want good allocation speed.
I'm not familiar enough with Boost to just write it down. Pool allocation works, but the chunks aren't freed. It speeds things up a bit though (if the pool is small enough not to cause swapping): 2.4 s for 1000 lists in C++, 0.33 s in Lisp.

I could benchmark reversing lists too...
Quote: Original post by WitchLord
Please, don't turn this into a discussion of which is better: reference counting or garbage collections. They both have their merits and disadvantages.

Sorry, didn't mean to do that. I'm working on a similar system now and I'm interested in finding a good general-purpose algorithm. I've read a lot of papers, but for a truly generic system I can't find anything that beats Python's approach. If someone could point me to something, I'd be very thankful [smile]

In general it appears that people confuse garbage collection with heap management (defragmentation, etc.). In particular, if you're writing a garbage collector it's very easy to defragment the heap. That gives you O(1) allocation and free deallocation (aside from the sweep process). Reference counting generally doesn't include any heap management, which is where the confusion comes from. I think for a reference counting approach it's absolutely necessary to create a good small-object allocator. Malloc is just no good.

More on this later.
Ok, now I did some list-reversal benchmarks (1500-element list, 10,000 reversals):
C++ shared_ptr 12s
Lisp 0.56s
Lisp allocating a new list for each reversal 2.6s

Here is my C++ code; the Lisp code is trivial...
#include <boost/shared_ptr.hpp>
#include <boost/pool/pool.hpp>
#include <stdio.h>
#include <windows.h>
using namespace boost;
using namespace std;

struct link {
	link() : val(-1) {}
	link(int v, shared_ptr<link> n) : val(v), next(n) {}
	int val;
	shared_ptr<link> next;
};

pool<default_user_allocator_new_delete> bla(sizeof(link));

struct poolfree {
	void operator()(link* x) {
		// Run the destructor so 'next' is released before the memory
		// goes back to the pool; without this the rest of the list leaks.
		x->~link();
		bla.free(x);
	}
};

shared_ptr<link> listreverse(shared_ptr<link> first) {
	shared_ptr<link> prev = first;
	if (prev.get() != 0) {
		shared_ptr<link> cur = prev->next;
		prev->next = shared_ptr<link>();
		if (cur.get() != 0) {
			shared_ptr<link> next = cur->next;
			while (next.get() != 0) {
				cur->next = prev;
				prev = cur;
				cur = next;
				next = next->next;
			}
			cur->next = prev;
			return cur;
		}
	}
	return prev; // empty or single-element list: already reversed
}

void test() {
	shared_ptr<link> first(new (bla.malloc()) link(), poolfree());
	first->val = -111;
	for (int i = 1; i < 1500; ++i) {
		link* x = new (bla.malloc()) link(i, first);
		shared_ptr<link> xf(x, poolfree());
		first.swap(xf);
	}
	for (int i = 0; i < 10000; ++i)
		first = listreverse(first);
}

int main(int argc, char* argv[]) {
	long long after, before, freq;
	QueryPerformanceCounter((LARGE_INTEGER*)&before);
	test();
	QueryPerformanceCounter((LARGE_INTEGER*)&after);
	QueryPerformanceFrequency((LARGE_INTEGER*)&freq);
	fprintf(stderr, "%lf", ((double)after - before) / freq);
	return 0;
}
It's true that malloc is very slow at memory allocation.

AngelScript currently relies heavily on malloc, but only because I haven't put any effort into optimizing this version yet. Once I have implemented the most important features I have planned, I'll start on the optimizations. Memory allocation is one of the biggest optimizations I can make, I believe; AngelScript is also currently doing quite a few unnecessary allocations and object copies.

I need to resolve potential circular references with the introduction of script structures, and I believe a GC is the easiest way to do this. Doing what you said, having the release method search for circular references, seems complicated and kind of backwards. I could be wrong, but this is how I feel.

I'd like to learn more about Python's way of doing it. Do you have an article you recommend? I'll do that search on Google you mentioned, but I prefer recommendations since they tend to lead to better articles. ;)

I'm also slightly worried about multithreading, and how it impacts the garbage collection. AngelScript doesn't have official support for multithreading, though some parts have been prepared for it already and I'll try to continue supporting it as well as possible. Reference counting and multithreading don't work very well together; there are too many interactions that may screw things up. GC seems to be much safer in a multithreaded environment.


Trap:

If what you are trying to show is that GC is faster than reference counting, then I believe you are going about it the wrong way. Lisp is obviously optimized to handle these cases, and your C++ code is far from optimized. Also, in this case the Lisp GC probably doesn't even kick in until after the program ends.

The boost shared_ptr<> isn't meant to be efficient; it is only meant to be easy to use. In fact, the entire STL has been designed first and foremost with ease of use in mind, with efficiency coming almost as an afterthought.

The boost shared_ptr<> uses reference counting, but it does so without changing the object's structure. Also, by using it you're adding a pretty heavy overhead with the overloaded -> operator.

If you implemented the reference counting directly in the link structure and skipped shared_ptr altogether, you would get a much better result.

If you want to compare efficiency of algorithms you must do so by implementing both in the same language.


A tracing GC that supports multithreading isn't easy to write either. It looks easy at first sight because an incremental tracing GC already behaves much like a second thread, but a truly multithreaded GC is one of the more difficult things to write.

@WitchLord:
I certainly won't implement a GC of the same quality as the Lisp ones in C++. It is far too much work (it's probably not even possible, since the GC depends on compiler support in Lisp). You can't implement refcounting in Lisp.

Do you know/have a faster refcounting implementation?

It is obvious to me that a refcounting GC has to be slower at reversing lists, as reversing lists is almost 100% pointer handling, and refcounting at least doubles the price of pointer handling.
A good tracing GC adds almost zero cost to pointer handling.

Edit: I redid the benchmark with Boost's intrusive_ptr; C++ is now down to 1.9 s with -Os, 0.7 s with -O3. Quite impressive...

Edit2:
Rebenchmarked with a 1500-entry list, 1000 reversals per list, and 1000 lists:
C++ -O3 68s
one Lisp 47.5s / with type declarations 24.3s
another Lisp implementation 20s

Edit3:
To add some numbers: Lisp spends 25% of its time protecting pointers for tracing (the write barrier).
C++ inlines all calls, so the profile shows 99% in listreverse, which is not that helpful...

Another bench, this time showing refcounting (with mempooling) winning: 1 reversal, 100,000 lists
C++ 16s
Lisp 31 s (8 s program runtime, 23 s GC) -> even with Lisp's fast allocator you have to manage memory manually if you allocate huge numbers of objects. -> by tuning the GC parameters I managed to get the Lisp runtime down to 18 s (explicit mempooling might be faster still)

[Edited by - Trap on May 4, 2005 7:13:06 PM]
