Friday, January 19, 2007

Fun with Objective-C

Following an email from David on the Étoilé dev mailing list about how nice it could be to try allocating ObjC objects on the stack, I played with a couple of things...

First I wrote a mini-benchmark: take a very basic object (a couple of ivars, and one method, "plop", that assigns a value to an ivar), then create an instance, initialize it (call its init method), call the object's one method, and deallocate it. 10,000,000 times. As far as micro-benchmarks go this one is pretty stupid, but it will give us some idea of what's going on.

Without optimizations: ~4.10s (on a MacBook Pro, 2.16 GHz Core Duo)

Not that great. Doing the same thing in C++ with a similar object, here are the timings I get:

Objects created on the stack: 0.24s (17 times faster!)
Objects created on the heap: 1.37s (3 times faster)

Ouch, poor Objective-C... Not a surprise when the C++ objects are allocated on the stack, but even when they are allocated on the heap, C++ is still 3 times faster.

One obvious reason is that Objective-C calls a lot of methods when creating an object, and an Objective-C method call (a message send) is more costly than a C++ function call. Still...

So the obvious idea here is to cache some method implementations: ask the runtime for the method's address (its IMP), then call that function directly, as a C function, with no message dispatch. I restricted myself to alloc, init, the object's one method ("plop") and release. Of course, the other methods those call internally won't be cached.

Caching method calls: 2.77s

It's still twice as slow as creating the C++ object on the heap, but it's a nice performance increase anyway (about 1.5 times faster than the uncached version).

OK, so the only option left is to allocate the Objective-C object on the stack. Surprise:

ObjC objects created on the stack: 0.23s
ObjC objects created on the stack + cached imps: 0.13s

:-)

[Note, of course, that this is a stupid micro-benchmark that doesn't prove much, but it's fun.]


...


What? How can you allocate Objective-C objects on the stack? Well... you can:


#define STACKCLASS(class) typedef struct { @defs(class) } \
__CLASS_ON_STACK__ ## class

/* Zero the ivars like +alloc does (memset needs <string.h>),
   then set isa and expose the storage through a class pointer. */
#define STACKOBJECTISA(objectName,className,classIsa) \
__CLASS_ON_STACK__ ## className __INSTANCE_ON_STACK__ ## objectName; \
memset(&__INSTANCE_ON_STACK__ ## objectName, 0, \
       sizeof(__INSTANCE_ON_STACK__ ## objectName)); \
__INSTANCE_ON_STACK__ ## objectName.isa = classIsa; \
className* objectName = (className*)& __INSTANCE_ON_STACK__ ## objectName

#define STACKOBJECT(objectName,className) \
STACKOBJECTISA(objectName,className,[className class])

Here is an example:

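STACKCLASS(Test); // at file scope: declares the struct type for Test

void benchmark(void) // hypothetical wrapper function
{
    int i;
    for (i = 0; i < 10000000; i++)
    {
        STACKOBJECT(test, Test); // new instance, on the stack
        [test init];             // result ignored: a stack object can't be swapped
        [test plop];
        // no -release: the storage goes away at the end of the block
    }
}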

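Basically, the macros declare a struct with the same layout as the class (that's what @defs gives you), put an instance of it on the stack and set its isa member to the class. (You can also cache the class pointer and pass it to STACKOBJECTISA directly, saving one message send per object.) A bit of a hack, but it seems to work OK :)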

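Note, of course, that by doing this you lose flexibility. Forget about class clusters, for instance (classes whose init actually returns an instance of another class, such as NSNumber): you can only work with concrete classes. Also, don't call -dealloc directly, as it would try to free the object, which must not happen since it was created on the stack (so if you need to do some cleanup, do it in another method). The same goes for a -release that drops the retain count to zero, since it ends up calling -dealloc. And the object only lives until the end of the enclosing block, so don't keep a reference to it beyond that.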

Here are the macros I used for IMP caching (fairly straightforward):

#define CALLIMP(imp,object,sel,args...) \
(*imp)(object, @selector(sel) , ##args)
#define GETIMP(class,sel) [class methodForSelector: @selector(sel)]

You use them like this:

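Class c = [Test class];
IMP imp1 = GETIMP(Test,alloc);  // +alloc, looked up on the class object
id p = [Test new];              // throwaway instance to look up instance methods
IMP imp2 = GETIMP(p,init);
IMP imp3 = GETIMP(p,plop);
IMP imp4 = GETIMP(p,release);
[p release];

int i;
for (i = 0; i < 10000000; i++)
{
    id test = CALLIMP (imp1, c, alloc);
    test = CALLIMP (imp2, test, init);  // keep whatever init returns
    CALLIMP (imp3, test, plop);
    CALLIMP (imp4, test, release);
}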