Local variable vs. switch vs. if/else

Scripted ALFA systems & related tech discussions (ACR)

Ronan
Dungeon Master
Posts: 4611
Joined: Sun Feb 20, 2005 9:48 am

Post by Ronan »

I wanted to test the speed of NWScript when looking up values from tables. I was wondering, for example, whether it would be fast to define spawn groups the way NESS does it (a large group of if/elses comparing a string variable to a constant, inside a loop no less!) or with something akin to a GetLocalString followed by an ExecuteScript. Then there are the deity scripts and the weapon/armor/shield proficiency scripts, all of which require looking up predefined data.
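
For concreteness, the two lookup styles being compared might look roughly like this. It is only a sketch: the group names, script names, the "SPAWN_SCRIPT_" prefix and the oTable storage object are made up for illustration and are not taken from NESS or the ACR.

```nwscript
// Hypothetical sketch; every name below is illustrative only.

// Style 1: a chain of string comparisons evaluated in script.
string GetSpawnScriptIfElse(string sGroup)
{
    if (sGroup == "goblins")      return "spawn_goblins";
    else if (sGroup == "wolves")  return "spawn_wolves";
    else if (sGroup == "bandits") return "spawn_bandits";
    return "";
}

// Style 2: store the mapping once (e.g. at module load) on a storage object...
void RegisterSpawnGroup(object oTable, string sGroup, string sScript)
{
    SetLocalString(oTable, "SPAWN_SCRIPT_" + sGroup, sScript);
}

// ...then resolve it with a single GetLocalString and hand off to ExecuteScript.
void RunSpawnGroup(object oTable, string sGroup, object oSpawner)
{
    string sScript = GetLocalString(oTable, "SPAWN_SCRIPT_" + sGroup);
    if (sScript != "")
        ExecuteScript(sScript, oSpawner);
}
```

From the script's point of view the second style is one engine call per lookup, while the first may run through every comparison before it finds a match.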

So, I made some test scripts with 100 possibilities (100 if/else, 100 cases, 100 local ints stored). I tested the local calls with variable names of lengths 2, 8, 16, 32 and 64 characters. Each local variable script had its own object with 100 pieces of data stored on it, and one script stored and retrieved data using GetModule (which used length-2 variable names as well).

My theory was that the local variable calls of lengths 2 and 8 would be the fastest, followed by the switch, then the other local calls from smallest to largest, then the if/else. String comparisons are anything but efficient, but I thought modern CISC processors would have instructions specifically for comparing strings. Also, I thought the much smaller size of a single GetLocal call vs. a huge if/else or switch would mean much more code would reside in the processor's cache.

Here are the results without any scripts cached in module properties:

Code:

test_2molocal_100    290 msec   1001 calls
test_2local_100      215 msec   1001 calls
test_8local_100      229 msec   1001 calls
test_16local_100     248 msec   1001 calls
test_32local_100     290 msec   1001 calls
test_64local_100     364 msec   1001 calls
test_switch_100      995 msec   1001 calls
test_ifelse_100     1110 msec   1001 calls

Here are the results with everything cached:

Code:

test_2molocal_100    282 msec   1001 calls
test_2local_100      212 msec   1001 calls
test_8local_100      227 msec   1001 calls
test_16local_100     247 msec   1001 calls
test_32local_100     281 msec   1001 calls
test_64local_100     355 msec   1001 calls
test_switch_100      973 msec   1001 calls
test_ifelse_100     1089 msec   1001 calls

Each "call" is 100 retrievals of a piece of data in the middle of the dataset (i.e., 50 in this case), and was run every 0.1 seconds from a DelayCommand. Each retrieval was behind a function, so assuming the compiler isn't smart enough to inline, that's a performance hit right there. That applies especially to the local variable tests, to which I passed strings so as not to incur any string formatting (IntToString()) overhead. To the if/else and switch tests I passed integers.

Also, we can see how GetModule() slowed things down a bit, as the only difference between test_molocal_100 and test_local_100 is that a global object is substituted for GetModule().
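
Assuming the difference really is just where the object reference comes from, the two variants would look something like the sketch below. The function names are placeholders, and the stored object is passed in as a parameter only to keep the sketch self-contained; the actual test used a global.

```nwscript
// Variant A (assumed shape of the module test): resolve the module
// object on every single retrieval.
int TestModule(string sVarName)
{
    return GetLocalInt(GetModule(), sVarName);
}

// Variant B: reuse an object reference that was obtained once elsewhere,
// so each retrieval is just the GetLocalInt call.
int TestStored(object oStore, string sVarName)
{
    return GetLocalInt(oStore, sVarName);
}
```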

Well, I was right about the shorter local variable names being fast, at least. I didn't expect the switch and if/else to get their ass kicked quite as badly as they did...

So, I think I've reached a few conclusions:
-NWScript doesn't compile switch statements well at all.
-Local variables are fast.
-The length of local variable names isn't important unless you're doing an obscene number of calls.
-NWScript is not as slow as I thought. After 8008 calls 0.1 seconds apart (100 seconds), it only used up 3.646 seconds of system time.

Post by Ronan »

Hmm, I just ran the same test and added some scripts to loop through all the objects in an area (55 total), and got the following results:

Code:

test_getobjs_fun     344 msec   1001 calls
test_getobjs_cch     324 msec   1001 calls
test_getobjs         324 msec   1001 calls

test_getobjs_fun uses GetIsObjectValid() to terminate the while loop, while the others use != OBJECT_INVALID. The only difference between test_getobjs_cch and test_getobjs is that the former is cached.

This might look pretty good, except each call to the getobj family of tests only ran its loop 10 times (as opposed to 100 times for the other scripts). It wouldn't even run at 100. That's not very fast at all, and it makes me wonder if we might want to consider a completely custom (and local-variable based) faction system separate from NWN's, with more than one axis (i.e., maybe a scale for both like/dislike and fear).
ç i p h é r
Retired
Posts: 2904
Joined: Fri Oct 21, 2005 4:12 pm
Location: US Central (GMT - 6)

Post by ç i p h é r »

Can you actually post your code too? It'd be easier to understand the comparisons you've made.

I think the last test case shows the compiler isn't optimizing, as the additional function call in test_getobjs_fun added some overhead, unless GetIsObjectValid() does more than a != OBJECT_INVALID test. Also, I think the command line version had an optimization switch, which, if true, might be interesting to test out while you're doing this.

Thanks.

Post by Ronan »

Alright, it's rather long though...

OnLoad:

Code:

#include "test_inc"

int i;

void RunTest() {
    ExecuteScript("test_ifelse_100", OBJECT_SELF);
    ExecuteScript("test_switch_100", OBJECT_SELF);
    ExecuteScript("test_molocal_100", OBJECT_SELF);
    ExecuteScript("test_local_100", OBJECT_SELF);
    ExecuteScript("test_8local_100", OBJECT_SELF);
    ExecuteScript("test_16local_100", OBJECT_SELF);
    ExecuteScript("test_32local_100", OBJECT_SELF);
    ExecuteScript("test_64local_100", OBJECT_SELF);

    ExecuteScript("test_getobjs", OBJECT_SELF);
    ExecuteScript("test_getobjs_cch", OBJECT_SELF);
    ExecuteScript("test_getobjs_fun", OBJECT_SELF);
    ExecuteScript("test_getobj_find", OBJECT_SELF);
    
    if(i<1000) {
        i++;
        DelayCommand(0.1, RunTest());
        PrintString(IntToString(i));
    }
}

void main() {
    oLocalMod = GetModule();
    oLocal100 = GetObjectByTag("oLocal100");
    o8Local100 = GetObjectByTag("o8Local100");
    o16Local100 = GetObjectByTag("o16Local100");
    o32Local100 = GetObjectByTag("o32Local100");
    o64Local100 = GetObjectByTag("o64Local100");
    loc = GetLocation(GetObjectByTag("Waypoint"));

    int i=0;
    for(i=0; i<100; i++) {
        SetLocalInt(oLocalMod, IntToString(i), 9 - (i % 10));
        SetLocalInt(oLocal100, IntToString(i), 9 - (i % 10));
        SetLocalInt(o8Local100, "123456" + IntToString(i), 9 - (i % 10));
        SetLocalInt(o16Local100, "12345678901234" + IntToString(i), 9 - (i % 10));
        SetLocalInt(o32Local100, "123456789012345678901234567890" + IntToString(i), 9 - (i % 10));
        SetLocalInt(o64Local100, "12345678901234567890123456789012345678901234567890123456789012" + IntToString(i), 9 - (i % 10));
    }
    i=0;
    RunTest();
}

test_getobj_find uses GetNearestObjectByTag(), and it is sloooooow. I should have expected that, I guess, given that each call probably loops through every object in an area.

test_ifelse_100

Code:

int Test(int nNum) {
    if(nNum == 0)
        return 9;
    else if(nNum == 1)
        return 8;
    else if(nNum == 2)
        return 7;
    else if(nNum == 3)
        return 6;
    // ... (else-if branches elided in the original post) ...
    else if(nNum == 96)
        return 3;
    else if(nNum == 97)
        return 2;
    else if(nNum == 98)
        return 1;
    else return 0;
}

void main() {
    Test(50);
    Test(50);
    Test(50);
    Test(50);
    // ... 92 more ...
    Test(50);
    Test(50);
    Test(50);
    Test(50);
}

test_switch_100:

Code:

int Test(int nNum) {
    switch(nNum) {
        case 0:
            return 9;
        case 1:
            return 8;
        case 2:
            return 7;
        case 3:
        // ... (remaining cases elided in the original post) ...
        case 96:
            return 3;
        case 97:
            return 2;
        case 98:
            return 1;
    }
    return 0;
}

void main() {
    Test(50);
    Test(50);
    Test(50);
    Test(50);
    // ... 92 more ...
    Test(50);
    Test(50);
    Test(50);
    Test(50);
}

test_8local_100:

Code:

// Assumed: o8Local100 is declared in test_inc, as in the OnLoad script.
#include "test_inc"

int Test(string sVarName) {
    return(GetLocalInt(o8Local100, sVarName));
}

void main() {
    Test("12345650");
    Test("12345650");
    Test("12345650");
    Test("12345650");
    // ... 92 more ...
    Test("12345650");
    Test("12345650");
    Test("12345650");
    Test("12345650");
}

Then of course the other test_*locals use different size strings.

test_getobjs:

Code:

int Test(string sVarName) {
    object oArea = GetObjectByTag("Area001");
    object obj = GetFirstObjectInArea(oArea);

    while(obj != OBJECT_INVALID) {
        obj = GetNextObjectInArea(oArea);
    }
    return 50;
}

void main() {
Test("50");
Test("50");
Test("50");
Test("50");
Test("50");
Test("50");
Test("50");
Test("50");
Test("50");
Test("50");
}

Supposedly GetIsObjectValid() does test for more, but I'd imagine its first line is something like if( oObject == OBJECT_INVALID) return;
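
Since test_getobjs_fun isn't posted, here is a guess at it, assuming it differs from test_getobjs only in the loop condition, as described earlier in the thread:

```nwscript
// Assumed reconstruction of test_getobjs_fun's Test(); not the posted code.
int Test(string sVarName) {
    object oArea = GetObjectByTag("Area001");
    object obj = GetFirstObjectInArea(oArea);

    // Same walk over the area, but terminated by a function call
    // instead of a direct comparison against OBJECT_INVALID.
    while(GetIsObjectValid(obj)) {
        obj = GetNextObjectInArea(oArea);
    }
    return 50;
}
```

If that guess is right, the extra 20 msec in the results would come from one additional engine call per object visited.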

Post by ç i p h é r »

The GetObject looping efficiency isn't really that surprising to me as I recall large scale faction reputation adjustments being inadvisable (and noted as such on the lexicon I think) due to this very thing. Still, we can't get away from it entirely, but we just have to be sure it's necessary. If we cut out all the extraneous inventory looping that we have throughout ACR, we'll eliminate a lot of overhead.

Post by Ronan »

ç i p h é r wrote: If we cut out all the extraneous inventory looping that we have throughout ACR, we'll eliminate a lot of overhead.

Yeah, as far as I'm concerned each event gets one loop of each type (inventory, effects, whatever) maximum. Any system that needs to loop through something does so in its event's main code, not on its own. If anyone adds any extra loops to this, I'll beat them over the head with a stick :P

I'm happy that it looks like the global variable storage objects I use will be quite fast, instead of retrieving them with a GetObjectByTag() call.
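
The one-loop-per-event rule above could be sketched along these lines. Everything here is hypothetical: the two "systems", their function names, the TMP_ variable names and the choice of an OnEnter-style event are placeholders for illustration, not actual ACR code.

```nwscript
// Hypothetical per-item hooks for two unrelated systems.
void Encumbrance_OnItem(object oPC, object oItem)
{
    // GetWeight() reports weight in tenths of pounds.
    int nTotal = GetLocalInt(oPC, "TMP_WEIGHT") + GetWeight(oItem);
    SetLocalInt(oPC, "TMP_WEIGHT", nTotal);
}

void Contraband_OnItem(object oPC, object oItem)
{
    if (GetStolenFlag(oItem))
        SetLocalInt(oPC, "TMP_STOLEN", GetLocalInt(oPC, "TMP_STOLEN") + 1);
}

// Event script (an OnEnter-style handler is assumed): the only inventory
// loop lives here, and each system gets a callback per item.
void main()
{
    object oPC = GetEnteringObject();
    DeleteLocalInt(oPC, "TMP_WEIGHT");
    DeleteLocalInt(oPC, "TMP_STOLEN");

    object oItem = GetFirstItemInInventory(oPC);
    while (GetIsObjectValid(oItem))
    {
        Encumbrance_OnItem(oPC, oItem);
        Contraband_OnItem(oPC, oItem);
        oItem = GetNextItemInInventory(oPC);
    }
}
```

Adding a third system then means adding one more callback inside the existing loop rather than a second pass over the inventory.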