liamc
liamc•10mo ago

Micro-optimizations to allow more dynamic objects

When profiling my game, Normcore's Realtime.Update is consistently the most expensive part of my update loop, even in a best-case scenario: idling, with most objects not moving and no synced objects being spawned.
The most expensive method when idling is RealtimeModel.OnWillWrite. Within that, the most expensive OnWillWrite is RealtimeTransformModel's, and within that RealtimeTransformRigidbody.OnLocalModelWillWrite is the most expensive. A quick look at a deep profile sample shows some micro-optimisations which could shave a significant percentage off the CPU cost of updating RealtimeTransforms.
From the call counts in this screenshot, you can see there are:
- 190 RealtimeTransforms
- 62 of those are RealtimeTransformRigidbodies
- 60 are syncing position, but none of them have moved (there's no model.set_position call)
- 62 are syncing rotation, but only two of them have rotated since the last update

Some micro-optimisations I can see at a glance:
- The most expensive method is get_roomTime, and its most expensive nested method, get_realtime(), is called twice instead of re-using the same realtime reference.
- Depending on the precision required, roomTime could be cached to a static field once by the first RT and shared by all RealtimeTransformRigidbody instances.
- In SetSafePosition and SetSafeRotation, the model's value is only set if the new value is not equal to the current one and is not NaN. In both cases, IsNaN is more expensive than IsEqual. Since the conditions in an if statement are evaluated left-to-right and short-circuit, changing the SetSafePosition if statement from "IsNaN or IsEqual" to "IsEqual or IsNaN" would allow non-moving RTs to exit without executing the more expensive IsNaN method.
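A minimal sketch of the reordering idea, assuming a stand-in model class. PositionModel, HasNaN and the NaNChecks counter are hypothetical names for illustration, not Normcore's actual code, and System.Numerics.Vector3 is used in place of Unity's Vector3 so the snippet stands alone:

```csharp
using System.Numerics;

// Hypothetical stand-in for a transform model; not Normcore's actual class.
public sealed class PositionModel
{
    public Vector3 Position { get; private set; }

    // Counts how often the costlier check runs, for illustration only.
    public int NaNChecks { get; private set; }

    private bool HasNaN(Vector3 v)
    {
        NaNChecks++;
        return float.IsNaN(v.X) || float.IsNaN(v.Y) || float.IsNaN(v.Z);
    }

    // Returns true when the model value was actually written.
    public bool SetSafePosition(Vector3 value)
    {
        // Cheap equality test first: || short-circuits, so an unchanged
        // value returns here without ever calling HasNaN.
        if (value == Position || HasNaN(value))
            return false;

        Position = value;
        return true;
    }
}
```

With this ordering, an object that hasn't moved never reaches HasNaN. A NaN input still fails the equality test (NaN never compares equal), so it falls through to HasNaN and is rejected as before.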
18 Replies
Camobiwon
Camobiwon•10mo ago
Yess absolutely. Any optimizations are a total win, especially across-the-board ones. To add, since it was discussed earlier: it would be worth putting a decent focus on optimizing isOwnedLocallySelf and isOwnedRemotelySelf, as these do an external plugin call and are expensive!
liamc
liamcOP•10mo ago
(hit the character limit in the OP) Every RealtimeTransform.OnWillWrite checks isOwnedLocallySelf, which gets somewhat expensive in large quantities. It's responsible for 0.20ms out of the 1.83ms RealtimeTransform.OnWillWrite in this example. If the ownerID were cached when ownership changes, you could remove the GetOwnerIdSelf call and the cost would be halved. Or almost all of the cost could be removed by caching a local boolean for isOwnedLocallySelf.
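A rough sketch of the cached-boolean idea, under the assumption that the library can tell the model when ownership changes. OwnedModel, RefreshOwnership and the queryOwnerId delegate are hypothetical names standing in for the expensive native call; none of this is Normcore's real API:

```csharp
using System;

// Hypothetical model; the names here are illustrative, not Normcore's API.
public sealed class OwnedModel
{
    private readonly int _localClientId;
    private readonly Func<int> _queryOwnerId; // stands in for the costly plugin call
    private bool _isOwnedLocally;

    public OwnedModel(int localClientId, Func<int> queryOwnerId)
    {
        _localClientId = localClientId;
        _queryOwnerId = queryOwnerId;
        RefreshOwnership();
    }

    // Cheap field read; safe to call from every per-frame update.
    public bool IsOwnedLocally => _isOwnedLocally;

    // Call only when ownership changes, so the costly query
    // runs once per change rather than once per frame.
    public void RefreshOwnership()
    {
        _isOwnedLocally = _queryOwnerId() == _localClientId;
    }
}
```

The per-frame path then reads a plain field, and the expensive query only runs from whatever ownership-changed hook ends up calling RefreshOwnership.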
Mechabit
Mechabit•10mo ago
some pretty simple changes you mention too
liamc
liamcOP•10mo ago
yeah, they should be very safe.
Camobiwon
Camobiwon•10mo ago
AFAIK I think Max was straight up unaware of owned locally / remotely self doing an extern plugin call so it seems like it may have not even been intentional, and ideally easy to fix
Mechabit
Mechabit•10mo ago
you should work for normcore just so you can implement these changes lol
liamc
liamcOP•10mo ago
😆 I would have applied for that recent job if I wasn't so happy where I am!
Camobiwon
Camobiwon•10mo ago
#Unity Developer @ Normal ;P
Would be fun to work at Normal and get fixes in for all of us, but I'm full time / happy at my other place too lol
Maybe if their schedule is flexible / part time I'll consider it haha
Mechabit
Mechabit•10mo ago
I would do it on contract basis lol
Camobiwon
Camobiwon•10mo ago
yeah fs
liamc
liamcOP•10mo ago
One more significant cost is the per-instance dictionary lookup in SafeInvokeCallback. The callback lists are cached in a dictionary with a RealtimeModelEvent enum as the key, and OnWillWrite, OnDidWrite, OnWillRead and OnDidRead all use the same SafeInvokeCallback method, passing in the event type as an argument. This seems to exist to keep the Normcore code tidy internally, but the dictionary lookup contributes 0.19ms out of 1.83ms in this example. The lookup could be removed and replaced with separate (more verbose) methods. E.g., OnWillWrite would call SafeInvokeOnWillWrite instead of SafeInvokeCallback(RealtimeModelEvent.OnWillWrite).
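Roughly what the separate-methods version could look like, as a sketch only. ModelCallbacks and the Add/SafeInvoke method names are hypothetical and only two of the four events are shown. The real internals may differ:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical dispatcher; names are illustrative, not Normcore's internals.
public sealed class ModelCallbacks
{
    // One list per event replaces a dictionary keyed by an event enum,
    // so invoking an event needs no hashing or lookup.
    private readonly List<Action> _onWillWrite = new List<Action>();
    private readonly List<Action> _onDidWrite = new List<Action>();

    public void AddOnWillWrite(Action callback) => _onWillWrite.Add(callback);
    public void AddOnDidWrite(Action callback) => _onDidWrite.Add(callback);

    public void SafeInvokeOnWillWrite() => SafeInvoke(_onWillWrite);
    public void SafeInvokeOnDidWrite() => SafeInvoke(_onDidWrite);

    // Shared "safe" loop: one throwing callback doesn't stop the rest.
    private static void SafeInvoke(List<Action> callbacks)
    {
        for (int i = 0; i < callbacks.Count; i++)
        {
            try { callbacks[i](); }
            catch (Exception e) { Console.Error.WriteLine(e); }
        }
    }
}
```

The exception handling still lives in one place, so the only extra verbosity is one thin wrapper method per event.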
liamc
liamcOP•10mo ago
Those few optimisations combined, in this specific profiler frame, would shave >0.50ms out of 1.83ms. That's a ~28% reduction.
Camobiwon
Camobiwon•10mo ago
That would be sweet if most or all of these could be implemented, and ideally it wouldn't even be too challenging. Some things like the model ownership properties and a couple of these would also be baseline across-the-board speedups, so any scripts relying on them would benefit, which would be awesome.
I don't even know if these are so much "micro-optimizations" as the title implies if they could shave upwards of 0.5ms off haha, in my mind micro-optimizations are on the scale of single nanoseconds or maybe microseconds
liamc
liamcOP•10mo ago
Well, for a single instance it'd be a wasteful micro-optimisation 😛 But micro-optimizations become very worthwhile when there are dozens to hundreds of instances. Yeah, ownerIdSelf is also used in RealtimeTransform's FixedUpdate, so caching it would automatically provide savings there. But if it were cached for all RealtimeModels by default, it would provide tons of savings.
maxweisel
maxweisel•10mo ago
we’ve got all of this on the list 🙂
interest management may take a while, but optimizations around the clientID checks will come much sooner
same for the event dispatch
cryptomax
cryptomax•10mo ago
1000% The more RTs we can handle the better! Were we not able to use your transform optimizer after the version of normcore with self compiled models?
liamc
liamcOP•10mo ago
Awesome, cheers Max!
Should be fine! Let me know if you get any errors. I'm using it with 2.6.1, which is after self-compiled models.
I checked the diffs between the original one and 2.6.1 at the time and saw no changes that should affect it. I haven't checked against 2.9.2 yet though.
cryptomax
cryptomax•10mo ago
just upgraded to 2.7.1, needed for a bug fix; haven't seen any particular issues so far