Parallelizing the physics solver by Dennis Gustafsson (https://youtu.be/Kvsvd67XUKw?si=moTxJqvke4g6835s) was a really cool talk. Always cool to see how people tackle an optimization problem. The only thing I fundamentally disagree with is the use of parallel_for. It might look really innocent, but it starves the CPU after each iteration, and while not a lot of time seems to be lost per iteration, you also pay the ramp-up/ramp-down costs of the thread pool every time. Used without care, this ends up in a death-by-a-thousand-cuts scenario.
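Concretely, the pattern I mean looks something like this; a minimal sketch with a hypothetical Solver type, using TBB's parallel_for as a stand-in for whatever the talk's engine actually uses:

```cpp
// Hypothetical Solver; the point is the loop structure, not the physics.
#include <tbb/parallel_for.h>

struct Solver {
    int constraint_count = 0;
    void solve_constraint(int i) { /* hypothetical constraint kernel */ }
};

void solve(Solver& solver, int iterations) {
    for (int iter = 0; iter < iterations; ++iter) {
        tbb::parallel_for(0, solver.constraint_count, [&](int i) {
            solver.solve_constraint(i);
        });
        // Implicit join here: every worker idles until the slowest chunk
        // of this iteration finishes, and the next parallel_for pays the
        // fork/wake-up cost of the pool all over again.
    }
}

int main() {
    Solver solver;
    solver.constraint_count = 4096;
    solve(solver, /*iterations=*/8);  // 8 barriers, 8 ramp-ups/downs
}
```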

It is very convenient to write code like this, I agree, but it will prevent you from reaching peak performance. Someone in the audience correctly points out that you could overlap the second half of the first solver iteration with the first half of the second iteration, and that is not really possible with this setup (at least I have not yet seen an implementation of parallel_for that would allow this kind of overlap).
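To sketch what that overlap could look like: if you assume constraints are range-partitioned so that a chunk only couples with its immediate neighbours (my assumption for the sake of the example, not something stated in the talk), each (iteration, chunk) pair becomes a task that waits only on the neighbouring chunks of the previous iteration instead of on a global barrier. A deliberately toy, hand-rolled version:

```cpp
// Toy cross-iteration scheduling: chunk c of iteration i may start once
// chunks c-1, c, c+1 of iteration i-1 are done. No global barrier, so the
// tail of one iteration overlaps the head of the next. Spin-waits keep the
// example short; a real scheduler would use proper task queues.
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

constexpr int kChunks = 8;
constexpr int kIters  = 4;

std::atomic<bool> done[kIters][kChunks];

void solve_chunk(int iter, int chunk) { /* hypothetical solver kernel */ }

bool ready(int iter, int chunk) {
    if (iter == 0) return true;
    int lo = std::max(0, chunk - 1), hi = std::min(kChunks - 1, chunk + 1);
    for (int c = lo; c <= hi; ++c)
        if (!done[iter - 1][c].load(std::memory_order_acquire)) return false;
    return true;
}

int main() {
    std::atomic<int> next{0};  // tasks claimed in flat (iter, chunk) order
    auto worker = [&] {
        for (;;) {
            int t = next.fetch_add(1);
            if (t >= kIters * kChunks) return;
            int iter = t / kChunks, chunk = t % kChunks;
            while (!ready(iter, chunk)) std::this_thread::yield();
            solve_chunk(iter, chunk);
            done[iter][chunk].store(true, std::memory_order_release);
        }
    };
    unsigned n = std::max(2u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < n; ++i) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
}
```

Because tasks are claimed in flat order, a task's dependencies always have smaller flat indices and are already claimed, so the spin-waits cannot deadlock.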
It's kind of unfortunate how huge the complexity jump is from a simple parallel_for to dependency-driven task scheduling. This is an area where I really think the more complex "ECS"-style approaches can make big improvements in practice. Doing it all by hand works, but it's super fragile and eats a lot of programmer time.
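To make that concrete: the same cross-iteration dependencies from the toy sketch above can be declared once and handed to an off-the-shelf task-graph library. A hedged sketch using Taskflow (https://github.com/taskflow/taskflow); solve_chunk is still a hypothetical stand-in:

```cpp
// Declare (iteration, chunk) tasks and their neighbour dependencies up
// front; the library's scheduler handles the overlap, work stealing, and
// lifetime issues that make the hand-rolled version fragile.
#include <taskflow/taskflow.hpp>
#include <algorithm>

constexpr int kChunks = 8;
constexpr int kIters  = 4;

void solve_chunk(int iter, int chunk) { /* hypothetical solver kernel */ }

int main() {
    tf::Executor executor;
    tf::Taskflow graph;

    tf::Task tasks[kIters][kChunks];
    for (int i = 0; i < kIters; ++i)
        for (int c = 0; c < kChunks; ++c)
            tasks[i][c] = graph.emplace([i, c] { solve_chunk(i, c); });

    // chunk c of iteration i runs after chunks c-1, c, c+1 of iteration i-1
    for (int i = 1; i < kIters; ++i)
        for (int c = 0; c < kChunks; ++c)
            for (int n = std::max(0, c - 1); n <= std::min(kChunks - 1, c + 1); ++n)
                tasks[i - 1][n].precede(tasks[i][c]);

    executor.run(graph).wait();
}
```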
Yeah, that isn't just a limitation of parallel_for_each but of Cilk-style nested task parallelism more generally: it can only represent series-parallel task graphs (https://en.wikipedia.org/wiki/Series%E2%80%93parallel_graph). That said, you can support both types of API with the same task graph backend, at least if you support dynamic tasks.
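For anyone wondering what the smallest non-series-parallel case looks like, it's the "N" graph: A → C, B → C, B → D. Fork-join can only wait on all spawned children at once, so it has to over-synchronize (e.g. make D wait on A as well), while an explicit dependency API states it directly. In the same Taskflow sketch style as above, with placeholder lambdas:

```cpp
// The "N" graph, the forbidden pattern of series-parallel task graphs.
#include <taskflow/taskflow.hpp>

int main() {
    tf::Executor executor;
    tf::Taskflow g;
    auto A = g.emplace([] { /* ... */ });
    auto B = g.emplace([] { /* ... */ });
    auto C = g.emplace([] { /* ... */ });
    auto D = g.emplace([] { /* ... */ });
    A.precede(C);
    B.precede(C);
    B.precede(D);  // D starts as soon as B finishes; no dependency on A
    executor.run(g).wait();
}
```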