If you compare the performance of async rust vs. rust with blocking syscalls there’s not even a comparison. ‘epoll’ and the like are vastly more performant than blocking system io, async then it simply a way to make that kind of system interface nice to program with as you can ignore all that yielding and waking up and write straight-line code.
Now, if all you do is read a config file, yes, all that is absolutely overkill. If you’re actually doing serious io though there’s no way around this kind of stuff.
I assume by performance you mean CPU usage per io request. Each io call should require a switch to the kernel and back. When you do blocking io the switch back is delayed(switch to other threads while waiting), but not more taxing. How could it be possible for there to be a difference?
Because the kernel doesn’t like you spawning 100k threads. Your RAM doesn’t, either. Even all the stacks aside, the kernel needs to record everything in data structures which now are bigger and need longer to traverse. Each thread is a process which could e.g. be sent a signal, requiring keeping stuff around that rust definitely doesn’t keep around (async functions get compiled to tight state machines).
Specifically with io_uring: You can fire off quite a number of requests, not incurring a context switch (kernel and process share a ring buffer) and later on check on the completion status quite a number, also without having to context switch. If you’re (exceedingly) lucky no io_uring call ever cause a context switch as the kernel will work on that queue on another cpu. The whole thing is memory, not CPU, bound.
Anyhow, your mode of inquiry is fundamentally wrong in the first place: It doesn’t matter whether you can explain why exactly async is faster (I probably did a horrible job and got stuff wrong), what matters is that benchmarks blow blocking io out of the water. That’s the ground truth. As programmers, as a first approximation, or ideas and models of how things work are generally completely wrong.
Because the kernel doesn’t like you spawning 100k threads.
Why do you say this?
Your RAM doesn’t, either
Not if your stacks per thread are small.
Even all the stacks aside, the kernel needs to record everything in data structures which now are bigger and need longer to traverse.
These data structures must exist either in userland or the kernel. Moving them to the kernel won’t help anything. Also, many of these data structures scale at log(n). Splitting have the elements to userland and keeping the other half gives you two structures with log(n/2) so 2log(n/2) = log(n^2/4). Clearly that’s worse.
Each thread is a process which could e.g. be sent a signal, requiring keeping stuff around that rust definitely doesn’t keep around (async functions get compiled to tight state machines).
If signals were the reason async worked better, then the correct solution is to enable threads that opt-out of signals. Anything that slows down threads that isn’t present in an async design should be opt-out-able. The state-machines that async compiles to, do not appear inherently superior to multiple less stateful threads managed by a fast scheduler.
Specifically with io_uring: You can fire off quite a number of requests, not incurring a context switch …
As described here you would still need to do a switch to kernel mode and back for the syscalls. The extra work required from assuming processes are hostile to each other should be easy to avoid among threads known to have a common process as they are obviously not hostile to each other and share memory space anyway.
The synchronization required to handle multiple tasks should be the same regardless if they are being run on the same thread by a user land scheduler or if they are running on multiple threads with an os scheduler.
Anyhow, your mode of inquiry is fundamentally wrong in the first place: …
I’m not interested in saying that async is the best because it appears to work well currently. That’s not the right way to decide the future of how to do things. That’s just a statement of how things are. I agree, if your only goal is get the fastest thing now with no critical thought, then it does appear that async is faster. I am unconvinced it must fundamentally be the case.
Page size is 4k it doesn’t get smaller. The kernel can’t give out memory in more fine-grained amounts, at least not without requiring a syscall on every access which would be prohibitively expensive.
Anything that slows down threads that isn’t present in an async design should be opt-out-able.
That’s what async does. It opts out of all the things, including having to do context switches when doing IO.
As described here you would still need to do a switch to kernel mode and back for the syscalls.
No, you don’t: You can poll the data structure and the kernel can poll the data structure. No syscalls required. Kernel can do it on one core, the application on another, so in the extreme you don’t even need to invoke the scheduler.
I am unconvinced it must fundamentally be the case.
You can e.g. have a look at whether you can change the hardware to allow for arbitrarily small page sizes. The reaction of hardware designers will be first “are you crazy”, then, upon explaining your issue, they’ll tell you “well then just use async what’s the problem”.
The only way I have heard threads are expensive, in the context of handling many io requests, is stack usage. You can tell the os to give less memory (statically determined stack size) to the thread when it’s spawned, so this is not a fundamental issue to threads.
Go ahead and spin up a web worker and transfer a bunch of data to it and tell us how long you had to wait.
Time to transfer data to one thread is related to io speed. Why would this have anything to do with concurrency model?
Well I just told you another one, one actually relevant to the conversation at hand, since it’s the only one you can use with JavaScript in the context of a web browser.
No, because async is fundamentally a paradigm for how to express asynchronous programming, i.e. situations where you need to wait for something else to happen, threading is not an alternative to that, callbacks are.
The “do something while waiting for something else” is not a reason to use async. That’s why blocking system calls and threads exist.
Threads don’t need to be expensive. Max stack usage can be determined statically before choosing the size when spawning a thread.
Any other reasons?
If you compare the performance of async rust vs. rust with blocking syscalls there’s not even a comparison. ‘epoll’ and the like are vastly more performant than blocking system io, async then it simply a way to make that kind of system interface nice to program with as you can ignore all that yielding and waking up and write straight-line code.
Now, if all you do is read a config file, yes, all that is absolutely overkill. If you’re actually doing serious io though there’s no way around this kind of stuff.
I assume by performance you mean CPU usage per io request. Each io call should require a switch to the kernel and back. When you do blocking io the switch back is delayed(switch to other threads while waiting), but not more taxing. How could it be possible for there to be a difference?
Because the kernel doesn’t like you spawning 100k threads. Your RAM doesn’t, either. Even all the stacks aside, the kernel needs to record everything in data structures which now are bigger and need longer to traverse. Each thread is a process which could e.g. be sent a signal, requiring keeping stuff around that rust definitely doesn’t keep around (async functions get compiled to tight state machines).
Specifically with io_uring: You can fire off quite a number of requests, not incurring a context switch (kernel and process share a ring buffer) and later on check on the completion status quite a number, also without having to context switch. If you’re (exceedingly) lucky no io_uring call ever cause a context switch as the kernel will work on that queue on another cpu. The whole thing is memory, not CPU, bound.
Anyhow, your mode of inquiry is fundamentally wrong in the first place: It doesn’t matter whether you can explain why exactly async is faster (I probably did a horrible job and got stuff wrong), what matters is that benchmarks blow blocking io out of the water. That’s the ground truth. As programmers, as a first approximation, or ideas and models of how things work are generally completely wrong.
Why do you say this?
Not if your stacks per thread are small.
These data structures must exist either in userland or the kernel. Moving them to the kernel won’t help anything. Also, many of these data structures scale at log(n). Splitting have the elements to userland and keeping the other half gives you two structures with log(n/2) so 2log(n/2) = log(n^2/4). Clearly that’s worse.
If signals were the reason async worked better, then the correct solution is to enable threads that opt-out of signals. Anything that slows down threads that isn’t present in an async design should be opt-out-able. The state-machines that async compiles to, do not appear inherently superior to multiple less stateful threads managed by a fast scheduler.
As described here you would still need to do a switch to kernel mode and back for the syscalls. The extra work required from assuming processes are hostile to each other should be easy to avoid among threads known to have a common process as they are obviously not hostile to each other and share memory space anyway. The synchronization required to handle multiple tasks should be the same regardless if they are being run on the same thread by a user land scheduler or if they are running on multiple threads with an os scheduler.
I’m not interested in saying that async is the best because it appears to work well currently. That’s not the right way to decide the future of how to do things. That’s just a statement of how things are. I agree, if your only goal is get the fastest thing now with no critical thought, then it does appear that async is faster. I am unconvinced it must fundamentally be the case.
Have you tried?
Page size is 4k it doesn’t get smaller. The kernel can’t give out memory in more fine-grained amounts, at least not without requiring a syscall on every access which would be prohibitively expensive.
That’s what async does. It opts out of all the things, including having to do context switches when doing IO.
No, you don’t: You can poll the data structure and the kernel can poll the data structure. No syscalls required. Kernel can do it on one core, the application on another, so in the extreme you don’t even need to invoke the scheduler.
You can e.g. have a look at whether you can change the hardware to allow for arbitrarily small page sizes. The reaction of hardware designers will be first “are you crazy”, then, upon explaining your issue, they’ll tell you “well then just use async what’s the problem”.
Well too bad cause they are.
Go ahead and spin up a web worker and transfer a bunch of data to it and tell us how long you had to wait.
The only way I have heard threads are expensive, in the context of handling many io requests, is stack usage. You can tell the os to give less memory (statically determined stack size) to the thread when it’s spawned, so this is not a fundamental issue to threads.
Time to transfer data to one thread is related to io speed. Why would this have anything to do with concurrency model?
Well I just told you another one, one actually relevant to the conversation at hand, since it’s the only one you can use with JavaScript in the context of a web browser.
You cant say async is the fundamentally better model because threading is purposely crippled in the browser.
The conversation at hand is not “how do io in browser”. Its “async is not inherently better than threads”
No, because async is fundamentally a paradigm for how to express asynchronous programming, i.e. situations where you need to wait for something else to happen, threading is not an alternative to that, callbacks are.
Threads are callbacks.