Welcome to my nightmare...
Microsoft's speech-to-text software of yesteryear was trash, but the newer version is on par with Cortana and is excellent! Problem: it only works with UWP. UWP is a bastardized .NET COM wrapper nightmare. UWP is built on COM interfaces, which were the thing way back when (and still are for hardcore C++ programmers), so they feel utterly useless now that we've had literally 15 years of reflection in .NET.
In any case, voice commands still have a problem. Having to prefix every command with "Cortana", "Computer", or whatever name you give your agent gets frustrating, and it becomes mind-numbing after just a few utterances. Also, I can type faster than I can talk, and once you master an app you learn its shortcuts; the keyboard screams, and it's much faster than using a mouse.
Now, have you noticed that when you get that aha moment, you have a notion of understanding, but nothing has been translated into words yet? Yet when you need to put that notion or idea into words, it turns into a well-formed, grammatical sentence almost instantly. There's very low latency between thought and serialization into words.
So I'm thinking that Musk is on a streak, and a mind-to-machine interface is much better than some smart agent trying to guess your intention, which is hard even for human beings. A mind-machine interface means thoughts can get translated into sequences of actions, code, and paragraphs, if not pages, of words. I mean, just thinking of ideas and seeing them appear in your Word doc as well-formatted, grammatically correct sentences would make you so much more productive, and the feedback you get back from this kind of interface would be immediate. Also, think how much more productive you could be with images: you just think about what something should look like, and a 3D model appears that you can manipulate and rotate.
The better interface is the one that makes you much more productive, if not addicted to working with it, where it takes only milliseconds to see your thoughts turn into results...