So I've gone through like 10 books this week, done lots of searching and thinking, and I'm sure about something now, so I can ask it in a more compact way:
Mine:
Why don't we use a natural alternative to Backpropagation? Backprop is very complex and unnatural; even Hinton and others say so. Doesn't the brain learn functions by thinking about known ideas? If I don't know the answers to x^2= or 2^x= (put 1, 2, 3, or 4 where the x is) and only have some of the answers, I can work out the algorithm that generates the other answers by thinking "hmm, let me try x+x: 2+2 =4, matches there, but 3+3 =6, nope, not matching observations, maybe it times itself? 3*3 =9, yes. Repeat." (I'm searching likely answers and may get it right soon.) My alternative (if it works) can not only explain how it discovered the function/algorithm behind the observed data but also why it generates the answers, including unseen ones.
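Here's a toy version of what I mean, just to be concrete (the candidate list is something I made up for illustration, not a real algorithm):

```python
# Guess-and-check search over candidate functions, the way I described it.
observations = {1: 1, 2: 4, 3: 9}  # partial answers to x^2 for x = 1, 2, 3

candidates = {
    "x + x":  lambda x: x + x,
    "x * x":  lambda x: x * x,
    "2 ** x": lambda x: 2 ** x,
}

for name, f in candidates.items():
    if all(f(x) == y for x, y in observations.items()):
        print(f"found it: {name}; predicts f(4) = {f(4)}")  # -> x * x, 16
        break
    else:
        print(f"{name}: nope, doesn't match observations")
```

The point is the search is over *ideas* (candidate functions), so when it finds one it can say which function it found and use it on unseen inputs.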
Backprop:
From what I've understood, Backprop learns by taking data and then tweaking the last output layer's weights, then the earlier layers' weights, backwards, so that the net outputs more accurate answers when prompted. It isn't just blindly tweaking the net until it finds a good brain; it uses (and needs) the data so that it can find patterns (functions) that generate the observed data. Backprop requires the network to be narrow in the middle and hierarchical in form so it can learn compressed representations (patterns). The world and Google have built on this approach; Backprop is used in something like 98% of AI. To make "Backprop" work better (lol...) they gave the net RNNs, then residuals, then LSTM gates for the vanishing gradient problem, eventually falling back to feedforward in Transformer architectures and using positional encoding instead. And they gave Backprop semantic embeddings, etc., because Backprop is not running the whole show, you see; we have to purposely make the network narrow. Really? And once you train it, you can't make it bigger unless you start over fresh; you can only lower the learning rate until it converges. You can't understand anything in the network. The world has built on a super complex, math-filled idea, adding band-aids to make it work better, all because it "worked better". Adding to this design is hard because it is not a good base to build on... The AI field was at Markov Chains and Hidden Markov Models, but then went to RNNs etc.
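To be concrete about my understanding of the mechanism, here's a toy version of that forward/backward weight tweaking on the same x^2 data (the layer sizes, learning rate, and step count are arbitrary choices of mine, not anyone's real setup):

```python
import numpy as np

# A tiny 1 -> 4 -> 1 net, to make "tweak the last layer, then the
# earlier layer, backwards" concrete.
rng = np.random.default_rng(0)
X = np.array([[1.0], [2.0], [3.0]])   # inputs x = 1, 2, 3
Y = X ** 2                            # observed answers 1, 4, 9

W1, b1 = rng.normal(size=(1, 4)), np.zeros(4)   # hidden layer weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # output layer weights
lr = 0.01

for step in range(5000):
    # forward pass: compute the net's current answers
    h = np.tanh(X @ W1 + b1)          # hidden activations
    pred = h @ W2 + b2                # outputs
    err = pred - Y                    # how wrong each answer is

    # backward pass: error flows from the output layer backwards
    dW2 = h.T @ err                   # gradient for the LAST layer first
    db2 = err.sum(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)  # chain rule pushes error backwards
    dW1 = X.T @ dh                    # ...then gradient for the EARLIER layer
    db1 = dh.sum(axis=0)

    # tweak every weight a little in the direction that lowers the error
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(pred.ravel())  # close to [1, 4, 9] after training
```

Notice that nothing in there ever names the function x^2; the "pattern" ends up smeared across W1 and W2, which is exactly the can't-understand-anything part I'm complaining about.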
How does Backprop work?:
As for how Backprop finds functions during its network tweaking, I still don't know how; explain it fully in clear English so there's no way of misunderstanding, if you can, with no math and no 5 pages of text. To me it seems to use the data and the rules it's given (semantics, the narrowed hierarchy, data) to create new rules that get more out of the data (as said, x^2= for x = 1, 2, 3, or 4). I'm not sure how it works or how well it works, but I'm confident there's no such thing as a free non-brute-force approach; it is not doing enough to find functions, it only has 1 rule, which is to lower cost. And if it IS using semantics etc. and the data to find the way to tweak the net, then it is doing something more human-like, as I suggested. I can't see how you could find the function that made the observed data without looking at the observed data and at related ideas that generate it.
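And here's what I mean by "only 1 rule": repeatedly nudging some knobs to lower the error on the observed data can recover x^2, but only because I hand-picked a model form that already contains it (the a*x^2 + b*x + c form and all the numbers below are my own assumptions, purely for illustration):

```python
# Fitting y = a*x^2 + b*x + c by nothing but "lower the cost".
xs = [1.0, 2.0, 3.0, 4.0]
ys = [x ** 2 for x in xs]           # the observed answers

a, b, c = 0.0, 0.0, 0.0             # start knowing nothing
lr = 0.001
for _ in range(200_000):
    for x, y in zip(xs, ys):
        err = (a * x * x + b * x + c) - y   # how wrong we are on this point
        # the only rule: nudge each knob in the direction that lowers cost
        a -= lr * err * x * x
        b -= lr * err * x
        c -= lr * err

print(round(a, 3), round(b, 3), round(c, 3))  # -> close to 1.0, 0.0, 0.0
```

So is that all Backprop is doing, just with millions of knobs instead of three? If so, where does the "finding the function" actually happen?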