Hi elpidiovaldez5, very interesting topic!
I really don't think that abstraction is fundamentally a reduction in the detail of a representation. I see it rather as an orthogonal view of the data. Let me explain my opinion.
Whatever you do with the "pixel data", you're still in a flat representation, even after classifying it and representing the situation differently, in a simpler way: three switches and a lever. You're still working with the skin of the world when what you want to reach is its flesh.
For example, show it a clock. An ML model will soon be able to predict its behavior, but without understanding why the minute hand rotates. It's a black-box problem.
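To make that concrete, here is a minimal toy sketch (the readings and the periodic features are my own assumptions, not a claim about any particular system): a regressor that predicts the minute hand almost perfectly, yet whose weights contain nothing that resembles gears.

```python
import numpy as np

# Toy "observations": elapsed minutes and the minute-hand angle in degrees.
t = np.arange(0, 600)           # 10 hours of readings, one per minute
angle = (t % 60) * 6.0          # ground truth: 6 degrees per minute

# Black-box regressor: least-squares fit on generic periodic features.
# It learns to predict the angle, but nothing in the weights says
# "a wheel underneath turns the hand".
X = np.column_stack(
    [np.sin(2 * np.pi * t / p) for p in (60, 30, 20)]
    + [np.cos(2 * np.pi * t / p) for p in (60, 30, 20)]
    + [np.ones_like(t)]
)
y = np.column_stack([np.sin(np.radians(angle)), np.cos(np.radians(angle))])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

pred = X @ w
pred_angle = np.degrees(np.arctan2(pred[:, 0], pred[:, 1])) % 360
print("mean error (degrees):", np.abs((pred_angle - angle + 180) % 360 - 180).mean())
```

The prediction error is essentially zero, and yet there is no cause anywhere in the model, only a table of weights.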
The problem is: given a set of observed facts, how can we imagine the causes of those facts without ever observing them? These supposed causes form a theory. A theory doesn't have to be proven; it just needs to provide a plausible explanation while staying simple enough.
EDIT:
Imagine a box containing a simple mechanism, like the one in the picture below.
If you move one of the green sticks, the other one moves in the opposite direction. A machine can start learning this mechanism while the box is open. Then, if you show it a closed box with two green sticks, the machine can reconstruct the whole when given only a part, which means it can recognize the two-stick mechanism, or, more precisely, believe it is the same mechanism it encountered before. At this point, for the machine, the content of the closed box, and what happens when one stick is moved, are a theory.
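Here is a minimal sketch of that idea, assuming the coupling between the sticks is linear and the example movements are made up:

```python
import numpy as np

# While the box is open, the machine observes both sticks at once.
moves_a = np.array([ 1.0, -2.0,  0.5,  3.0, -1.5])   # how far stick A was pushed
moves_b = np.array([-1.0,  2.0, -0.5, -3.0,  1.5])   # what stick B did in response

# "Learning the mechanism" here is just fitting the coupling between the sticks.
coupling = np.polyfit(moves_a, moves_b, 1)[0]          # about -1: opposite direction

# Later, the box is closed: only stick A is visible and pushed.
# The machine fills in the hidden part -- that is its theory of the box.
push_a = 2.0
predicted_b = coupling * push_a
print(f"coupling = {coupling:.2f}, predicted stick B move: {predicted_b:.2f}")
```

The fitted coupling is the machine's stand-in for whatever is hidden inside the closed box.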
From then on, we don't even need to care about what's inside the box anymore, and that's precisely what "abstracting" means. It means removing the mechanism from the equation and replacing it with a symbol (a new "pixel"). It means, "hey, I know this: it's a two-stick box!"
When a robot is facing switches and levers, if it has already used other switches and levers in the past, then it knows how to interact with them, and it can guess that some of them are connected to mechanisms similar to ones it has already encountered.
I think that's how it goes. We deal with the unseen. We name the unseen.
In other words, I believe that changing weights will never be enough. New "fake inputs", so to speak, need to be created to reflect new concepts. When a group of nodes tends to work together, new nodes must be created to symbolize the entire group. Let's call them spoon-nodes. When the group is working, the spoon-nodes tend to be active (when you see a spoon, you can say the word "spoon"). When the spoon-nodes are activated, the group gets to work (when I say the word "spoon", you can "see" a spoon).
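A toy sketch of that spoon-node idea (the feature indices, the co-activation threshold, and the Hebbian-style wiring are my own assumptions, just to illustrate the two directions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten low-level feature nodes; indices 2, 3 and 7 tend to fire together
# whenever a "spoon" is in view.
N = 10
spoon_group = [2, 3, 7]

def observe_spoon(noise=0.05):
    x = (rng.random(N) < noise).astype(float)   # background activity
    x[spoon_group] = 1.0                        # the group works together
    return x

# Detect the recurring coalition and create a new node for it (the spoon-node),
# wired in both directions instead of only re-weighting existing connections.
samples = np.stack([observe_spoon() for _ in range(200)])
co_active = samples.mean(axis=0) > 0.9          # nodes almost always on together
w_up = co_active.astype(float)                  # group -> spoon-node (recognition)
w_down = co_active.astype(float)                # spoon-node -> group (imagery)

# Seeing a spoon activates the spoon-node ("you can say the word 'spoon'").
x = observe_spoon()
spoon_node = float(x @ w_up >= w_up.sum())      # is the whole group present?
print("spoon-node active:", bool(spoon_node))

# Activating the spoon-node lights the group back up ("you can 'see' a spoon").
imagined = spoon_node * w_down
print("imagined features:", np.flatnonzero(imagined))
```

The point is only structural: the new node stands for the coalition, and the link works in both directions, recognition upward and imagery downward.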