However, I'm not sure human brains do 3D vision, let alone 3D vision plus estimation of what is inside objects (not just their surface shells). So maybe we can get away with 2D!
David Marr argues that brains *do* perform 3D vision. In his view, they build 3D representations of objects that support both manipulating and recognizing them:
(p. 326)

Unfortunately, we receive little help from the biological sciences about the kinds of questions raised by the later aspects of the visual process. Virtually nothing is known about the physiological and anatomical arrangements that mediate the construction of the three-dimensional visual descriptions of the world, and even the best psychological information is for the most part anecdotal and derived from the neurological rather than psychophysical studies.

Nevertheless, I think it is clear in principle that the brain must construct three-dimensional representations of objects and of the space they occupy. As Sutherland (1979) has remarked, there are at least two good reasons for this. First, in order to manipulate objects and avoid bumping into them, organisms must be able to perceive and represent the disposition of the objects' surfaces in space. This gives us a minimal requirement for something like the 2½-D sketch. Second, in order to recognize an object by its shape, allowing one then to evaluate its significance for action, some kind of three-dimensional representation must be built from the image and matched [to a stored representation] with which other knowledge is already associated. As we have seen, the two processes of construction and matching cannot be rigorously separated because a natural aspect of constructing a three-dimensional representation may include the continual consultation of an increasingly specific catalogue of stored shapes.

This forces us to rely, in our study of these later problems, much more on a careful consideration of the computational and representational requirements. Stated baldly, the strong constraints come from what the representation is to be used for.

We asked here about the requirements for a shape representation to be used for recognition, and we came to three main conclusions: A shape representation for recognition should (1) use an object-centered coordinate system, (2) include volumetric primitives of various sizes, and (3) have a modular organization. A representation based on a shape's natural axes (for example, the axes identified by a stick figure) follows directly from these choices. In addition, we saw that the basic process for deriving a shape description in such a representation must involve a means for identifying the natural axes of a shape in its image and a mechanism for transforming viewer-centered axis specifications to specifications in an object-centered coordinate system.

Marr, David. 1982. *Vision: A Computational Investigation into the Human Representation and Processing of Visual Information*. Cambridge, Massachusetts: The MIT Press.
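
Marr's summary names two computational steps. The first is to find a shape's natural axis in the image. The second is to re-express viewer-centered specifications in an object-centered coordinate system. What follows is a minimal 2-D sketch of that idea, built on several assumptions:

- The "natural axis" is approximated by the principal axis of the point set's covariance (PCA). This is a swapped-in technique, not Marr's own axis-finding procedure.
- The shape is a hypothetical list of image points.
- A change of viewpoint is limited to rotation plus translation in the image plane, so the sketch says nothing about actual 3-D recovery.

What it shows is the payoff of an object-centered frame: the same shape seen from two "viewpoints" yields the same description.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def principal_axis(points: List[Point]) -> Tuple[Point, float]:
    """Centroid and major-axis angle (radians) of a 2-D point set.

    Uses the closed-form major eigenvector of the 2x2 covariance matrix
    (PCA) as a stand-in for Marr's "natural axis"; this is an assumption,
    not his method. The angle is arbitrary when the spread is equal in
    every direction.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (cx, cy), angle


def to_object_frame(points: List[Point]) -> List[Point]:
    """Re-express viewer-centered image points in an object-centered frame.

    Origin: the shape's centroid. +u axis: its major axis, oriented so the
    third moment along u is positive (this breaks the 180-degree tie only
    for shapes that are asymmetric along the axis).
    """
    (cx, cy), angle = principal_axis(points)
    c, s = math.cos(angle), math.sin(angle)
    # Rotate by -angle about the centroid so the major axis lies along +u.
    local = [((x - cx) * c + (y - cy) * s,
              -(x - cx) * s + (y - cy) * c) for x, y in points]
    if sum(u ** 3 for u, _ in local) < 0:
        # A 180-degree turn flips the axis direction without mirroring.
        local = [(-u, -v) for u, v in local]
    return local


def move(points: List[Point], theta: float, tx: float, ty: float) -> List[Point]:
    """Rotate by theta, then translate: a change of viewpoint in the image plane."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


if __name__ == "__main__":
    # Hypothetical "stick" shape: a long limb with a short stub at one end.
    shape = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (6.0, 0.0), (0.0, 1.0)]
    seen_from_a = to_object_frame(shape)
    seen_from_b = to_object_frame(move(shape, 2.0, 5.0, -3.0))
    for (ua, va), (ub, vb) in zip(seen_from_a, seen_from_b):
        print(f"({ua:+.3f}, {va:+.3f})  ({ub:+.3f}, {vb:+.3f})")
```

Run as a script, the two columns should agree up to rounding. The third-moment rule is one arbitrary way to fix the axis direction. A shape that is symmetric along its axis, or a disc with no unique major axis, would need a different tie-breaker.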