Good article. I read your others too, about 4 of the 5; the last one still had a few WIP parts, but I gleaned enough from them and am done reading them. I'll go through the key points you provided in my own words. Not my best writing, but it will have to do:
You say the AI responds about what else its son likes, but fails to respond that it has NO SON! Hmm, if it is asked to predict text in Wikipedia articles, it should rarely say it has no son; it's modeling other features like Jessica or Wood. All it has to do is turn on the "me" feature node and it will select the right completion ^_^. Tom has a son, Mary doesn't, Jane has a baby girl.
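To make that concrete, here is a minimal sketch (my own toy example, every name and score invented) of how flipping a "me" feature on or off changes which completion wins:

```python
# Toy sketch: a persona feature gates which completion scores highest.
# All names and scores here are made up for illustration.

completions = {
    "my son likes trains":    {"me_on": 0.9, "me_off": 0.2},
    "I have no son":          {"me_on": 0.1, "me_off": 0.05},
    "Tom's son likes trains": {"me_on": 0.2, "me_off": 0.8},
}

def best_completion(me_feature_on: bool) -> str:
    key = "me_on" if me_feature_on else "me_off"
    # Pick the completion with the highest score under this condition.
    return max(completions, key=lambda c: completions[c][key])

print(best_completion(me_feature_on=True))   # speaks as "me"
print(best_completion(me_feature_on=False))  # speaks about others
```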
You say our AI finds patterns, going text to intention rather than intention to text, that it has a bias that nurses are female, and that it can be racist; that it only says what we wrote. Well, so much is wrong here, erm... Nurses usually are female. There CAN be male nurses; there are, too. Next, GPT-2 and its kin can already generate new data from the distribution, not just what humans made. They can make real discoveries: a model may never have seen "cats eat", only "dogs eat", and can still utter "cats eat". Next, humans find patterns too... Next, you've separated intention and text as if they were different, but in the brain it's all just data, be it text or images, and some features re-occur, which causes patterns. So the questions you wake up with every morning are text/sounds or images, and they ARE the intent; the text features have reward on them (Reward Dialog Goal) and force you to talk about certain subjects like AI or money or food, and the root goal is mostly food, and that reward is mostly unchangeable too.
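Here is a toy illustration of that "dogs eat" to "cats eat" generalization (my own sketch with made-up vectors): if "cats" sits near "dogs" in a learned feature space, a pattern seen only with one transfers to the other:

```python
import numpy as np

# Made-up 3-d feature vectors; in a real model these are learned embeddings.
emb = {
    "dogs":  np.array([0.9, 0.8, 0.1]),
    "cats":  np.array([0.85, 0.75, 0.15]),  # close to "dogs"
    "rocks": np.array([0.1, 0.05, 0.9]),    # far from both
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Suppose training contained only "dogs eat". Score how plausible
# "<word> eat" is by similarity of <word> to the seen subject "dogs".
for word in ["cats", "rocks"]:
    print(word, "eat ->", round(cosine(emb[word], emb["dogs"]), 3))
# "cats eat" scores high despite never appearing in training;
# "rocks eat" scores low.
```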
You say "children who are just learning to speak will say %u201Cfood!%u201D when they are hungry, rather than %u201CI want food%u201D. A child doesn%u2019t need to develop a concept of %u201Cself%u201D before he or she understands the meaning of hunger.". Yes, the next likely word to predict is Food, indeed. Later it will leave this far away and talk about ex. AI, a much more distant thing, but definite way to get food forever :P. Now why doesn't it say Give me food? Or give my friend, not me, food? There must be some reward, or frequency behind the choice of predicting the next likeliest word or letter after the context given it has.
You say rewards, meaning, are key too. Yes, I said as much above: we say/predict the next word by entailment frequency plus rewarded feature nodes like food or foods, etc.
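A minimal sketch of that combination (all counts and bonuses invented for illustration): candidate next words are scored by how often they follow the context, plus a bonus for reward-tagged features like food:

```python
# Toy next-word chooser: score = log(frequency) + reward bonus.
import math

context = "I want"
candidates = {
    # word: (times seen after "I want", reward tag strength)
    "food":  (500, 2.0),   # frequent AND tied to the food reward
    "sleep": (400, 1.0),
    "taxes": (50,  0.0),
}

def score(word):
    freq, reward = candidates[word]
    return math.log(freq) + reward   # frequency pull + reward pull

print(context, max(candidates, key=score))  # -> "I want food"
```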
You say "memories are only formed when they solve your immediate problem. They must answer a question or solve a problem." Nope. Collecting data trains the model about frequency, semantics, etc. You always store it, you only forget connections if not activated enough. Rewardful answers you expected or wanted to see occur with aligning reason do make it more likely to remain stored.
You say it basically tries random answers to questions until it gets the desired one. Yes, it's collecting data from an updated or more specialized source: a website, a distribution, motor-trial tweaking. You say it may reuse the word "help" in the future if saying it causes the answer to appear soon after. Yes: help, genie, God, all may do the trick. Maybe dad or mom will do all you need.
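That loop is essentially a multi-armed bandit, sketched below with made-up payoff odds: words that tend to make the answer appear get their weight bumped and get tried more often:

```python
import random

random.seed(0)

# Invented probabilities that saying each word produces the answer soon after.
payoff = {"help": 0.6, "genie": 0.05, "abracadabra": 0.05}
weight = {w: 1.0 for w in payoff}   # how inclined we are to try each word

for _ in range(200):
    # Pick a word in proportion to its current weight.
    word = random.choices(list(weight), weights=list(weight.values()))[0]
    if random.random() < payoff[word]:   # the answer appeared
        weight[word] += 1.0              # reinforce: use it again next time

print(max(weight, key=weight.get))  # -> "help" dominates after some trials
```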
You made me consider an example, "i threw the _ at the": if it has seen many different words where the blank is, can it confidently match all sorts of fillers even when the exact word is missing? Does the word not matter then? Can it be almost anything? Perhaps.
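One way to cash that out (my own sketch, counts invented): count what fills the blank in a corpus; a high-entropy slot accepts nearly anything, so the frame can be matched confidently without caring about the filler:

```python
import math
from collections import Counter

# Invented corpus counts for "i threw the _ at the".
fillers = Counter({"ball": 30, "rock": 25, "shoe": 20, "book": 15, "cat": 10})

total = sum(fillers.values())
entropy = -sum((n / total) * math.log2(n / total) for n in fillers.values())

print(f"{len(fillers)} distinct fillers, entropy = {entropy:.2f} bits")
# Many distinct fillers and high entropy: the slot means "anything throwable",
# so even an unseen filler is a confident match for the frame.
```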
You say we use features that already exist in the data. Yes. I show this in my own work: a hierarchy, or rather a semantic heterarchy web. And Byte Pair Encoding.
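Byte Pair Encoding itself fits in a few lines; this is the textbook merge loop (not anyone's specific implementation): repeatedly fuse the most frequent adjacent pair of symbols into a new unit, growing bigger features out of smaller ones:

```python
from collections import Counter

def bpe_merges(text: str, num_merges: int):
    tokens = list(text)                 # start from single characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]  # most frequent adjacent pair
        merges.append(a + b)
        # Replace every occurrence of the pair with the merged symbol.
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return merges, tokens

merges, tokens = bpe_merges("low lower lowest", 4)
print(merges)   # learned sub-word units like 'lo', 'low', ...
print(tokens)
```

Each merge is exactly the kind of re-used feature the web is built from: frequent small parts get promoted into bigger named chunks.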
You say you can see or hear a new object or question and not know an answer that is satisfying (confident, predictable, recognizable), and that if told the answer you're only satisfied if you e.g. trust the teller and they say certain matching words. Also, if told that this object of gears and rope is a food cooker over fire, you'll remember it better if you're familiar with food cooking and clockwork precision, because those features connect it to known templates in memory and make it more likely to be boosted/stored.
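A rough sketch of that template-overlap boost (features and the scoring rule are my own invention): the more an incoming description overlaps with templates you already hold, the more strongly it gets stored:

```python
# Toy: storage strength = overlap between a new item's features
# and templates already in memory.
known_templates = {
    "cooking":   {"food", "fire", "pot", "heat"},
    "clockwork": {"gears", "precision", "rope", "mechanism"},
}

new_item = {"gears", "rope", "fire", "food"}   # "a food cooker over fire"

overlap = sum(len(new_item & feats) for feats in known_templates.values())
storage_strength = overlap / len(new_item)     # crude normalization

print(storage_strength)  # high overlap with known templates -> stored strongly
```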
Do note that predicting the next word and recognizing a context (yours or someone else's) are the same thing: you recognize by frequency, by related words, and by related positions of words, convolutionally.
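A minimal sketch of that position-tolerant, convolution-like matching (toy data, my own scoring rule): slide a stored pattern across the input and take the best-matching offset:

```python
# Toy "convolutional" recognizer: slide a stored word pattern across the
# input, counting matches at each offset. Everything here is illustrative.
pattern = ["the", "cat", "sat"]
inputs  = ["yesterday", "the", "cat", "sat", "down"]

def match_score(pattern, seq):
    best = 0
    for offset in range(len(seq) - len(pattern) + 1):
        window = seq[offset:offset + len(pattern)]
        hits = sum(p == w for p, w in zip(pattern, window))
        best = max(best, hits)
    return best / len(pattern)

print(match_score(pattern, inputs))  # 1.0: recognized despite the shift
```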
Lastly, you showed us something like le-nun-no-dor-e-vi-ncie and made us recognize who this person is, but that is just similarity matching, like with typos... :P
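And that scrambled-name trick really is just fuzzy similarity; Python's standard difflib does a crude version of it (the candidate list is my own toy example):

```python
import difflib

# Strip the separators and fuzzy-match against known names.
garbled = "le-nun-no-dor-e-vi-ncie".replace("-", "")
candidates = ["leonardo da vinci", "isaac newton", "marie curie"]

match = difflib.get_close_matches(garbled, candidates, n=1, cutoff=0.4)
print(match)  # ['leonardo da vinci']: similarity matching, like fixing a typo
```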