Another way to put it simply: I want it to learn from lots of text (or from some universal verb/object/predicate structure) that "are is" is BAD and "is soft" is GOOD, i.e. it outputs your sentence back like this: "Another thing they are is soft like cats."
I have not heard anyone mention or reply on this subject!
There are grammar rules for natural language that say whether a sentence is well formed. Part of such a grammar may be written in BNF notation, but if you want the real deal you'll need link grammar notation. There are a lot of programs that use either a kind of BNF or a link grammar to parse texts, but you have to provide them with a valid grammar. That grammar could be anything, from a grammar for parsing math expressions, through grammars for parsing programming languages, to grammars for parsing natural language sentences.
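To make the "you have to provide a valid grammar" part concrete, here is a minimal sketch in Python. The grammar, the word list and the function names (`GRAMMAR`, `match`, `is_well_formed`) are all made up for illustration; it is a toy BNF-style recognizer, not how any particular parser works, and it knows nothing about agreement (it would happily accept "they is soft").

```python
# Toy grammar in a BNF-like form (hypothetical, for illustration only).
# Each nonterminal maps to a list of alternatives; each alternative is a
# sequence of nonterminals or literal words.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["they"], ["cats"], ["it"]],
    "VP": [["V", "ADJ"], ["V", "ADJ", "like", "NP"]],
    "V": [["are"], ["is"]],
    "ADJ": [["soft"], ["big"]],
}


def match(symbol, words, pos):
    """Yield every position reachable after matching `symbol` at words[pos]."""
    if symbol not in GRAMMAR:
        # Anything that is not a nonterminal is treated as a literal word.
        if pos < len(words) and words[pos] == symbol:
            yield pos + 1
        return
    for alternative in GRAMMAR[symbol]:
        positions = {pos}
        for part in alternative:
            # Advance every candidate position through the next part.
            positions = {end for p in positions for end in match(part, words, p)}
        yield from positions


def is_well_formed(sentence):
    """True if the whole sentence can be derived from S in the toy grammar."""
    words = sentence.lower().rstrip(".").split()
    return len(words) in set(match("S", words, 0))


print(is_well_formed("They are soft like cats."))
print(is_well_formed("They are is soft."))
```

With this grammar the first sentence is accepted and the second is rejected, simply because no rule allows two verbs in a row. A real natural language grammar needs far more rules than this, which is where the trouble described next begins.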
Some parsers, like the Stanford Parser, come bundled with a natural language grammar (see the references from Don Patrick), but in general natural language parsing remains an unsolved problem because of ambiguities that arise from overlapping grammar rules. An example of this is:
Last night I shot an elephant in my pajamas.
The question here is whether the elephant or the shooter was wearing the pajamas. There are hints that help, like the `Last night` and `my pajamas` parts of the sentence, which suggest that the shooter wore the pajamas, but this is still not a guarantee. Ambiguity could also be resolved by comparing nearby sentences from the same paragraph, or by statistical analysis, but it remains an open question in linguistics, one that is hoped to be solved once AI is invented. There are even some contests (like the Winograd Schema Challenge) in which competitors (their programs, in fact) try to resolve as many ambiguities as they can. I think no one has ever made it with 100% correctness. Questions posed in such competitions are like:
A cat didn't enter a box because it was too big. What was big, a cat or a box?
Answering this question is a big problem, isn't it? Well, it gives programmers a lot of headaches.
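Going back to the elephant sentence, the ambiguity can be shown mechanically. The sketch below is a small CYK-style chart that counts how many parse trees a toy grammar gives a sentence. The rules and lexicon (`BINARY_RULES`, `LEXICON`) are my own assumptions, written by hand in Chomsky normal form, and I dropped "Last night" to keep the grammar tiny.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left, right).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # the PP attaches to the verb phrase (shooter in pajamas)
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # the PP attaches to the noun phrase (elephant in pajamas)
    ("PP", "P", "NP"),
]
LEXICON = {
    "i": ["NP"], "shot": ["V"], "an": ["Det"], "my": ["Det"],
    "elephant": ["N"], "pajamas": ["N"], "in": ["P"],
}


def count_parses(sentence, start="S"):
    """Count the distinct parse trees of `sentence` under the toy grammar."""
    words = sentence.lower().rstrip(".").split()
    n = len(words)
    # chart[(i, j)][A] = number of trees deriving words[i:j] from category A
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            chart[(i, i + 1)][category] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # every split point of the span
                for parent, left, right in BINARY_RULES:
                    chart[(i, j)][parent] += chart[(i, k)][left] * chart[(k, j)][right]
    return chart[(0, n)][start]


print(count_parses("I shot an elephant in my pajamas."))  # 2
print(count_parses("I shot an elephant."))                # 1
```

The grammar is perfectly happy with both readings, so the parser alone cannot say who wore the pajamas. Picking one of the two trees is the part that needs context, statistics or world knowledge.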
To analyze natural language sentences, some efforts have been made to have humans parse a corpus of sentences and package the result in treebanks, many of them freely available for download in various versions and various languages. Treebanks can be used as reference solutions for checking a machine's output on the natural language parsing problem.
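Here is a rough sketch of what "checking against a treebank" could look like. It compares a parser's bracketed output with a reference tree and reports how many labeled constituents agree. Both trees below are written by hand for this post (they are not taken from any real treebank), the choice of which reading is "gold" is my assumption, and the score is a simplified bracket overlap that also counts the part-of-speech nodes.

```python
def labeled_spans(bracketed):
    """Return the set of (label, start, end) constituents of a bracketed tree."""
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    spans = set()
    stack = []        # open constituents as [label, index of first word]
    word_index = 0
    i = 0
    while i < len(tokens):
        if tokens[i] == "(":
            stack.append([tokens[i + 1], word_index])  # label follows "("
            i += 2
        elif tokens[i] == ")":
            label, start = stack.pop()
            spans.add((label, start, word_index))
            i += 1
        else:
            word_index += 1  # an ordinary word
            i += 1
    return spans


def bracket_scores(gold, predicted):
    """Return (precision, recall) of predicted constituents against gold."""
    gold_spans = labeled_spans(gold)
    predicted_spans = labeled_spans(predicted)
    correct = len(gold_spans & predicted_spans)
    return correct / len(predicted_spans), correct / len(gold_spans)


# Hand-written reference tree: the shooter wears the pajamas.
GOLD = ("(S (NP I) (VP (VP (V shot) (NP (Det an) (N elephant)))"
        " (PP (P in) (NP (Det my) (N pajamas)))))")
# Hypothetical parser output: the elephant wears the pajamas.
PREDICTED = ("(S (NP I) (VP (V shot) (NP (NP (Det an) (N elephant))"
             " (PP (P in) (NP (Det my) (N pajamas))))))")

precision, recall = bracket_scores(GOLD, PREDICTED)
print(round(precision, 3), round(recall, 3))
```

Only one constituent differs between the two trees, so the scores stay high even though the meaning is quite different. That is worth keeping in mind when reading parser accuracy numbers.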
Does this answer your question?