Longest Common Subsequence (LCS) algorithms find the longest subsequence common to all sequences in a set of sequences.
As a string example, consider the sequences "thisisatest" and "testing123testing". An LCS would be "tsitest".
An LCS algorithm can easily be extended to insert wildcard characters wherever something was skipped (I have already done this). So, instead of "tsitest", we can get "t*s*i*test*".
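To make the wildcard idea concrete, here is a minimal Python sketch. It is only an illustration, not necessarily the implementation mentioned above. The names lcs_alignment and lcs_with_wildcards are made up for this example, and I assume a "*" goes wherever either string has extra characters between two consecutive matches (or before the first or after the last match). When several LCSs exist, the exact pattern depends on how the backtracking breaks ties.

```python
def lcs_alignment(a, b):
    """Return index pairs (i, j), with a[i] == b[j], that spell one LCS of a and b."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # walk the table from the start to recover one optimal alignment
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def lcs_with_wildcards(a, b, wildcard="*"):
    """Spell the LCS, adding a wildcard wherever either input skips characters."""
    out = []
    prev_i, prev_j = -1, -1
    for i, j in lcs_alignment(a, b):
        if i - prev_i > 1 or j - prev_j > 1:
            out.append(wildcard)
        out.append(a[i])
        prev_i, prev_j = i, j
    # trailing wildcard if either string continues after the last match
    if prev_i < len(a) - 1 or prev_j < len(b) - 1:
        out.append(wildcard)
    return "".join(out)


print(lcs_with_wildcards("thisisatest", "testing123testing"))
```

Keeping the index pairs, rather than only the LCS string, is what makes it possible to tell where something was skipped.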
Now, say you have a set of clauses, knowledge in the form of a list of simple strings:
Paul is a parent of John. Paul is a man. Paul is the father of John.
John is a parent of Miranda. John is a man. John is the father of Miranda.
Miranda is a parent of Sofia. Miranda is a woman. Miranda is the mother of Sofia.
Sofia is a parent of Chandler. Sofia is a woman. Sofia is the mother of Chandler.
Using LCS, we get:
shape:
{1} is a parent of {2}. {3} is a {4}. {5} is the {6} of {7}.
possibilities:
{1}      {2}       {3}      {4}    {5}      {6}     {7}
Paul     John      Paul     man    Paul     father  John
John     Miranda   John     man    John     father  Miranda
Miranda  Sofia     Miranda  woman  Miranda  mother  Sofia
Sofia    Chandler  Sofia    woman  Sofia    mother  Chandler
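One possible way to get this shape and table is sketched below in Python. Several things here are my assumptions and not part of the method as stated: the LCS is taken over word tokens instead of characters, the LCS of all four lines is approximated by folding a pairwise LCS over them (a heuristic that can depend on line order and tie-breaking), and the helper names tokenize, lcs, gaps and extract_shape are invented for the example.

```python
import re

LINES = [
    "Paul is a parent of John. Paul is a man. Paul is the father of John.",
    "John is a parent of Miranda. John is a man. John is the father of Miranda.",
    "Miranda is a parent of Sofia. Miranda is a woman. Miranda is the mother of Sofia.",
    "Sofia is a parent of Chandler. Sofia is a woman. Sofia is the mother of Chandler.",
]


def tokenize(text):
    # words and punctuation marks become separate tokens
    return re.findall(r"\w+|[^\w\s]", text)


def lcs(a, b):
    """One longest common subsequence of two token lists (dynamic programming)."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def gaps(tokens, common):
    """Split tokens into the runs lying between consecutive common tokens."""
    runs, current, k = [], [], 0
    for tok in tokens:
        if k < len(common) and tok == common[k]:
            runs.append(current)
            current = []
            k += 1
        else:
            current.append(tok)
    runs.append(current)
    return runs  # len(common) + 1 runs, some of them empty


def extract_shape(lines):
    token_lists = [tokenize(line) for line in lines]
    common = token_lists[0]
    for tokens in token_lists[1:]:
        common = lcs(common, tokens)  # pairwise fold, a heuristic
    all_runs = [gaps(tokens, common) for tokens in token_lists]
    # a gap position becomes a numbered slot if any line has text there
    slots = [p for p in range(len(common) + 1) if any(r[p] for r in all_runs)]
    shape_tokens = []
    for p in range(len(common) + 1):
        if p in slots:
            shape_tokens.append("{%d}" % (slots.index(p) + 1))
        if p < len(common):
            shape_tokens.append(common[p])
    shape = " ".join(shape_tokens).replace(" .", ".")
    rows = [[" ".join(r[p]) for p in slots] for r in all_runs]
    return shape, rows


shape, rows = extract_shape(LINES)
print(shape)
for row in rows:
    print(row)
```

I went with word tokens because a character-level LCS would also keep letters that only match by accident, such as the "man" inside "woman".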
Here you can see that, apparently, the following is always true:
{1} is the same as {3}
{1} is the same as {5}
{3} is the same as {5}
{2} is the same as {7}
when {4} is "man" then {6} is "father" and vice versa
when {4} is "woman" then {6} is "mother" and vice versa
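These observations can be found mechanically from the table. The Python sketch below is one way to do it, with invented names (always_equal, mapping, two_way_rules). One assumption is mine: a two-way rule is only reported when at least one value repeats across rows. Without it, {1} and {2} would also count as determining each other, simply because every name appears only once.

```python
from itertools import combinations

ROWS = [
    ["Paul", "John", "Paul", "man", "Paul", "father", "John"],
    ["John", "Miranda", "John", "man", "John", "father", "Miranda"],
    ["Miranda", "Sofia", "Miranda", "woman", "Miranda", "mother", "Sofia"],
    ["Sofia", "Chandler", "Sofia", "woman", "Sofia", "mother", "Chandler"],
]


def always_equal(rows):
    """Pairs of slot numbers (1-based) that hold the same value in every row."""
    n = len(rows[0])
    return [(a + 1, b + 1) for a, b in combinations(range(n), 2)
            if all(row[a] == row[b] for row in rows)]


def mapping(rows, a, b):
    """Return {value of column a: value of column b} if consistent, else None."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[a], row[b]) != row[b]:
            return None
    return seen


def two_way_rules(rows):
    """Slot pairs whose values determine each other ("and vice versa")."""
    rules = []
    for a, b in combinations(range(len(rows[0])), 2):
        forward, backward = mapping(rows, a, b), mapping(rows, b, a)
        if forward is None or backward is None:
            continue
        # assumption: require some value to repeat, otherwise there is no evidence
        if len(forward) < len(rows):
            rules.append((a + 1, b + 1, forward))
    return rules


print(always_equal(ROWS))
print(two_way_rules(ROWS))
```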
We can reduce it to:
shape:
{1} is a parent of {2}. {1} is a {4}. {1} is the {6} of {2}.
possibilities:
{1}      {2}       {4}    {6}
Paul     John      man    father
John     Miranda   man    father
Miranda  Sofia     woman  mother
Sofia    Chandler  woman  mother
rules:
when {4} is "man" then {6} is "father" and vice versa
when {4} is "woman" then {6} is "mother" and vice versa
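The reduction itself is mostly renaming. In the Python sketch below, every slot is replaced by the smallest slot number it is always equal to, and the columns that became redundant are dropped. The function name reduce_shape and the way the equal pairs are passed in are assumptions made for this example.

```python
import re

SHAPE = "{1} is a parent of {2}. {3} is a {4}. {5} is the {6} of {7}."
ROWS = [
    ["Paul", "John", "Paul", "man", "Paul", "father", "John"],
    ["John", "Miranda", "John", "man", "John", "father", "Miranda"],
    ["Miranda", "Sofia", "Miranda", "woman", "Miranda", "mother", "Sofia"],
    ["Sofia", "Chandler", "Sofia", "woman", "Sofia", "mother", "Chandler"],
]
# the "is the same as" observations, written as (smaller slot, larger slot)
EQUAL_PAIRS = [(1, 3), (1, 5), (3, 5), (2, 7)]


def reduce_shape(shape, rows, equal_pairs):
    n = len(rows[0])
    # representative of each slot: the smallest slot number it is equal to
    rep = {k: k for k in range(1, n + 1)}
    for a, b in sorted(equal_pairs):
        rep[b] = rep[a]
    # rename the placeholders in the shape
    new_shape = re.sub(r"\{(\d+)\}",
                       lambda mo: "{%d}" % rep[int(mo.group(1))], shape)
    # keep only the columns that are their own representative
    kept = [k for k in range(1, n + 1) if rep[k] == k]
    new_rows = [[row[k - 1] for k in kept] for row in rows]
    return new_shape, new_rows, kept


print(reduce_shape(SHAPE, ROWS, EQUAL_PAIRS))
```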
Each chunk of knowledge is made up of three sentences:
A: {1} is a parent of {2} => no slot of A appears in the rules
B: {1} is a {4} => {4} appears in the rules
C: {1} is the {6} of {2} => {6} appears in the rules, and every slot of A ({1} and {2}) appears here too
If one of these sentences is missing, how can we induce it?
If A is missing, we can restore it whenever C is present, because every slot of A ({1} and {2}) also appears in C. C alone is therefore sufficient, and we need one rule for each possible value of {6}.
sufficient: "1" is the father of "2"
sufficient: "1" is the mother of "2"
formulate: "1" is a parent of "2"
If B is missing, it should be there whenever we have both A and C. Since C is involved in the rules, we need one rule for each possible value of {6}, each formulating the corresponding value of {4}.
necessary: "1" is a parent of "2"
necessary: "1" is the father of "2"
formulate: "1" is a man
necessary: "1" is a parent of "2"
necessary: "1" is the mother of "2"
formulate: "1" is a woman
If C is missing, we proceed as for a missing B: A and B are both necessary, and each value of {4} gives the corresponding value of {6}.
necessary: "1" is a parent of "2"
necessary: "1" is a man
formulate: "1" is the father of "2"
necessary: "1" is a parent of "2"
necessary: "1" is a woman
formulate: "1" is the mother of "2"
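The rules above could be applied to a set of known sentences roughly like this. It is an untested-in-practice Python sketch under a few assumptions of mine: sentences are stored without the final period, every slot stands for exactly one word, and the names RULES, match and induce are invented for the example. It makes a single pass, so a sentence that was only formulated is not reused as a premise.

```python
# (premises, sentence to formulate); all premises must match with the same slots
RULES = [
    (["{1} is the father of {2}"], "{1} is a parent of {2}"),   # A from C
    (["{1} is the mother of {2}"], "{1} is a parent of {2}"),   # A from C
    (["{1} is a parent of {2}", "{1} is the father of {2}"], "{1} is a man"),
    (["{1} is a parent of {2}", "{1} is the mother of {2}"], "{1} is a woman"),
    (["{1} is a parent of {2}", "{1} is a man"], "{1} is the father of {2}"),
    (["{1} is a parent of {2}", "{1} is a woman"], "{1} is the mother of {2}"),
]


def match(pattern, fact, bindings):
    """Extend bindings so that pattern equals fact; return None if impossible."""
    p_tokens, f_tokens = pattern.split(), fact.split()
    if len(p_tokens) != len(f_tokens):
        return None
    new = dict(bindings)
    for p, f in zip(p_tokens, f_tokens):
        if p.startswith("{"):
            # a slot: bind it, or check it against the earlier binding
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new


def induce(facts):
    """Sentences the rules formulate that are not already among the facts."""
    suppositions = set()
    for premises, conclusion in RULES:
        candidates = [{}]
        for premise in premises:
            extended = []
            for bindings in candidates:
                for fact in facts:
                    new = match(premise, fact, bindings)
                    if new is not None:
                        extended.append(new)
            candidates = extended
        for bindings in candidates:
            sentence = conclusion
            for slot, value in bindings.items():
                sentence = sentence.replace(slot, value)
            if sentence not in facts:
                suppositions.add(sentence)
    return suppositions


print(induce({"Chandler is a parent of Zoe", "Chandler is a man"}))
```

The example facts about Chandler and Zoe are made up. The function returns the formulated sentences separately instead of adding them to the facts, so they can be kept apart as suppositions.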
Obviously, this is induction, so it can lead to mistakes. For instance, if we know of only two men and both have a car, the bot would induce that every man has a car. Clauses formulated this way should be treated as suppositions only.
All this needs to be tested. I don't guarantee that it works!