As part of what I'm working on (same huge project for several weeks, woohoo!!), I was wondering how I was going to store statements, questions, etc.: structured meaning blocks.
Starting from word-vectors, with the famous example of king - man + woman = queen, my first intuition was that a block would be an ordered list of vectors.
Using stanleyfok's vector-object (https://github.com/stanleyfok/vector-object) library, that would be an array like this (dumb example):
[
  new Vector({ react: 1, nodejs: 2, angular: 1 }),
  new Vector({ nodejs: 2, marko: 3, nextjs: 2 })
]
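To make the king - man + woman = queen arithmetic concrete, here is a minimal sketch with plain objects standing in for sparse vectors (no library needed). The feature names and weights are made up for the example, and the helpers `combine`, `nonZero` and `sameVector` are my own, not part of vector-object.

```javascript
// Sparse vectors as plain objects; a missing key counts as 0.
// All feature names and weights below are invented for illustration.
const combine = (a, b, sign) => {
  const out = { ...a };
  for (const [key, value] of Object.entries(b)) {
    out[key] = (out[key] || 0) + sign * value;
  }
  return out;
};
const add = (a, b) => combine(a, b, 1);
const subtract = (a, b) => combine(a, b, -1);

const king = { royal: 1, male: 1 };
const man = { male: 1 };
const woman = { female: 1 };
const queen = { royal: 1, female: 1 };

// king - man + woman
const result = add(subtract(king, man), woman);

// Drop zero components so that { male: 0 } does not get in the way.
const nonZero = (v) =>
  Object.fromEntries(Object.entries(v).filter(([, value]) => value !== 0));

// Two vectors are the same when every component matches (missing = 0).
const sameVector = (a, b) => {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  return [...keys].every((key) => (a[key] || 0) === (b[key] || 0));
};

console.log(nonZero(result));           // { royal: 1, female: 1 }
console.log(sameVector(result, queen)); // true
```

Real word embeddings only get this approximately right (nearest neighbour rather than exact equality), but the mechanics are the same.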
But thinking about it, sentences and phrases are not really linear; they are rather like trees. Since I'm targeting natural language realization with jsRealB (https://github.com/lapalme/jsRealB), the shape of sentences (https://github.com/lapalme/jsRealB/blob/master/Architecture/README.md) is more like this ("he eats apples"):
S(Pro("I").g("m"),
  VP(V("eat"),
     NP(D("a"),N("apple").n("p")))
)
What I'm thinking here is that meaning-vectors, organized as leaves of trees and manipulated accordingly, should make a nice language of thought (https://en.wikipedia.org/wiki/Language_of_thought_hypothesis).
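A rough sketch of what "meaning-vectors as leaves of a tree" could look like as a data structure. The node shapes (`leaf`, `branch`) and the feature weights are assumptions for illustration; this is not jsRealB's API, it only mirrors the S / VP / NP layout above.

```javascript
// Hypothetical node shapes (not jsRealB's API):
//   leaf:   { word, vector }
//   branch: { label, children }
const leaf = (word, vector) => ({ word, vector });
const branch = (label, ...children) => ({ label, children });

// "he eats apples", laid out like the S / VP / NP tree; weights are invented.
const sentence = branch("S",
  leaf("he", { human: 1, male: 1 }),
  branch("VP",
    leaf("eat", { action: 1, foodRelated: 1 }),
    branch("NP",
      leaf("apple", { fruit: 1, foodRelated: 1 }))));

// Depth-first walk that records each leaf with its path of labels from the root.
function leaves(node, path = []) {
  if (!node.children) {
    return [{ word: node.word, path, vector: node.vector }];
  }
  return node.children.flatMap((child) => leaves(child, [...path, node.label]));
}

for (const { word, path } of leaves(sentence)) {
  console.log(`${word}: ${path.join(" > ")}`);
}
// he: S
// eat: S > VP
// apple: S > VP > NP
```

The path to each leaf is what a flat, ordered list of vectors throws away.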
edit:
But then, how would I calculate the vector of a phrase or sentence, based on the vectors of its constituents?
Nice article about vectors from a machine-learning perspective (https://neptune.ai/blog/understanding-vectors-from-a-machine-learning-perspective)
edit:
Got it. I think. An m-vector (meaning-vector) encodes not only a target, but also its role in the sentence or phrase. So the boy in (A), "the boy eats the apple", is NOT the same as the boy in (B), "the apple eats the boy". But the relation holds.
Let's call them:
[the boy](A)
[the boy](B)
[the apple](A)
[the apple](B)
Then we have:
[the apple](B) - [the apple](A) = [the boy](B) - [the boy](A)
Et voilà, everything follows.
edit:
But it still means the m-vector of the sentence is not the sum of the m-vectors of its constituents.
How about... adding siblings, and multiplying children? Or something.
edit:
:D The meaning-vector of a sentence is a formula, a higher-order value!!
boy = () => human(1) + young(1)
eats = () => action(1) + foodRelated(2)
apple = () => fruit(1) + foodRelated(3)
boy eats apple = (subject, verb, object) =>
human(1 * subject) + young(1 * subject) +
action(1 * verb) + foodRelated(2 * verb) +
fruit(1 * object) + foodRelated(3 * object)
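Here is one way the formula idea could be made runnable, as a sketch only. Words become functions of a role weight, and a sentence becomes a function that stays unevaluated until the three role weights are supplied. The helpers `scale` and `sum`, and the weights 3, 2, 1 used at the end, are assumptions picked for the example.

```javascript
// Multiply every component of a sparse vector by k.
const scale = (vector, k) =>
  Object.fromEntries(Object.entries(vector).map(([key, v]) => [key, v * k]));

// Component-wise sum of any number of sparse vectors.
const sum = (...vectors) => {
  const out = {};
  for (const vector of vectors) {
    for (const [key, v] of Object.entries(vector)) {
      out[key] = (out[key] || 0) + v;
    }
  }
  return out;
};

// Each word is a formula over its role weight (feature values as in the post).
const boy = (w = 1) => scale({ human: 1, young: 1 }, w);
const eats = (w = 1) => scale({ action: 1, foodRelated: 2 }, w);
const apple = (w = 1) => scale({ fruit: 1, foodRelated: 3 }, w);

// A sentence is a formula too: it waits for the role weights.
const sentence = (subj, verb, obj) => (subjectWeight, verbWeight, objectWeight) =>
  sum(subj(subjectWeight), verb(verbWeight), obj(objectWeight));

const boyEatsApple = sentence(boy, eats, apple);
const appleEatsBoy = sentence(apple, eats, boy);

// With equal role weights, both sentences evaluate to the same components.
console.log(boyEatsApple(1, 1, 1).foodRelated, appleEatsBoy(1, 1, 1).foodRelated); // 5 5
// With distinct role weights (arbitrary 3, 2, 1), they come apart.
console.log(boyEatsApple(3, 2, 1).foodRelated, appleEatsBoy(3, 2, 1).foodRelated); // 7 13
```

Scalar role weights are lossy (different sentences can still collide on a shared feature like foodRelated), so this only shows the "sentence as a function" shape, not a finished encoding.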
Well, no, since an apple doesn't have a mouth or other digestive capabilities, so it would not be the same.
Yes, it would give the same m-vector for (A) and (B). Here is a snippet that shows this unwanted outcome:
const Vector = require("vector-object");

// Word vectors (the feature weights are just for the demo).
const boy = new Vector({
  human: 1,
  young: 1
});
const eats = new Vector({
  action: 1,
  foodRelated: 1
});
const apple = new Vector({
  fruit: 1,
  foodRelated: 1
});

// (A) boy eats apple: plain addition of the constituents.
const sentenceA = new Vector({}).add(boy).add(eats).add(apple);
console.log("boy + eats + apple = ", sentenceA);

// (B) apple eats boy: same constituents, roles swapped.
const sentenceB = new Vector({}).add(apple).add(eats).add(boy);
console.log("apple + eats + boy = ", sentenceB);

// Addition is commutative, so the order (and thus the role) is lost.
console.log(
  sentenceA.isEqual(sentenceB) ?
    "boy + eats + apple == apple + eats + boy" :
    "boy + eats + apple != apple + eats + boy"
);
This JavaScript code actually outputs:
boy + eats + apple = Vector {
vector: { human: 1, young: 1, action: 1, foodRelated: 2, fruit: 1 }
}
apple + eats + boy = Vector {
vector: { fruit: 1, foodRelated: 2, action: 1, human: 1, young: 1 }
}
boy + eats + apple == apple + eats + boy
As you can see, the meaning of the sentence is not just the sum of the meanings of its constituents. The role of each constituent has to be part of the sentence's m-vector, one way or another.
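One possible "one way or another", sketched with plain objects instead of vector-object: prefix every feature with the role its word plays before combining. The role names and the `withRole` / `sameVector` helpers are assumptions for illustration, not something the library provides.

```javascript
// Tag each feature with the role of the word it comes from.
const withRole = (role, vector) =>
  Object.fromEntries(
    Object.entries(vector).map(([feature, value]) => [`${role}.${feature}`, value])
  );

const boy = { human: 1, young: 1 };
const eats = { action: 1, foodRelated: 1 };
const apple = { fruit: 1, foodRelated: 1 };

// Keys never collide across roles, so merging the objects acts as addition.
const sentence = (subject, verb, object) => ({
  ...withRole("subject", subject),
  ...withRole("verb", verb),
  ...withRole("object", object),
});

// Component-wise equality, treating missing keys as 0.
const sameVector = (a, b) => {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  return [...keys].every((key) => (a[key] || 0) === (b[key] || 0));
};

const sentenceA = sentence(boy, eats, apple); // boy eats apple
const sentenceB = sentence(apple, eats, boy); // apple eats boy

console.log(sentenceA);
console.log(sameVector(sentenceA, sentenceB));                  // false
console.log(sameVector(sentenceA, sentence(boy, eats, apple))); // true
```

The price is a bigger feature space (every feature exists once per role), and "foodRelated" on the verb no longer adds up with "foodRelated" on the object.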
Could be similar. I've yet to code it, though, but that's one of my ideas to help plasticize its word dictionary, because there isn't enough dictionary size on the planet for a decent chatbot. There's got to be some way of increasing the patterns you have.
In other words, the same m-vector should be realizable through different structures. I think it fits with my idea of a formula, where several "math" expressions can be equivalent.
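A toy sketch of "same m-vector, different structures": an active and a passive description of the same event are first resolved to roles, then encoded. The structure shape (`voice`, `first`, `verb`, `second`), the role names and the lexicon weights are all hypothetical, chosen just to show two "expressions" evaluating to the same value.

```javascript
// Invented lexicon of word vectors.
const lexicon = {
  boy: { human: 1, young: 1 },
  eat: { action: 1, foodRelated: 1 },
  apple: { fruit: 1, foodRelated: 1 },
};

// Two hypothetical surface structures for the same event.
const active = { voice: "active", first: "boy", verb: "eat", second: "apple" };   // the boy eats the apple
const passive = { voice: "passive", first: "apple", verb: "eat", second: "boy" }; // the apple is eaten by the boy

// Resolve surface positions to roles, then tag each feature with its role.
function meaning({ voice, first, verb, second }) {
  const [agent, patient] = voice === "active" ? [first, second] : [second, first];
  const out = {};
  for (const [role, word] of [["agent", agent], ["action", verb], ["patient", patient]]) {
    for (const [feature, value] of Object.entries(lexicon[word])) {
      out[`${role}.${feature}`] = value;
    }
  }
  return out;
}

// Order-independent serialization, used only to compare two vectors.
const canonical = (vector) =>
  JSON.stringify(Object.entries(vector).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)));

console.log(meaning(active));
console.log(canonical(meaning(active)) === canonical(meaning(passive))); // true
```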