Good morning. Thank you for all your chats with Suzette; they will give me lots to debug.
One: the server is Linux and logins are case-sensitive, so you now have two identities with Suzette, "One" and "one". I will change this in the future to force a common casing.
Wgb14: your vote doesn't complete until you confirm an email they send you. And I've been working on this chatbot for just over a year.
Fact acquisition using parser data is the same whether it is 3rd person data or 2nd person data. I tend to store it differently in the knowledge base, because YOU facts are much more important and the ownership is already known (YOU). For facts about third persons, like "The red cat walked down the street", one has to create a unique instance of a cat to bind the data to, and then later determine whether future cats with data are the same or different. Hence I split off recognition of YOU data from 3rd party data as separate problems to address (albeit with a lot of similar or shared script).
I'm glad you've split off the chats themselves into another thread. I get my own copies of the logs and will pay no attention to that thread, so if you have questions for me, they should stay in this thread.
I THOUGHT I had posted this, which would have answered questions about [] ~ {}... but I can't find it. So here it is again...
The syntax of my Gamasutra article is, naturally, obsolete. But the concepts
remain. Here is an example of the revised syntax:
(< what [is are] {a} *1 >)
Words in parens must be matched in sequence, starting anywhere in the sentence.
< and > are start and end of sentence delimiters which restrict matching at the
boundaries. Here, the word what must be at the start of the input. Square
brackets represent a mandatory choice from any word in the collection, so the
next word must be is or are. Squiggly brackets represent an optional choice from
a collection, so if one matches it will be swallowed but if none match there is
no failure.
Matching is actually done simultaneously against the original word and its
canonical equivalent. If your pattern uses the canonical form of a word, it will
match the canonical or its derivative forms in the input, but if your pattern
uses a derivative form, it can only match input exactly. Since is and are are
derivative forms of the verb be, they must match exactly and other forms like
were, be, am will not match. But the optional a will match a or an since a is
the canonical form of an.
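To make the canonical-versus-derivative rule concrete, here is a minimal Python sketch. The `CANONICAL` table and the `word_matches` helper are assumptions made up for illustration; they are not part of CHAT-L, and a real engine would get canonical forms from a dictionary rather than a hand-written map.

```python
# Hypothetical canonical-form table (illustration only).
CANONICAL = {
    "is": "be", "are": "be", "was": "be", "were": "be", "am": "be",
    "an": "a",
}

def canonical(word):
    """Return the canonical form of a word, or the word itself if unknown."""
    return CANONICAL.get(word, word)

def word_matches(pattern_word, input_word):
    """Test a pattern word against both the input word and its canonical form."""
    input_word = input_word.lower()
    return pattern_word == input_word or pattern_word == canonical(input_word)

print(word_matches("be", "were"))  # True: canonical pattern word covers derivatives
print(word_matches("is", "was"))   # False: derivative pattern word must match exactly
print(word_matches("a", "an"))     # True: "a" is the canonical form of "an"
```

The asymmetry falls out of comparing the pattern word against two candidates on the input side only: the pattern word itself is never canonicalized.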
Next we have a wildcard *1 which will match exactly one word. * would match any
number including zero, *2 would match two, etc. *~ is a wildcard that matches a
short gap (zero, one, or two words). Since following the wildcard is the
sentence terminator mark, the wildcard must match the last word of the sentence.
So the original pattern can match:
What is a turtle?
What are elephants?
What is heresy?
And it cannot match:
What was a turtle?
What is that sound?
What is a turtle doing here?
We could modify the pattern to match all of the above sentences by writing this
pattern:
(< what be {that a the } [*3 *2 *1] >)
The input below would still not match, because it fails the end terminator test.
What was a turtle doing here then?
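The rules so far (sequence matching, < and > anchors, [] choices, {} optionals, and counted gaps) can be sketched as a small backtracking matcher in Python. This is only an illustration of the described behavior under stated assumptions, not the actual engine: the `CANONICAL` table, the tuple representation, and every function name here are invented for the example.

```python
import re

# Hypothetical canonical-form table; a real engine would use a dictionary.
CANONICAL = {"is": "be", "are": "be", "was": "be", "were": "be", "am": "be", "an": "a"}

def tokenize(sentence):
    """Lowercase the input and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9'-]+", sentence.lower())

def atom(tok):
    """Convert one pattern token into an element tuple."""
    if tok == "<":
        return ("start",)
    if tok == ">":
        return ("end",)
    if tok == "*":
        return ("gap", 0, None)                     # any number of words
    if tok == "*~":
        return ("gap", 0, 2)                        # short gap: 0, 1, or 2 words
    if re.fullmatch(r"\*\d+", tok):
        return ("gap", int(tok[1:]), int(tok[1:]))  # exactly N words
    return ("word", tok.lower())

def parse(pattern):
    """Parse a pattern string into a list of elements (no nested brackets)."""
    tokens = re.findall(r"[\[\]{}<>]|[^\s\[\]{}<>()]+", pattern)
    elements, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("[", "{"):
            j = tokens.index("]" if tok == "[" else "}", i)
            kind = "choice" if tok == "[" else "optional"
            elements.append((kind, [atom(t) for t in tokens[i + 1:j]]))
            i = j + 1
        else:
            elements.append(atom(tok))
            i += 1
    return elements

def ends(el, words, pos):
    """Yield every input position reachable after matching one element at pos."""
    kind = el[0]
    if kind == "word":
        if pos < len(words) and el[1] in (words[pos], CANONICAL.get(words[pos], words[pos])):
            yield pos + 1
    elif kind == "gap":
        remaining = len(words) - pos
        hi = remaining if el[2] is None else min(el[2], remaining)
        yield from (pos + n for n in range(el[1], hi + 1))
    elif kind == "choice":
        for alt in el[1]:
            yield from ends(alt, words, pos)
    elif kind == "optional":
        # Swallow a match if there is one; otherwise stay put without failing.
        matched = [e for alt in el[1] for e in ends(alt, words, pos)]
        yield from matched or [pos]

def match_seq(elements, words, pos):
    """True if the elements, in order, consume the input from pos to the end."""
    if not elements:
        return pos == len(words)
    return any(match_seq(elements[1:], words, e) for e in ends(elements[0], words, pos))

def matches(pattern, sentence):
    elements, words = parse(pattern), tokenize(sentence)
    # Without "<" the pattern may start anywhere; without ">" it may stop anywhere.
    if elements and elements[0] == ("start",):
        elements = elements[1:]
    else:
        elements = [("gap", 0, None)] + elements
    if elements and elements[-1] == ("end",):
        elements = elements[:-1]
    else:
        elements = elements + [("gap", 0, None)]
    return match_seq(elements, words, 0)

first = "(< what [is are] {a} *1 >)"
second = "(< what be {that a the } [*3 *2 *1] >)"
print(matches(first, "What is a turtle?"))                    # True
print(matches(first, "What was a turtle?"))                   # False
print(matches(second, "What is a turtle doing here?"))        # True
print(matches(second, "What was a turtle doing here then?"))  # False
```

Treating a missing < or > as an unlimited gap at that end is a design shortcut for the sketch; it keeps "starting anywhere in the sentence" and the anchored case in one code path.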
Patterns where you can only name explicit words are quite restricting. CHAT-L
allows you to use sets of words as well. You can either use an existing set
implied by the WordNet ontology or you can define your own. WordNet ontologies
are designated using a word with a trailing ~ and number (e.g., animal~2) to
indicate which synset of the WordNet word is being used. All words below that
synset meaning are included in the set. A concept declaration allows you to
define your own set.
Sets of words match at the same speed as a single word, so we could make the
most recent pattern run faster and cover all sorts of determiners via:
Concept: ~determinerlist (the a those these that which some ~number )
(< what be { ~determinerlist } [*3 *2 *1] >)
~number is the system set of all numbers, so it will recognize any number word
(in digit or word format) as a determiner. Thus the pattern would match:
What were twenty-three turtles doing here?
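As a rough illustration of concept sets, the sketch below models ~determinerlist as a Python set that can contain another set by name, with ~number handled by a small recognizer. Everything here (`CONCEPTS`, `NUMBER_WORDS`, `is_number`, `in_concept`) is a hypothetical stand-in; how the engine actually stores sets and gets single-word matching speed is not stated in the text. Hash-set membership is just one plausible way to get a lookup whose cost does not grow with the size of the set.

```python
import re

# Tiny, incomplete stand-in for the system ~number set.
NUMBER_WORDS = {
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "twenty", "thirty", "forty", "fifty",
    "sixty", "seventy", "eighty", "ninety", "hundred", "thousand",
}

# User-defined concepts; a member starting with "~" names another set.
CONCEPTS = {
    "~determinerlist": {"the", "a", "those", "these", "that", "which", "some", "~number"},
}

def is_number(word):
    """True for digit strings and for (possibly hyphenated) number words."""
    if re.fullmatch(r"\d+", word):
        return True
    return all(part in NUMBER_WORDS for part in word.split("-"))

def in_concept(word, name):
    """True if word belongs to the named set, following nested set names."""
    word = word.lower()
    if name == "~number":
        return is_number(word)
    members = CONCEPTS[name]
    if word in members:
        return True
    return any(in_concept(word, m) for m in members if m.startswith("~"))

print(in_concept("twenty-three", "~determinerlist"))  # True
print(in_concept("23", "~determinerlist"))            # True
print(in_concept("turtles", "~determinerlist"))       # False
```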
The following takes advantage of sets and the ability to say a word is not in
the input using the ! operator.
concept: ~like (like love want wish desire enjoy prefer adore lust)
( you *~ ~like !ferret *~ animal~3 )
The pattern finds you, then accepts a short gap of up to two words until it
can find one of the ~like words. After the ~like word it requires that ferret
not show up anywhere in the rest of the sentence, and after a short gap, it
requires finding any animal from the WordNet ontology. So this pattern matches
thousands of sentences where you express some kind of liking for an animal, as
long as that animal is not a ferret. And the animal can be singular or
plural, since the original words are all the canonical singular forms of
animals. The following sentences match the pattern:
I love bobcats
I really like your dog, though I don't care for them usually
I can only like green cats on Tuesdays
Matching a set is as fast as matching a single item, making it a powerful tool
for generalization.
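Here is a hedged Python sketch of what the short gap (*~) and the ! operator do in that pattern. It hard-codes this one pattern rather than implementing a general matcher. The `LIKE` and `ANIMALS` sets are tiny stand-ins for ~like and animal~3, `singular` is a deliberately naive canonical-form helper, and the function name is invented. The pattern is written with you while the sample sentences start with I; the sketch does not guess how that is reconciled, it simply takes the subject word as a parameter and defaults to "i" so the sample sentences can be checked.

```python
import re

LIKE = {"like", "love", "want", "wish", "desire", "enjoy", "prefer", "adore", "lust"}
ANIMALS = {"bobcat", "dog", "cat", "ferret", "turtle", "elephant"}  # stand-in for animal~3

def singular(word):
    """Naive canonical form: strip a trailing "s" if that gives a known animal."""
    if word.endswith("s") and word[:-1] in ANIMALS:
        return word[:-1]
    return word

def likes_non_ferret_animal(sentence, subject="i"):
    """Sketch of ( subject *~ ~like !ferret *~ animal~3 )."""
    words = re.findall(r"[a-z']+", sentence.lower())
    for s, word in enumerate(words):
        if word != subject:
            continue
        # *~ : at most two words between the subject and a ~like word
        for v in range(s + 1, min(s + 4, len(words))):
            if words[v] not in LIKE:
                continue
            rest = words[v + 1:]
            # !ferret : ferret must not appear anywhere in the rest of the input
            if any(singular(r) == "ferret" for r in rest):
                continue
            # *~ : at most two words between the ~like word and an animal
            if any(singular(r) in ANIMALS for r in rest[:3]):
                return True
    return False

print(likes_non_ferret_animal("I love bobcats"))                          # True
print(likes_non_ferret_animal("I can only like green cats on Tuesdays"))  # True
print(likes_non_ferret_animal("I love dogs but hate ferrets"))            # False
```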
----------
A note on wildcards: there are things I call gaps, which use *, *1, *~, etc. And there are things that bind the content they match onto local variables. These are wildcards that use _ instead of *. In fact, many things bind onto autonumbered local variables.
Regular English words do not generate bindings because you already know what the word is. So a pattern like:
s: ( I * ~love _ |directobject)
will bind the matching word for ~love onto a variable named _#0. The _ wildcard will bind whatever words it matches onto _#1. Then the parser-found directobject will be bound onto _#2.
A test condition can be existence only, in which case it stands by itself, like "I", "~love", and "|directobject". Or a test condition can be a relationship (usually equality/membership). It makes no SENSE to use a relationship when you know the word (though you can), so I=I is just a waste. A real pattern might be
s: (I * ~~action=~goodaction _ |directobject=~livingbeing)
where we say that it needs to find a member of the ~~action set, BUT that member must also be a member of the ~goodaction set, and the parser-found directobject must be a member of the ~livingbeing set.
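A small Python sketch may help show how the autonumbered bindings and the relationship tests fit together. It hard-codes the shape ( I * set _ |directobject ) and assumes the parser output arrives as (word, role) pairs. That input format, the `match` function, and the example sets are all invented for illustration, and binding the matched word itself for a set=set test is an assumption.

```python
def match(tokens, verbs, verbs_also_in=None, object_in=None):
    """Sketch of ( I * verbs[=verbs_also_in] _ |directobject[=object_in] ).

    tokens is a list of (word, role) pairs; role is a hypothetical parser
    label such as "directobject", or None.
    Returns the bindings [_#0, _#1, _#2], or None if the pattern fails.
    """
    words = [word.lower() for word, _role in tokens]
    if "i" not in words:
        return None
    start = words.index("i")   # simplification: only the first "I" is tried
    for v in range(start + 1, len(words)):       # "*": any number of words
        if words[v] not in verbs:
            continue
        if verbs_also_in is not None and words[v] not in verbs_also_in:
            continue                             # relationship test on the verb
        for d in range(v + 1, len(words)):       # "_" captures words v+1 .. d-1
            if tokens[d][1] != "directobject":
                continue
            if object_in is not None and words[d] not in object_in:
                continue                         # relationship test on the object
            return [words[v], " ".join(words[v + 1:d]), words[d]]
    return None

LOVE = {"love", "adore", "cherish"}          # stand-in for ~love
ACTION = {"love", "help", "hate", "kick"}    # stand-in for ~~action
GOOD = {"love", "help"}                      # stand-in for ~goodaction
LIVING = {"dog", "cat", "person"}            # stand-in for ~livingbeing

dog = [("I", None), ("really", None), ("love", None), ("your", None), ("dog", "directobject")]
car = [("I", None), ("love", None), ("my", None), ("car", "directobject")]

print(match(dog, LOVE))                  # ['love', 'your', 'dog']
print(match(dog, ACTION, GOOD, LIVING))  # ['love', 'your', 'dog']
print(match(car, ACTION, GOOD, LIVING))  # None
```

The list index plays the role of the variable number, so element 0 corresponds to _#0, element 1 to _#1, and element 2 to _#2.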