CTRL

Zero · « **on:** April 09, 2022, 01:58:53 am »

I wired up the 3 minimalist tools I recently came up with into a single structure I call "CTRL". It's not so tiny anymore, so please find the code on GitHub.

I'm responding to your post in the other thread:

Quote from: Infurl

As you add data to your system you should also create automated consistency checks (regression tests) as you go. Ideally you want to get it to the point where you can add a little bit of information every day and immediately test to make sure you haven't broken anything, even if you don't remember everything that you already added. That way you can continue to build it up over time and not have to keep going back to the start and rewriting it all.

Another practice that will help you is to be able to freely convert the data that you do enter between formats. At the very least you should be able to automatically generate documentation from your data (a reference manual). Ultimately this will make it easier for other people to understand what you are doing and contribute to it. SUMO is a good example of a project like this which is mature, successful, and useful.

Even more important than documentation are serialization for persistence and bridges to existing external tools. So yes, format translators should be planned.

What I'm worrying about now is regression tests. For a program, it's easy to modularize things and test the modules' behavior in a controlled environment. But this is different: it's a network of knowledge. Testing is mandatory for any serious project, but how do you go about it?

infurl · « **Reply #1 on:** April 09, 2022, 03:16:01 am »

Quote from: Zero on April 09, 2022, 01:58:53 am

What I'm worrying about now is regression tests. For a program, it's easy to modularize things and test the modules' behavior in a controlled environment. But this is different: it's a network of knowledge. Testing is mandatory for any serious project, but how do you go about it?

It is a difficult problem to solve and one that is critical to be able to make progress. I am deeply immersed in this aspect of my own project at the moment, and while it has taken a while to get going I feel like it is working well now so I might be able to offer some hints.

I am primarily working on a semantic parser for the English language. For testing I have divided the task up into the lexicon, phrases, and sentences. Everything in the lexicon has to be working correctly before there is much point in testing the different kinds of phrases. Likewise, all the phrases have to be working correctly before much can be done with whole sentences. Each of those test categories has an internal hierarchy of dependencies as well (for example, noun phrases depend on adjective phrases, determinative phrases, and preposition phrases; those in turn depend on adverbial phrases and may also depend on noun phrases again, recursively) but I have settled on those broad categories because I can use different versions of the grammar which are optimised for testing at each level.

The key point here is that you should establish a solid foundation and build up your system in layers. You need to work with a hierarchy because you won't be able to manage the problem as a heterarchy without chasing your own tail.
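A minimal sketch of the layering idea, assuming each layer is just a named list of checks: layers run from the foundation upward, and as soon as one layer fails, everything above it is skipped rather than reported as broken. The `runLayers` function and the toy lexicon are hypothetical, for illustration only.

```javascript
// Hypothetical layered regression runner: all names and test cases are illustrative.
function runLayers(layers) {
  const report = [];
  let blocked = false;
  for (const { name, tests } of layers) {
    if (blocked) {
      // A lower layer failed, so results here would not be meaningful yet.
      report.push({ name, status: "skipped", failures: [] });
      continue;
    }
    const failures = tests
      .filter(({ check }) => !check())
      .map(({ label }) => label);
    const status = failures.length === 0 ? "passed" : "failed";
    report.push({ name, status, failures });
    if (status === "failed") blocked = true;
  }
  return report;
}

// Toy stand-ins for a lexicon layer and a phrase layer.
const lexicon = new Map([["the", "det"], ["cat", "noun"], ["sleeps", "verb"]]);
const tagOf = (word) => lexicon.get(word);
const isNounPhrase = (words) =>
  words.length === 2 && tagOf(words[0]) === "det" && tagOf(words[1]) === "noun";

const layers = [
  { name: "lexicon", tests: [
    { label: "cat is a noun", check: () => tagOf("cat") === "noun" },
    // Fails on purpose: "dog" is missing from the toy lexicon.
    { label: "dog is a noun", check: () => tagOf("dog") === "noun" },
  ]},
  { name: "phrases", tests: [
    { label: "the cat is a noun phrase", check: () => isNounPhrase(["the", "cat"]) },
  ]},
  { name: "sentences", tests: [] },
];

const layerReport = runLayers(layers);
```

Here the lexicon layer fails, so the phrase and sentence layers are marked as skipped instead of producing noise.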

MagnusWootton · « **Reply #2 on:** April 09, 2022, 06:00:25 am »

I guess you check for when it tries to go false and true at the same time?
It happens when you make an architecture mistake, and when the computer is filling up the database itself.

It's not always because of a simple double assignment; over-generalization can cause it too.
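A rough sketch of such a check, assuming facts are stored as subject / predicate / truth-value records (that record shape and the `findContradictions` name are hypothetical). It reports every subject and predicate pair that has been asserted both true and false, whether by hand or by an over-general rule.

```javascript
// Hypothetical consistency check: the fact shape is an assumption for illustration.
function findContradictions(facts) {
  const seen = new Map(); // "subject|predicate" -> Set of asserted truth values
  for (const { subject, predicate, value } of facts) {
    const key = `${subject}|${predicate}`;
    if (!seen.has(key)) seen.set(key, new Set());
    seen.get(key).add(value);
  }
  // A key is contradictory when both true and false were asserted for it.
  return [...seen]
    .filter(([, values]) => values.size > 1)
    .map(([key]) => key);
}

const facts = [
  { subject: "tweety", predicate: "canFly", value: true, source: "rule: birds fly" },
  { subject: "tweety", predicate: "canFly", value: false, source: "fact: penguins cannot fly" },
  { subject: "tweety", predicate: "isBird", value: true, source: "entered by hand" },
];

const contradictions = findContradictions(facts);
```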

Zero · « **Reply #3 on:** April 09, 2022, 12:11:56 pm »

Quote

The key point here is that you should establish a solid foundation and build up your system in layers. You need to work with a hierarchy because you won't be able to manage the problem as a heterarchy without chasing your own tail.

I get the point, but the aim of the project I'm on is precisely to connect different things I've been working on before, now that I have a better 'big picture' in mind. The system has a center of collaboration, around which modules come to contribute to the overall behavior. For now, it looks like a star rather than like a pyramid. It is not easy to think of a refactoring procedure here, because these modules have really heterogeneous ways of working. They only meet at the central store, since they're all supposed to use it (their own way), working on data provided by other modules.

I guess keeping them small and testable is the way to go.
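One way this could look, as a sketch only: each module is tested alone against a tiny in-memory store seeded with what other modules would have written. `makeStore` and `wordCountModule` are hypothetical names, not part of ctrl.js.

```javascript
// Hypothetical minimal store standing in for the central store.
function makeStore(initial = {}) {
  const data = new Map(Object.entries(initial));
  return {
    get: (key) => data.get(key),
    set: (key, value) => { data.set(key, value); },
    snapshot: () => Object.fromEntries(data),
  };
}

// Hypothetical module: reads data provided by another module, writes its own result.
function wordCountModule(store) {
  const text = store.get("text") ?? "";
  const words = text.split(/\s+/).filter(Boolean);
  store.set("wordCount", words.length);
}

// The test seeds the store, runs exactly one module, then inspects the store.
const testStore = makeStore({ text: "keep modules small and testable" });
wordCountModule(testStore);
const storeAfter = testStore.snapshot();
```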

Edit.

I've added the kanban-tree idea from the other thread, along with a focus system. Somehow, they will be used by the main engine that runs the modules.
ctrl.js

Zero · « **Reply #4 on:** April 09, 2022, 11:02:50 pm »

So here is what I'm facing.

I first wanted to use a rather complete type system, à la Cyc, with isa links between classes and their instances, and gen links between classes and their subclasses. This is a multiple inheritance system.

Now, I need to learn patterns in the graph. To do so, I want a simple way to characterize the nodes that are involved in a pattern. The easiest way I know is to use a simple type system, where the type of a node is atomic: it has to be either a simple identifier, or a single node.

In other words, to ease pattern learning, I'm switching from multiple inheritance to single inheritance, or to no inheritance at all.

infurl · « **Reply #5 on:** April 09, 2022, 11:29:33 pm »

Remember that as well as the type hierarchy there is the containment hierarchy. It doesn't entirely solve your dilemma but if you consider some relationships as HAS_A or PART_OF instead of IS_A, TYPE_OF, and GENERALISES then you will feel less pressure to go the multiple inheritance route.

I use a type hierarchy or ontology which is strictly a tree, so there is no multiple inheritance. Objects acquire type information by having a list of attributes, each of which specifies a particular subtype. Try defining your types using single inheritance and then using them as hashtags and see if that makes your model work better. You might also need to introduce the notion of named subsets of subtypes. I'm thinking about it but I haven't needed to implement it yet.
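A small sketch of that arrangement, under the assumption that the type tree is a child-to-parent map and that an object carries a flat list of tags. The type names, `isSubtypeOf` and `hasType` are hypothetical.

```javascript
// Strict tree: every type has exactly one parent (null marks the root).
const parentOf = new Map([
  ["thing", null],
  ["agent", "thing"],
  ["human", "agent"],
  ["role", "thing"],
  ["mother", "role"],
  ["author", "role"],
]);

// Walk up the single-parent chain until the ancestor or the root is reached.
function isSubtypeOf(type, ancestor) {
  for (let t = type; t !== null && t !== undefined; t = parentOf.get(t)) {
    if (t === ancestor) return true;
  }
  return false;
}

// An object gets its type information from a list of tags, used like hashtags.
const alice = { id: "alice", tags: ["human", "mother", "author"] };

function hasType(object, type) {
  return object.tags.some((tag) => isSubtypeOf(tag, type));
}
```

With this, `hasType(alice, "agent")` and `hasType(alice, "role")` both hold even though no type has two parents.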

Zero · « **Reply #6 on:** April 10, 2022, 12:10:04 am »

Well I was about to delete my message when I saw yours.

In fact the system as it is now (still multiple inheritance) has nodes like this:

Code

    blankNode() {
        return {
            "@id": this.fresh("Node"),
            "@path": ["global"],
            "@date": Date.now(),
            "@doc": "https://aidreams.co.uk",
            // forward links: where the tails of the arrows are
            "@out": {
                "@isa": new Set(),
                "@gen": new Set()
            },
            // backward links: where the arrow heads are connected
            "@in": {
                "@isa": new Set(),
                "@gen": new Set()
            }
        }
    }

Yeah, the default documentation URL is our forum.

The @out are normal / forward links. This is where the tails of the arrows are. The @in are backward links, from the pointed node back to the pointer node. This is where the arrow heads are connected. @in-links are automatically set when @out-links are set. @isa and @gen are just specific cases of links (with @-names), but they are otherwise handled just the same as any other link that would be added to these ports. Indeed, the @out and @in ports are designed to host other kinds of links, like the ones you mention (has-part / part-of, etc.).
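To illustrate the automatic back-links, here is a minimal sketch. `makeNode` and `link` are hypothetical helpers, not the actual ctrl.js code, and storing node ids (rather than node references) in the Sets is an assumption.

```javascript
// Simplified node with only the two ports (hypothetical helper).
function makeNode(id) {
  return {
    "@id": id,
    "@out": { "@isa": new Set(), "@gen": new Set() },
    "@in": { "@isa": new Set(), "@gen": new Set() },
  };
}

// Setting an @out link on the source also sets the matching @in link on the target.
function link(nodes, fromId, linkName, toId) {
  const from = nodes.get(fromId);
  const to = nodes.get(toId);
  if (!from || !to) throw new Error(`unknown node: ${!from ? fromId : toId}`);
  // Create port entries on demand so any link name works, not only @isa and @gen.
  if (!from["@out"][linkName]) from["@out"][linkName] = new Set();
  if (!to["@in"][linkName]) to["@in"][linkName] = new Set();
  from["@out"][linkName].add(toId);
  to["@in"][linkName].add(fromId);
}

const nodes = new Map(["Node1", "House", "Roof"].map((id) => [id, makeNode(id)]));
link(nodes, "Node1", "@isa", "House");
link(nodes, "House", "hasPart", "Roof");
```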

The key to pattern learning is the concept of a wildcard. The pattern is a structure which is constant but incomplete. Generalizations take the form of decorated constants (wildcards), for example "instance of house", where "instance of" is the decorator, and "house" is a constant. It corresponds to indefinite articles in natural languages. I know you know all this, but I'm getting to the point.

Having several of these constants to decorate (house) is a pain in the hash, because either you take all of them as a whole, but then your pattern learning is weak (too narrow), or you start learning combinations of them, and then you suffer from explosion.

A woman, who is a human being, a wife, a mother, an entrepreneur, a novel author, etc., risks her life to save a child. When you learn this pattern, what do you take into account? All of what she is? Only some of it? Which parts? Why?

This is a poor real-life-like example. But my program should work only on programs, so it won't have to deal with all of these fuzzy things that make our lives interesting.

Still the question remains. Even in a mathematical, symbolic, virtual universe, if you take a lot of characteristics, your pattern won't match many cases. You cannot choose only some of them (because what criteria?). Then the last option is to learn all combinations. Boom, combinatorial explosion.
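To put a number on the explosion, here is a quick sketch that enumerates every non-empty combination of a feature list (`featureSubsets` is a hypothetical helper; the bit-mask approach only works for fewer than 31 features).

```javascript
// Enumerate all non-empty subsets of a feature list using a bit mask.
function featureSubsets(features) {
  const subsets = [];
  const total = 2 ** features.length;
  for (let mask = 1; mask < total; mask++) {
    subsets.push(features.filter((_, i) => (mask & (1 << i)) !== 0));
  }
  return subsets;
}

const roles = ["human", "wife", "mother", "entrepreneur", "author"];
const roleSubsets = featureSubsets(roles);
```

Five features already give 31 candidate patterns (2^5 - 1), and twenty features would give 1,048,575.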

The answer is probably that the choice of criteria (the selection of the features to include in the pattern) should be a decision of the system.

infurl · « **Reply #7 on:** April 10, 2022, 12:18:39 am »

Quote from: Zero on April 10, 2022, 12:10:04 am

Still the question remains. Even in a mathematical, symbolic, virtual universe, if you take a lot of characteristics, your pattern won't match many cases. You cannot choose only some of them (because what criteria?). Then the last option is to learn all combinations. Boom, combinatorial explosion.

As long as you are thinking that pattern matches have to be all or nothing you will miss the solution. You have to rank the matches from exact matches to partial matches and then you select from most likely to least likely until you find a solution to the particular problem that you're trying to resolve. For example, if you are matching against three criteria, first consider the objects that match all three, then consider the objects that match any two, and finally consider the objects that match on any one of the criteria. If the first one you pick doesn't solve the problem then go back and pick the next best match.
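A minimal sketch of this ranking, assuming each criterion is a predicate function and a candidate's score is simply how many criteria it satisfies. `rankMatches`, `firstSolution` and the sample data are hypothetical.

```javascript
// Score every object by the number of criteria it satisfies, best first.
function rankMatches(objects, criteria) {
  return objects
    .map((object) => ({
      object,
      score: criteria.filter((test) => test(object)).length,
    }))
    .filter(({ score }) => score > 0) // objects matching nothing are not candidates
    .sort((a, b) => b.score - a.score);
}

// Try candidates from most to least likely until one solves the problem.
function firstSolution(objects, criteria, solves) {
  for (const { object } of rankMatches(objects, criteria)) {
    if (solves(object)) return object;
  }
  return undefined;
}

const candidates = [
  { name: "a", color: "red", size: 3, shape: "cube" },
  { name: "b", color: "red", size: 3, shape: "ball" },
  { name: "c", color: "blue", size: 1, shape: "cube" },
  { name: "d", color: "blue", size: 1, shape: "ball" },
];
const criteria = [
  (o) => o.color === "red",
  (o) => o.size === 3,
  (o) => o.shape === "ball",
];

const ranked = rankMatches(candidates, criteria).map(({ object, score }) => [object.name, score]);
```

Here "b" matches all three criteria, "a" matches two, "d" matches one, and "c" drops out. If "b" turns out not to solve the problem, `firstSolution` moves on to "a".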

Zero · « **Reply #8 on:** April 10, 2022, 12:25:29 am »

Yes I agree, but you're talking about pattern matching, while I'm talking about pattern learning. The problem is the making of the patterns, rather than their use which is indeed easier to figure out.

Are you suggesting there should be no wildcard at all, but just memories of what happened?

infurl · « **Reply #9 on:** April 10, 2022, 01:17:04 am »

Quote

Are you suggesting there should be no wildcard at all, but just memories of what happened?

You would have to do both in order to learn. You have to remember the pattern that you used and you have to remember the outcome of using it. By outcome, I mean whether it succeeded or failed, not the actual results because those will vary and be fairly useless outside the original context.

If you want to get even smarter, you will record the cost of using the pattern as well as how often it succeeds or fails. Over time you will learn which patterns get results and which don't, and of the ones that succeed, which ones succeed the fastest.
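As a sketch of that bookkeeping, assuming patterns have string ids and cost is any number you care about (time, steps): the class below records only success or failure plus cost, then ranks patterns by success rate and breaks ties by lower average cost. `PatternStats` and the pattern names are hypothetical.

```javascript
// Hypothetical bookkeeping for pattern outcomes: success or failure plus cost.
class PatternStats {
  constructor() {
    this.stats = new Map(); // patternId -> { uses, successes, totalCost }
  }

  record(patternId, succeeded, cost) {
    const s = this.stats.get(patternId) ?? { uses: 0, successes: 0, totalCost: 0 };
    s.uses += 1;
    if (succeeded) s.successes += 1;
    s.totalCost += cost;
    this.stats.set(patternId, s);
  }

  // Highest success rate first; among equals, the cheaper pattern wins.
  ranking() {
    return [...this.stats]
      .map(([patternId, s]) => ({
        patternId,
        successRate: s.successes / s.uses,
        averageCost: s.totalCost / s.uses,
      }))
      .sort((a, b) => b.successRate - a.successRate || a.averageCost - b.averageCost);
  }
}

const stats = new PatternStats();
stats.record("instance-of-house", true, 12);
stats.record("instance-of-house", true, 8);
stats.record("any-building", true, 30);
stats.record("any-building", false, 50);
stats.record("red-thing", true, 20);
stats.record("red-thing", true, 20);

const best = stats.ranking().map((r) => r.patternId);
```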

It would be worth your time to learn how a good relational database system works. A system like PostgreSQL for example gathers statistics about the data that it holds ahead of time, such as the number of distinct values in a column and their distribution. When it receives a query, the planner formulates a number of different plans for satisfying the query and uses the statistics it has gathered to select the one with the lowest estimated cost. Some database systems go further: after executing the plan and returning the results, they compare the actual performance with the prediction so they can potentially find a better plan next time, and they remember the best plans they found so they can use them again.

frankinstien · « **Reply #10 on:** April 10, 2022, 07:22:47 pm »

Quote

Still the question remains. Even in a mathematical, symbolic, virtual universe, if you take a lot of characteristics, your pattern won't match many cases. You cannot choose only some of them (because what criteria?). Then the last option is to learn all combinations. Boom, combinatorial explosion.

The answer is probably that the choice of criteria (the selection of the features to include in the pattern) should be a decision of the system.

I use a very deep inheritance model along with all the other object-oriented rules for data, which include nesting and polymorphism. Consider that biological neurons use their dendritic inputs as pattern detectors and respond by proportional degrees: they fire at a rate, from a minimum to a maximum, that is directly proportional to the degree of matching. So, to deal with your issues, it's not so much about finding exact patterns as finding patterns that match to some degree. Now, this could produce a bottleneck of data responses, but humans at least do seem to become aware when recall leads to data overload. This can be handled by simple response counts: when overload happens, the machine asks for clarification, algorithmically filters to reduce the overload, or ignores the data point(s) because they're too ambiguous. Using nature's approach allows for an efficient means of sorting patterns by degree of fitness. From there you can tinker with contexts and exception modeling.
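A rough sketch of graded matching with a response count, assuming patterns and memories are flat key/value objects and the degree of match is the fraction of pattern keys that agree. `matchDegree`, `recall` and both thresholds are hypothetical.

```javascript
// Degree of match: fraction of the pattern's keys that the item agrees with (0 to 1).
function matchDegree(pattern, item) {
  const keys = Object.keys(pattern);
  if (keys.length === 0) return 0;
  const hits = keys.filter((key) => item[key] === pattern[key]).length;
  return hits / keys.length;
}

// Collect responses above a minimum degree, sorted by fitness.
// Too many responses count as an overload, so the caller can ask for clarification.
function recall(pattern, memory, { minDegree = 0.5, maxResponses = 3 } = {}) {
  const responses = memory
    .map((item) => ({ item, degree: matchDegree(pattern, item) }))
    .filter(({ degree }) => degree >= minDegree)
    .sort((a, b) => b.degree - a.degree);
  if (responses.length > maxResponses) {
    return { status: "overload", responses: [] };
  }
  return { status: "ok", responses };
}

const memory = [
  { id: 1, kind: "bird", color: "black", flies: true },
  { id: 2, kind: "bird", color: "white", flies: false },
  { id: 3, kind: "fish", color: "black", flies: false },
];

const recalled = recall({ kind: "bird", color: "black" }, memory);
const overloaded = recall({ kind: "bird", color: "black" }, memory, { maxResponses: 2 });
```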
