function parse (startSymbol, text) {
    success = false;
    node = START_SYMBOL; // the initial node, built from startSymbol as shown below
    while (true) {
        if (node.rule is sequence) {
            node = node.FIRST_SEQUENCE_ELEMENT;
        } else if (node.rule is alternation) {
            node = node.FIRST_ALTERNATION_ELEMENT;
        } else if (node.rule is terminal) {
            length = SCAN_TERMINAL (node.POSITION, node.TERMINAL);
            if (length > -1 && node.POSITION + length === text.length && node.AHEAD_ITEM === null)
                success = true;
            if (length > -1 && node.AHEAD_ITEM !== null) {
                node = node.AHEAD_ITEM;
            } else if (node.NEXT_ALTERNATION !== null) {
                node = node.NEXT_ALTERNATION;
            } else
                break;
        }
    }
    return success;
}
// the initial node, built from the start symbol:
{
    position: 0,
    rule: startSymbol,
    aheadItem: null,
    nextAlternation: null
}
// descending into a sequence: the node for its first element, with an
// ahead item pointing back at the rest of the sequence:
{
    position: node.position,
    rule: node.rule.sequence[0],
    aheadItem: {parentSequence: node, index: 1},
    nextAlternation: node.nextAlternation
}
// advancing over a matched terminal: the node for the ahead item, popping
// up to the parent's own ahead item at the end of a sequence:
{
    position: node.position + length,
    rule: node.aheadItem.parentSequence.rule.sequence[node.aheadItem.index],
    aheadItem: (
        (node.aheadItem.index === node.aheadItem.parentSequence.rule.sequence.length - 1) ?
            node.aheadItem.parentSequence.aheadItem :
            {parentSequence: node.aheadItem.parentSequence, index: node.aheadItem.index + 1}
    ),
    nextAlternation: node.nextAlternation
}
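Putting the loop and the three node templates together, here is a minimal runnable JavaScript sketch. The concrete rule shapes ({seq: [...]} for sequences, {alt: [...]} for alternations, plain strings for terminals), the makeAlternative helper, and the end-of-input guard are my assumptions for illustration, not part of the original pseudocode:

// A minimal sketch of the depth-first backtracking parser, under assumed
// rule shapes: {seq: [...]}, {alt: [...]}, and plain strings as terminals.
// Note: like the pseudocode above, it loops forever on left recursion.
function parse (startSymbol, text) {
    var success = false;
    var node = {position: 0, rule: startSymbol, aheadItem: null, nextAlternation: null};
    while (node) {
        var rule = node.rule;
        if (rule.seq) {                                    // descend into a sequence
            node = {
                position: node.position,
                rule: rule.seq[0],
                aheadItem: rule.seq.length > 1 ?
                    {parentSequence: node, index: 1} : node.aheadItem,
                nextAlternation: node.nextAlternation
            };
        } else if (rule.alt) {                             // try alternatives left to right
            node = makeAlternative (node, 0);
        } else {                                           // terminal: a constant string
            var length = text.startsWith (rule, node.position) ? rule.length : -1;
            if (length > -1 && node.position + length === text.length && node.aheadItem === null)
                success = true;
            if (length > -1 && node.aheadItem !== null) {  // advance to the ahead item
                var ahead = node.aheadItem, parent = ahead.parentSequence;
                node = {
                    position: node.position + length,
                    rule: parent.rule.seq[ahead.index],
                    aheadItem: (ahead.index === parent.rule.seq.length - 1) ?
                        parent.aheadItem :
                        {parentSequence: parent, index: ahead.index + 1},
                    nextAlternation: node.nextAlternation
                };
            } else {
                node = node.nextAlternation;               // backtrack, or stop when null
            }
        }
    }
    return success;
}

// Builds the node for the i-th alternative; its nextAlternation chains to
// the following alternative so a failed branch resumes at the same position.
function makeAlternative (node, i) {
    return {
        position: node.position,
        rule: node.rule.alt[i],
        aheadItem: node.aheadItem,
        nextAlternation: (i + 1 < node.rule.alt.length) ?
            makeAlternative (node, i + 1) :
            node.nextAlternation
    };
}

// usage: S -> ('a' | 'b') 'c'
console.log (parse ({seq: [{alt: ['a', 'b']}, 'c']}, "bc")); // true
console.log (parse ({seq: [{alt: ['a', 'b']}, 'c']}, "cc")); // false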
FUNCTION parse (startSymbol, text)
    VAR schedule as FIFO_QUEUE, memo as array, branch as object, node as object, success as boolean
    success = false
    schedule.ENQUEUE (START_SYMBOL)
    WHILE schedule is not empty
        branch = schedule.DEQUEUE ()
        IF branch is in memo OR branch is terminal
            IF branch is in memo
                node = memo.GET (branch.POSITION, branch.RULE)
            ELSE
                node = branch.SCAN_TERMINAL ()
            IF node.SUCCESS AND branch.AHEAD_NETWORK is null AND text.length in node.RIGHT_EXTENTS
                success = true
            IF node.SUCCESS AND branch.AHEAD_NETWORK is not null
                FOR each a in branch.AHEAD_NETWORK
                    node.AHEAD_NETWORK.PUSH (a)
                    FOR each position in node.RIGHT_EXTENTS
                        schedule.ENQUEUE (a at position)
            ELSE IF branch.NEXT_ALTERNATION is not null
                schedule.ENQUEUE (branch.NEXT_ALTERNATION)
        ELSE
            IF branch is alternation
                schedule.ENQUEUE (branch.FIRST_ALTERNATION_ELEMENT)
            ELSE IF branch is sequence
                schedule.ENQUEUE (branch.FIRST_SEQUENCE_ELEMENT)
        memo.UPDATE (branch)
    RETURN success
... -> [
    sub-sub-node -> [
        sub-node -> [
            node -> [node_aheads],
            [sub-node_aheads]
        ],
        [sub-sub-node_aheads]
    ],
    ...
]
function SCAN_TERMINAL (x, y)
{
    len = y - x;
    return len;
}
Quote: "What does SCAN_TERMINAL() do?"

If the terminal is a constant string, it checks whether the constant occurs at a given position in the parsed text:
function parseConstant (text, pos, constant) {
    if (text.substring (pos, pos + constant.length) === constant)
        return pos + constant.length;
    else
        return -1;
}
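If the terminal is a regular expression rather than a constant, the same interface can be kept by anchoring the match at the given position. A sketch, assuming ES2015 sticky ("y") regexes; the helper name parseRegExp is mine:

// Match a regex terminal that must start exactly at pos; returns the end
// position on success and -1 on failure, mirroring parseConstant above.
function parseRegExp (text, pos, regexp) {
    var sticky = new RegExp (regexp.source, "y"); // "y" anchors the match at lastIndex
    sticky.lastIndex = pos;
    var match = sticky.exec (text);
    return match ? pos + match[0].length : -1;
}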
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *scan = "Scan Terminal", *terminal = "Terminal";
    const char *node;
    int pos;

    node = strstr(scan, terminal);  /* find the terminal inside the scanned text */
    if (node != NULL) {
        pos = (int)(node - scan);
        printf("Scanned, \"%s\". For, \"%s\" (at position %d).\n", scan, node, pos);
    }
    return 0;
}
Considering that GLR is basically bottom-up, how does it perform in terms of speed? I once programmed a shift-reduce parser, but it was way too slow because it checked all the combinations from the bottom up, regardless of whether an upper symbol fit at a given place. I think that a major reason for the speed of top-down parsers is that they check only what they expect at given positions.
@Freddy: May we have syntax highlighting?
@infurl: Ten or twenty years? Boy, that doesn't sound very promising. I wonder, do you use binary search over words for very big alternations?
<Clause>
    <Subject Clause_type='declarative' Number='plural' Person='third' Verb_type='complex_intransitive'>
        <Determinative_THE Category='definite'>The</Determinative_THE>
        <Nominal Number='plural'>
            <Noun_BOX Case='plain_case' Category='common_noun' Number='plural'>boxes</Noun_BOX>
        </Nominal>
    </Subject>
    <Predicator Clause_type='declarative' Number='plural' Person='third' Verb_type='complex_intransitive'>
        <Nonmodal_BE Number='plural' Person='third' Polarity='negative' Verb_form='present' Verb_type='complex_intransitive' to_infinitival='yes'>are not</Nonmodal_BE>
    </Predicator>
    <Subjective>
        <Preposition_phrase>
            <Preposition_ON Category='preposition'>on</Preposition_ON>
            <Determinative_THE Category='definite'>the</Determinative_THE>
            <Nominal Number='plural'>
                <Noun_TABLE Case='plain_case' Category='common_noun' Number='plural'>tables</Noun_TABLE>
            </Nominal>
        </Preposition_phrase>
    </Subjective>
</Clause>
How long it actually takes you will largely depend on how much time you're willing to waste trying to reinvent everything yourself, like I did. Once you realize that no matter what you invent, somebody else already did it and did it better, then you start learning from others and you make much faster progress. It's only then that you stand a real chance of inventing something that actually is new. Hubris is the biggest obstacle to progress. I wish I'd learned that when I was a lot younger.
FUNCTION parse (grammar, text)
    DECLARE chart as ARRAY[LENGTH(text)][], terminals as ARRAY[]
    chart[0][0] := {
        Rule: grammar.START_SYMBOL,
        Aheads: []
    };
    FOR each column in chart
        terminals.CLEAR ()
        WHILE new rows in column
            // Phase 1: collect
            FOR each new row in column
                memo := column.GET_FIRST (row.Rule);
                IF memo not equals row // found memo
                    FOR each item in row.Aheads
                        memo.Aheads.PUSH (item);
                ELSE IF row.Rule is not terminal
                    FOR each production of row.Rule
                        column.PUSH (NODE_WITH_AHEAD_NETWORK (production.RIGHT_SIDE));
                ELSE // row.Rule is terminal
                    IF row.SCAN_TERMINAL ()
                        terminals.PUSH (row);
            // Phase 2: advance
            FOR each term in terminals
                FOR each new item in term.Aheads
                    IF item not in chart[term.RIGHT_EXTENT]
                        chart[term.RIGHT_EXTENT].PUSH (item);
    RETURN chart;
The ahead network is a structure that is capable of inheriting elements from its parent nodes. When we extend a parent's array of ahead symbols, we don't have to update any of its children, because the structure itself does the magic for us. For example, all alternation elements inherit ahead elements from their parents. Some complex inheritance structures can also arise when symbols are referenced multiple times across complex grammars. A specific implementation of the ahead network parser could need an array-flattening function when ranging over the structure of ahead elements, but that depends on the programmer, who might choose a tail-recursive iterator over array flattening.

... <- [
    sub-sub-node <- [
        sub-node <- [
            node <- [node_aheads],
            [sub-node_aheads]
        ],
        [sub-sub-node_aheads]
    ],
    ...
]
function NODE_WITH_AHEAD_NETWORK (parent) {
    "use strict";
    function recurse (index) {
        var aheads;
        return {
            rule: parent.rule.sequence[index],
            // aheads are computed lazily on the first call, then cached; the
            // last element of the sequence inherits the parent's ahead network.
            aheads: function () {
                if (!aheads)
                    aheads = (index < parent.rule.sequence.length - 1) ?
                        [recurse (index + 1)] :
                        [parent.aheads ()];
                return aheads;
            }
        };
    }
    return recurse (0);
}
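For example, with a hypothetical parent node for the sequence A B C whose own ahead network is empty, the aheads are only materialized on first request and then cached:

// Hypothetical parent: a three-element sequence with no aheads of its own.
var parent = {
    rule: {sequence: ["A", "B", "C"]},
    aheads: function () { return []; }
};

var first = NODE_WITH_AHEAD_NETWORK (parent);
console.log (first.rule);           // "A"
var second = first.aheads ()[0];    // built lazily on the first call
console.log (second.rule);          // "B"
var third = second.aheads ()[0];
console.log (third.rule);           // "C"
console.log (third.aheads ());      // [[]] - the parent's aheads, nested one level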
(
    Parser :> (
        OptSpaceOrComments :> (
            (/(\s+)|(\/\/((.*\n)|(.*$)))|(\/\*[\S\s]*?\*\/)/, OptSpaceOrComments) |
            ''
        ) |
        String :> /("([^"\\\n]|(\\[\S\s]))*")|('([^'\\\n]|(\\[\S\s]))*')|(`([^`\\]|(\\[\S\s]))*`)/ |
        RegExp :> /\/(?!\*)(?!\/)([^\/\\\n]|(\\.))*\/i?/ |
        SymbolName :> /[_A-z][_A-z0-9]*/ |
        Root :> (
            AExp :> (
                Alternation :> (Item :> Root.SExp, '|', Next :> Root.AExp) |
                SExp :> (
                    Sequence :> (Item :> Root.STExp, ',', Next :> Root.SExp) |
                    STExp :> (
                        Subtype :> (Left :> Root.Exp, ':>', Right :> Root.STExp) |
                        Exp :> (
                            Parser.OptSpaceOrComments,
                            (
                                Parser.String |
                                Parser.RegExp |
                                NewSymbol :> Parser.SymbolName |
                                Reference :> (
                                    Parent :> ('' | Root.Group | Root.Reference | Parser.SymbolName),
                                    '.',
                                    Spec :> Parser.SymbolName
                                ) |
                                Group :> ('(', Root.AExp, ')')
                            ),
                            Parser.OptSpaceOrComments
                        )
                    )
                )
            )
        )
    )
).Parser.Root
(
    Parser :> (
        OptSpaceOrComments :> (
            (/(\s+)|(\/\/((.*\n)|(.*$)))|(\/\*[\S\s]*?\*\/)/, OptSpaceOrComments) |
            ''
        ) |
        String :> /("([^"\\\n]|(\\[\S\s]))*")|('([^'\\\n]|(\\[\S\s]))*')|(`([^`\\]|(\\[\S\s]))*`)/ |
        RegExp :> /\/(?!\*)(?!\/)([^\/\\\n]|(\\.))*\/i?/ |
        SymbolName :> /[_A-z][_A-z0-9]*/ |
        Root :> (
            AExp :> (
                Alternation :> (Item :> Root.AExp, '|', Next :> Root.SExp) |
                SExp :> (
                    Sequence :> (Item :> Root.SExp, ',', Next :> Root.STExp) |
                    STExp :> (
                        Subtype :> (Left :> Root.STExp, ':>', Right :> Root.Exp) |
                        Exp :> (
                            Parser.OptSpaceOrComments,
                            (
                                Parser.String |
                                Parser.RegExp |
                                NewSymbol :> Parser.SymbolName |
                                Reference :> (
                                    Parent :> ('' | Root.Group | Root.Reference | Parser.SymbolName),
                                    '.',
                                    Spec :> Parser.SymbolName
                                ) |
                                Group :> ('(', Root.AExp, ')')
                            ),
                            Parser.OptSpaceOrComments
                        )
                    )
                )
            )
        )
    )
).Parser.Root
; CONTEXT FREE GRAMMAR ---------------------------------------------------------
Grammar := rules rule_separator
Grammar := rule_separator rules rule_separator
rules := Rule
rules := rules rule_separator Rule
Rule := RuleName defined_as RuleBody
defined_as := char_colon char_equals
defined_as := char_colon char_equals white_space
defined_as := white_space char_colon char_equals
defined_as := white_space char_colon char_equals white_space
RuleName := letter
RuleName := letter rule_name_chars
rule_name_chars := rule_name_char
rule_name_chars := rule_name_chars rule_name_char
rule_name_char := letter
rule_name_char := digit
rule_name_char := char_underscore
RuleBody := rule_items
rule_items := rule_item
rule_items := rule_items white_space rule_item
rule_item := RuleName
rule_item := CharCode
CharCode := digits
rule_separator := end_of_line
rule_separator := line_comment
rule_separator := rule_separator end_of_line
rule_separator := rule_separator line_comment
line_comment := char_semicolon end_of_line
line_comment := char_semicolon line_comment_chars end_of_line
line_comment_chars := line_comment_char
line_comment_chars := line_comment_chars line_comment_char
line_comment_char := alphanumeric
line_comment_char := punctuation
line_comment_char := white_space
;-------------------------------------------------------------------------------
Quote: "Have you tried memoizing your function calls?"

Yes, they were memoized from the start; it's a must-have. Check out the above function that returns the ahead network.
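For illustration only, here is a generic argument-keyed memoizer in JavaScript; it is a sketch of the general technique, not the project's actual mechanism, which bakes memoization into the node structures themselves:

// Generic memoizer: caches results keyed by the JSON-serialized argument
// list. Assumes arguments are cheaply serializable; illustration only.
function memoize (fn) {
    var cache = new Map ();
    return function () {
        var key = JSON.stringify (Array.prototype.slice.call (arguments));
        if (!cache.has (key))
            cache.set (key, fn.apply (this, arguments));
        return cache.get (key);
    };
}

var memoizedParseConstant = memoize (parseConstant); // e.g. wrapping the scanner above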
Quote: "To cover parsing natural languages you'll need to enhance your grammar in other ways, specifically to handle feature constraints."

I plan to handle the grammar as a pure CFG for sentence structure. For now, word formation will be left outside the parser, applied after the parsing is done. I'll just return an ambiguous forest, and it is up to the user to check whether nouns, adjectives, and verbs are what they are supposed to be, and to pick the combination that fits a word list and common sense. Only rare word types like adverbs, prepositions, auxiliary verbs, and a few others will be hard-coded into the grammar, while the user is responsible for picking the right combination of the other words from the given palette. It's that simple for this iteration, although I have plans for moving a word-formation system right into the grammar in the future, but that requires wiring in the functional programming paradigm. I'll probably post a paper on this subject first; I also have some cool solutions worth seeing.
// grammar 1
(
    A :> (
        (.A, .A) |
        'a'
    ),
    'b'
)

// grammar 2
(
    A :> (
        (
            .A,
            (
                (.A, .A, .A) |
                'a'
            ),
            .A
        ) |
        'a'
    ),
    'b'
)
// grammar 3
(
    A :> (
        B :> (
            (
                .A,
                (
                    (.A, 'a', .A) |
                    .B
                ),
                .A
            ) |
            'a'
        ),
        'b'
    )
)
Quote: "I'm not convinced that you're not just parsing regular expressions though. Try defining a grammar to parse XML and see how that performs. XML is very easy to parse using simple string searches, it was designed that way, but defining it and parsing it using a formal grammar will really stress your parser."

Thank you for the reply; I know how crazy I look, even to myself. I plan to build something - a parser generator - to thoroughly test these preliminary results. I will publish it on GitHub if everything goes ok, probably together with an automatic grammar extractor for Treebank files (for natural language) in a later iteration. If not, we will still have an alternative O(n^3) parser which I can also use for further work.
Quote: "Also, how exactly does your compact parse forest improve on the one that was published by Tomita in 1986?"

I'm not sure what Tomita published in 1986, but I'll refer to what I've seen called "today's most effective parse forest representation": the Shared Packed Parse Forest (SPPF) (http://www.bramvandersanden.com/post/2014/06/shared-packed-parse-forest/). Consider the grammar's top symbol and all the possible paths that lead from the top symbol down to terminals, through whatever recursion there is. An SPPF tracks all of these paths, and since it contains at most on the order of n^3 in-between nodes (n is the input length), parsing naturally takes O(n^3) time in the worst case.
Quote: "Really liking your work Ivan…"

Thank you, but these preliminary results should be taken with a grain of salt. It looks cool for now, but it's not fully tested yet. :-\
Quote: "Search speed is finite (GHz), storage is plentiful (TB)."

Some time is still spent writing each byte: the more bytes there are, the more time is needed to write them.
PROCEDURE Parse (grammar, text)
    DECLARE chart: [];
    Populate (chart[0], grammar.TOP_RULE, 0, []);
    FOR each nonempty column in chart // chart can expand during this loop
        FOR each new item in column // column can expand during this loop
            IF item.Rule is not terminal
                FOR each alternation of item.Rule
                    Populate (column, alternation.sequence, 0, [item]);
            ELSE
                IF item.Rule.SCAN_TERMINAL (column.index, text)
                    item.Aheads.ADD_IF_NOT_EXIST (chart[item.RIGHT_EXTENT]);
                    FOR each occurrence in item.Occurrences
                        Populate (chart[item.RIGHT_EXTENT], occurrence.sequence, occurrence.index + 1, occurrence.Parents);
    RETURN chart;
PROCEDURE Populate (array, sequence, index, parents)
    rule ← sequence[index];
    item ← array.FIND (rule);
    IF not found item
        item ← NEW {Rule: rule, Occurrences: [], Inheritors: [], Aheads: []};
        array.ADD (item);
    IF sequence.LENGTH > index + 1
        AddOccurrence (item, sequence, index, parents);
    ELSE
        FOR each parent in parents
            parent.Inheritors.ADD (item);
            FOR each toInherit in parent.Occurrences
                AddOccurrence (item, toInherit.Sequence, toInherit.Index, toInherit.Parents);
PROCEDURE AddOccurrence (item, sequence, index, parents)
    occurrence ← item.Occurrences.FIND (sequence, index);
    IF not found occurrence
        occurrence ← NEW {Sequence: sequence, Index: index, Parents: []};
        item.Occurrences.ADD (occurrence);
    FOR each parent in parents
        occurrence.Parents.ADD_IF_NOT_EXIST (parent);
    FOR each inheritor in item.Inheritors
        AddOccurrence (inheritor, occurrence.Sequence, occurrence.Index, occurrence.Parents);
    FOR each column in item.Aheads
        Populate (column, occurrence.Sequence, occurrence.Index + 1, occurrence.Parents);
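To make the bookkeeping concrete, the records that Populate and AddOccurrence manipulate could be shaped as below in JavaScript; the factory functions are my own illustration, only the field names come from the pseudocode:

// A chart item: one rule at one chart column, with inheritance bookkeeping.
function newChartItem (rule) {
    return {Rule: rule, Occurrences: [], Inheritors: [], Aheads: []};
}

// An occurrence: a dot position inside some production's sequence, together
// with the parent items that predicted it.
function newOccurrence (sequence, index) {
    return {Sequence: sequence, Index: index, Parents: []};
}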
DECLARE chart: [][], text: STRING;

FUNCTION Parse (grammar, input)
    text ← input;
    chart.CLEAR ();
    MergeItem (0, [grammar.START_SEQUENCE], 0, null);
    FOR each new column in chart DO
        FOR each new item in column DO
            FOR each production of item.Sequence[item.Index] in grammar DO
                MergeItem (column.Index, production.sequence, 0, item);

    RETURN chart;

PROCEDURE MergeItem (offset, sequence, index, parent)
    item ← chart[offset].FIND (sequence, index);
    IF not found item THEN
        item ← {Sequence: sequence, Index: index, Inherited: [], Inheritors: [], Parents: []};
        chart[offset].ADD (item);

    IF parent not in item.Parents THEN
        item.Parents.ADD (parent);
        FOR each x in [parent] UNION parent.Inherited DO
            IF x.Index + 1 < x.Sequence.LENGTH THEN
                FOR each y in [item] UNION item.Inheritors DO
                    IF y.Index + 1 == y.Sequence.LENGTH OR y is terminal THEN
                        IF (x.Sequence, x.Index) not in y.Inherited THEN
                            x.Inheritors.ADD (y);
                            y.Inherited.ADD (x);

                        IF y is terminal succeeded in text at offset THEN
                            FOR each z in x.Parents DO
                                MergeItem (offset + y.LENGTH, x.Sequence, x.Index + 1, z);
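The final version merges all of this into a single item record per dotted sequence. As a sketch (again, only the field names come from the pseudocode; the factory itself is illustrative):

// One chart item of the final algorithm: a dotted sequence plus the links
// that let completed items inherit their continuations.
function newItem (sequence, index) {
    return {
        Sequence: sequence, // the production right-hand side being matched
        Index: index,       // the dot position within that sequence
        Inherited: [],      // items whose continuations this item inherited
        Inheritors: [],     // items that inherit continuations from this item
        Parents: []         // items on whose behalf this item was predicted
    };
}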
I'm following your project with great interest, Ivan, and I'm looking forward to your updates.
Is "sequential normal form" your own invention?
From your github documentation:

Quote: "Definition of the grammars and their ability to develop rules towards input string is similar to the context-free grammars definition, except that the left side of rules may also be sequences of symbols."
That seems like the same thing as a context-sensitive grammar, which sits between context-free grammars and recursively enumerable grammars in the Chomsky hierarchy. What's different about yours?

The trouble with context-sensitive grammars is that parsing them is an intractable problem. Even human brains can't handle context-sensitive grammars, and very few natural languages have syntactic elements that could be considered context sensitive. (This shouldn't be confused with the semantics of natural languages, which is another story altogether.)

One day I imagine there will be quantum computers programming each other using unrestricted context-sensitive grammars, but they will be beyond our comprehension and they are certainly beyond the capabilities of any of our current machines.