V-Parser: developing an option towards NLP and some other thingies

ivan.moony · « **Reply #30 on:** September 04, 2017, 08:13:23 pm »

Quote from: korrelan on September 04, 2017, 08:02:15 pm

Really liking your work Ivanâ€¦

Thank you, but I should take these preliminary results with reserve. Looks cool for now, but it's not fully tested yet. $:-\$

Quote from: korrelan on September 04, 2017, 08:02:15 pm

Search speed is finite (GHz) storage is plentiful (TB).

For writing each byte, some time is spent. More there are bytes, more time is needed for their writing.

Zero · « **Reply #31 on:** September 05, 2017, 09:49:36 am »

Ok I see the point, thanks.
I think Korrelan is talking about memoization.

Hey, thinking about it, chatbots are like big memoized things... but let's not drift away from parsers!

ivan.moony · « **Reply #32 on:** October 08, 2017, 06:44:41 pm »

Well, this is something that works too (tested in practice), but it is not linear. The speed is about the same as in graphs already published in this thread:

Code

PROCEDURE Parse (grammar, text)
    DECLARE chart: [];
    Populate (chart[0], grammar.TOP_RULE, 0, []);
    FOR each nonempty column in chart  // chart can expand during this loop
        FOR each new item in column  // column can expand during this loop
            IF item.Rule is not terminal
                FOR each alternation of item.rule
                    Populate (column, alternation.sequence, 0, [item]);
            ELSE
                IF item.Rule.SCAN_TERMINAL (column.index, text)
                    item.Aheads.ADD_IF_NOT_EXIST (chart[item.RIGHT_EXTENT]);
                    FOR each occurence in item.Occurences
                        Populate (chart[item.RIGHT_EXTENT], occurence.sequence, occurence.index + 1, occurence.Parents);

    RETURN chart;

PROCEDURE Populate (array, sequence, index, parents)
    rule â† sequence[Index];
    item â† array.FIND (rule);
    IF not found item
        item â† NEW {Rule: rule, Occurences: [], Inheritors: [], Aheads:[]};
        array.ADD (item);

    IF Sequence.LENGTH > Index + 1
        AddOccurence (item, sequence, index, parents);

    ELSE
        FOR each parent in parents
            parent.Inheritors.ADD (item);
            FOR each toInherit in parent.Occurences
                AddOccurence (item, toInherit.Sequence, toInherit.Index, toInherit.Parents);

PROCEDURE AddOccurence (item, sequence, index, parents)
    occurence â† item.Occurences.FIND (sequence, index);
    IF not found occurence
        occurence â† NEW {Sequence: sequence, Index: index, Parents: []};
        item.Occurences.ADD (occurence);

    FOR each parent in parents
        occurence.Parents.ADD_IF_NOT_EXIST (parent);

    FOR each inheritor in item.Inheritors
        AddOccurence (inheritor, occurence.Sequence, occurence.Index, occurence.Parents);

    FOR each column in item.Aheads
        Populate (column, occurence.Sequence, occurence.Index + 1, occurence.Parents);

I thought of just adding a few lines to merge parents of the same right extent to make the parsing linear, but I'm onto implementing new (left-align)<->(right-align) philosophy because I don't like combining several different philosophies, using only fragments of them where appropriate, it doesn't feel right. I have at least two different algorithms by now (both cubic, I guess), and I wonder how many parsing algorithms are even possible, and which of them could be linear...

ivan.moony · « **Reply #33 on:** November 27, 2017, 08:02:31 pm »

I think I made it, it's almost linear.

You can check it out at: http://moony.atspace.cc/v-parser. Check predefined grammars (partial NLP included), or write new ones. Let me know if you can find out any grammar that is non-linear, relative to a length of input string.

I'll try to publish its little brother together with actual algorithm (about 30 lines) on Github soon. Still have to work out some compact parse forest output and other details. Scripting would also be supported, for binary search, or whatever.

ivan.moony · « **Reply #34 on:** December 02, 2017, 03:39:43 pm »

I've published the first version of V-Parser. It is free to use, and it is open source. You can download it from my GitHub repository and test it at online grammar development environment.

Don't hate me, I implemented only CPF output. Hopefully, this would be changed in the next version.

ivan.moony · « **Reply #35 on:** August 26, 2019, 02:04:26 pm »

I'm opening a new series of updates to v-parser under project "Esperas". Firstly, the algorithm is a bit cleaned up:

Code

01 DECLARE chart: [][], text: STRING;
02 
03 FUNCTION Parse (grammar, input)
04     text â† input;
05     chart.CLEAR ();
06     MergeItem (0, [grammar.START_SEQUENCE], 0, null);
07     FOR each new column in chart DO
08         FOR each new item in column DO
09             FOR each production of item.Sequence[item.Index] in grammar DO
10                 MergeItem (column.Index, production.sequence, 0, item);
11 
12     RETURN chart;
13 
14 PROCEDURE MergeItem (offset, sequence, index, parent)
15     item â† chart[offset].FIND (sequence, index);
16     IF not found item THEN
17         item â† {Sequence: sequence, Index: index, Inherited: [], Inheritors: [], Parents: []};
18         chart[offset].ADD (item);
19 
20     IF parent not in item.Parents THEN
21         item.Parents.ADD (parent);
22         FOR each x in [parent] UNION parent.Inherited DO
23             IF x.Index + 1 < x.Sequence.LENGTH THEN
24                 FOR each y in [item] UNION item.Inheritors DO
25                     IF y.Index + 1 == y.Sequence.LENGTH OR y is terminal THEN
26                         IF (x.Sequence, x.Index) not in y.Inherited THEN
27                             x.Inheritors.ADD (y);
28                             y.Inherited.ADD (x);
29                             
30                             IF y is terminal succeeded in text at offset THEN
31                                 FOR each z in x.Parents DO
32                                     MergeItem (offset + y.LENGTH, x.Sequence, x.Index + 1, z);

Secondly, Javascript implementation is simplified to the bones. I removed all unnecessary grammar tree processing, and the implementation now takes about 170 lines of code, taking a kind of context free grammar as a part of an input. Also, documentation finally started to shape up in some decent manner.

In incoming weeks I plan to extend the parser first to a "syntax logic" parser, then to a full Turing complete parser, but not incorporating unrestricted grammar, as proposed in mainstream textbooks. Because of their nature, with unrestricted grammar it is mostly impossible to build up an abstract syntax tree that grows from only one parent trunk, so the output is pretty messy and hard to cope with. Instead, I decided to use my own thought-child which I call "grammar templates". "Grammar templates" will behave somewhat like classic functions (we can trigger parametric productions) and will have strict parent-child branching from only one parent trunk, while providing Turing completeness wired into grammars. The final version will serve as a jumping board for meta-theory "Logos" language which I have written about here a while ago.

The first whole of Esperas is already on Github. One down, two more to go, wish me luck

Read about it here, and test the algorithm here.

ivan.moony · « **Reply #36 on:** January 07, 2020, 05:31:31 pm »

I just finished an algorithm for interpreting phrase logic structure grammar in V-Parse project (former Esperas). Pseudocode took 62 LOC. The idea is to use a sequence of logic propositions and connectives instead of flat context free grammar. There are indices that this parser could be used for theorem proving, and so everything else due to Curry-Howard correspondence. I still need to test it.

infurl · « **Reply #37 on:** January 08, 2020, 01:12:34 am »

I'm following your project with great interest Ivan and I'm looking forward to your updates.

Is "sequential normal form" your own invention?

ivan.moony · « **Reply #38 on:** January 08, 2020, 06:06:38 am »

Quote from: infurl on January 08, 2020, 01:12:34 am

I'm following your project with great interest Ivan and I'm looking forward to your updates.

Thank you Infurl

Quote from: infurl on January 08, 2020, 01:12:34 am

Is "sequential normal form" your own invention?

The part dealing with sequences is. But sequences are merely nameless predicates that anchor to the outer world by constants at arbitrary positions instead of by predicate names.

ivan.moony · « **Reply #39 on:** January 22, 2020, 08:42:40 pm »

Another algorithm concept is done. This time it's extending context free grammar to standardized unrestricted grammar as "phrase pair structure grammar". Really simple addon - only 31 LOC. There is also a short article about Turing machines.

But "phrase logic structure grammar" algo is gone now. Will return soon as "generic logic structure grammar".

infurl · « **Reply #40 on:** January 22, 2020, 11:04:07 pm »

From your github documentation:

Quote

Definition of the grammars and their ability to develop rules towards input string is similar to the context-free grammars definition, except that the left side of rules may also be sequences of symbols.

That seems like the same thing as a context sensitive grammar which sits between context free grammars and recursively enumerable grammars in the Chomsky hierarchy. What's different about yours?

The trouble with context sensitive grammars is that parsing them is an intractable problem. Even human brains can't handle context sensitive grammars and very few natural languages have syntactic elements that could be considered context sensitive. (This shouldn't be confused with the semantics of natural languages which is another story altogether.) On the other hand, context free grammars can be parsed in polynomial time, within N^3 to be specific, and modern parsers such as GLL and GLR can achieve this quite comfortably.

The semantic parser that I've developed closely links semantics to syntax in such a way that parsing is accomplished through the definition of a context free grammar and interpretation is controlled by a bag of feature restrictions that direct the process efficiently. It has the expressive power of a restricted context sensitive grammar and the efficiency and speed of a context free grammar.

Compare this to things like the LR(k) family of grammars which define most programming languages. They aren't unrestricted context free grammars but they are more powerful than regular expressions. It's a compromise that made them practical with the more limited computer power that was available back then because you could parse them on computers that had mere kilobytes of memory.

One day I imagine there will be quantum computers programming each other using unrestricted context sensitive grammars but they will be beyond our comprehension and they are certainly beyond the capabilities of any of our current machines.

Keep up the good work Ivan, I believe you're on the right track there. At the very least, you're following the same line of reasoning that I did.

ivan.moony · « **Reply #41 on:** January 22, 2020, 11:54:05 pm »

Quote from: infurl on January 22, 2020, 11:04:07 pm

From your github documentation:
Quote
Definition of the grammars and their ability to develop rules towards input string is similar to the context-free grammars definition, except that the left side of rules may also be sequences of symbols.

That seems like the same thing as a context sensitive grammar which sits between context free grammars and recursively enumerable grammars in the Chomsky hierarchy. What's different about yours?

Actually, at this point, it should represent exactly "recursively enumerable grammar" (or "Type-0" or "unrestricted grammar" in other words). Nothing new and different, I'm only surprised how minor was the algorithm change comparing to context free grammar one.

Quote from: infurl on January 22, 2020, 11:04:07 pm

The trouble with context sensitive grammars is that parsing them is an intractable problem. Even human brains can't handle context sensitive grammars and very few natural languages have syntactic elements that could be considered context sensitive. (This shouldn't be confused with the semantics of natural languages which is another story altogether.)

Even programming in unrestricted grammar makes me dizzy as much as programming in assembler does. You might be interested in the future planned extensions:

3.4. about the extensions introduced to context free grammars

We may conclude that unrestricted grammars are somewhat lower-level positioned on a grammar types scale, just like assembler is very much lower-level positioned in a programming language types scale. With introduction of generic variables and moreover logical reasoning, we try to climb up the grammar types scale to achieve the similar effect that higher-level programming languages achieved comparing to assembler. But unlike assembler, which we don't have opportunity to upgrade from outside because it is based on strict hardware, unrestricted grammar may be easily extended from the outside by new properties because it represents a flexible software, which we have a chance to carefully adjust according to our requirements. The adjustments that we chose to realize finally lead us to *generic logic structure grammar* language, aiming for relative simplicity of use in a way similar to constructive theorem proving.

Quote from: infurl on January 22, 2020, 11:04:07 pm

One day I imagine there will be quantum computers programming each other using unrestricted context sensitive grammars but they will be beyond our comprehension and they are certainly beyond the capabilities of any of our current machines.

Cool vision

ivan.moony · « **Reply #42 on:** January 26, 2020, 09:05:02 am »

Interested in parsing theory?

I conceptualized a parsing library for implementing Logos. Context free grammar is done (I built a novel algorithm for parsing it), but I want to extend it beyond unrestricted grammars. The extensions would be done over two axes: generic and logic, resulting in type (phrase: generic, structure:logic) grammars. But as I said, it's still merely a conceptual idea and still I need to implement it.

https://github.com/e-teoria/Esperas

ivan.moony · « **Reply #43 on:** May 11, 2020, 12:29:23 pm »

I restructured a bit the whole documentation, and introduced two important changes:

still untested pseudocode for logic structure grammars is finally published
approach to variables took another dimension, and now they somewhat reflect different degrees of abstraction overviewed in Logos conceptual metatheory language documentation.

Pseudocode for supporting variables (hopefully the last aspect of Esperas to really think about) is pending. After that, the final version will be adjusted, tested in Javascript, and published to GitHub.

The whole thing without variables should be guaranteed to halt, thus, considering logic approach, it may be characterized as decidable, inspite of that its acceptable input space may be infinite. But introducing variables would open space for recursions, and recursions may introduce infinite loops, so it may be possible to write grammars whose matching against input may never terminate because they may represent an infinte space of rules. A workaround would be conditional controlling each recursion step (like in procedural programming), but that introduces necessary incompleteness of grammar search space. On the other side, an algorithm supporting variables that wouldn't pose recursion restrictions, but would always terminate, would imply a solution to Entscheidungs problem, which academics hold as proven unsolvable.

Well, my mantra about scientific proofs is: academic proofs are there to be proven false one day. But this time I have a feeling that Entscheidungs problem unsolvability proof really holds.

V-Parser: developing an option towards NLP and some other thingies

ivan.moony

Re: Ahead Network Parser: developing an option towards NLP and some other thingies

Zero

Re: Ahead Network Parser: developing an option towards NLP and some other thingies

ivan.moony

Re: Ahead Network Parser: developing an option towards NLP and some other thingies

ivan.moony

Re: V-Parser: developing an option towards NLP and some other thingies

ivan.moony

Re: V-Parser: developing an option towards NLP and some other thingies

ivan.moony

Re: V-Parser: developing an option towards NLP and some other thingies

ivan.moony

Re: V-Parser: developing an option towards NLP and some other thingies

infurl

Re: V-Parser: developing an option towards NLP and some other thingies

ivan.moony

Re: V-Parser: developing an option towards NLP and some other thingies

ivan.moony

Re: V-Parser: developing an option towards NLP and some other thingies

infurl

Re: V-Parser: developing an option towards NLP and some other thingies

ivan.moony

Re: V-Parser: developing an option towards NLP and some other thingies

ivan.moony

Re: V-Parser: developing an option towards NLP and some other thingies

ivan.moony

Re: V-Parser: developing an option towards NLP and some other thingies

Recent Topics

Recent News

Users Online

Articles