V-Parser: developing an option towards NLP and some other thingies

  • 42 Replies
  • 9707 Views
*

ivan.moony

  • Trusty Member
  • ***********
  • Eve
  • *
  • 1385
    • e-teoria
Really liking your work Ivan…
Thank you, but I should take these preliminary results with reserve. Looks cool for now, but it's not fully tested yet. :-\

Search speed is finite (GHz) storage is plentiful (TB).
For writing each byte, some time is spent. More there are bytes, more time is needed for their writing.
There exist some rules interwoven within this world. As much as it is a blessing, so much it is a curse.

*

Zero

  • Trusty Member
  • *********
  • Terminator
  • *
  • 980
  • Ready?
    • Thinkbots are free
Ok I see the point, thanks.
I think Korrelan is talking about memoization.

Hey, thinking about it, chatbots are like big memoized things... but let's not drift away from parsers!

*

ivan.moony

  • Trusty Member
  • ***********
  • Eve
  • *
  • 1385
    • e-teoria
Well, this is something that works too (tested in practice), but it is not linear. The speed is about the same as in graphs already published in this thread:
Code: [Select]
PROCEDURE Parse (grammar, text)
    DECLARE chart: [];
    Populate (chart[0], grammar.TOP_RULE, 0, []);
    FOR each nonempty column in chart  // chart can expand during this loop
        FOR each new item in column  // column can expand during this loop
            IF item.Rule is not terminal
                FOR each alternation of item.rule
                    Populate (column, alternation.sequence, 0, [item]);
            ELSE
                IF item.Rule.SCAN_TERMINAL (column.index, text)
                    item.Aheads.ADD_IF_NOT_EXIST (chart[item.RIGHT_EXTENT]);
                    FOR each occurence in item.Occurences
                        Populate (chart[item.RIGHT_EXTENT], occurence.sequence, occurence.index + 1, occurence.Parents);

    RETURN chart;

PROCEDURE Populate (array, sequence, index, parents)
    rule ← sequence[Index];
    item ← array.FIND (rule);
    IF not found item
        item ← NEW {Rule: rule, Occurences: [], Inheritors: [], Aheads:[]};
        array.ADD (item);

    IF Sequence.LENGTH > Index + 1
        AddOccurence (item, sequence, index, parents);

    ELSE
        FOR each parent in parents
            parent.Inheritors.ADD (item);
            FOR each toInherit in parent.Occurences
                AddOccurence (item, toInherit.Sequence, toInherit.Index, toInherit.Parents);

PROCEDURE AddOccurence (item, sequence, index, parents)
    occurence ← item.Occurences.FIND (sequence, index);
    IF not found occurence
        occurence ← NEW {Sequence: sequence, Index: index, Parents: []};
        item.Occurences.ADD (occurence);

    FOR each parent in parents
        occurence.Parents.ADD_IF_NOT_EXIST (parent);

    FOR each inheritor in item.Inheritors
        AddOccurence (inheritor, occurence.Sequence, occurence.Index, occurence.Parents);

    FOR each column in item.Aheads
        Populate (column, occurence.Sequence, occurence.Index + 1, occurence.Parents);

I thought of just adding a few lines to merge parents of the same right extent to make the parsing linear, but I'm onto implementing new (left-align)<->(right-align) philosophy because I don't like combining several different philosophies, using only fragments of them where appropriate, it doesn't feel right. I have at least two different algorithms by now (both cubic, I guess), and I wonder how many parsing algorithms are even possible, and which of them could be linear...
There exist some rules interwoven within this world. As much as it is a blessing, so much it is a curse.

*

ivan.moony

  • Trusty Member
  • ***********
  • Eve
  • *
  • 1385
    • e-teoria
Re: V-Parser: developing an option towards NLP and some other thingies
« Reply #33 on: November 27, 2017, 08:02:31 PM »
I think I made it, it's almost linear.  :weee:

You can check it out at: http://moony.atspace.cc/v-parser. Check predefined grammars (partial NLP included), or write new ones. Let me know if you can find out any grammar that is non-linear, relative to a length of input string.

I'll try to publish its little brother together with actual algorithm (about 30 lines) on Github soon. Still have to work out some compact parse forest output and other details. Scripting would also be supported, for binary search, or whatever.
« Last Edit: December 02, 2017, 01:39:13 PM by ivan.moony »
There exist some rules interwoven within this world. As much as it is a blessing, so much it is a curse.

*

ivan.moony

  • Trusty Member
  • ***********
  • Eve
  • *
  • 1385
    • e-teoria
Re: V-Parser: developing an option towards NLP and some other thingies
« Reply #34 on: December 02, 2017, 03:39:43 PM »
I've published the first version of V-Parser. It is free to use, and it is open source. You can download it from my GitHub repository and test it at online grammar development environment.

Don't hate me, I implemented only CPF output. Hopefully, this would be changed in the next version.
There exist some rules interwoven within this world. As much as it is a blessing, so much it is a curse.

*

ivan.moony

  • Trusty Member
  • ***********
  • Eve
  • *
  • 1385
    • e-teoria
Re: V-Parser: developing an option towards NLP and some other thingies
« Reply #35 on: August 26, 2019, 02:04:26 PM »
I'm opening a new series of updates to v-parser under project "Esperas". Firstly, the algorithm is a bit cleaned up:

Code: [Select]
01 DECLARE chart: [][], text: STRING;
02
03 FUNCTION Parse (grammar, input)
04     text ← input;
05     chart.CLEAR ();
06     MergeItem (0, [grammar.START_SEQUENCE], 0, null);
07     FOR each new column in chart DO
08         FOR each new item in column DO
09             FOR each production of item.Sequence[item.Index] in grammar DO
10                 MergeItem (column.Index, production.sequence, 0, item);
11
12     RETURN chart;
13
14 PROCEDURE MergeItem (offset, sequence, index, parent)
15     item ← chart[offset].FIND (sequence, index);
16     IF not found item THEN
17         item ← {Sequence: sequence, Index: index, Inherited: [], Inheritors: [], Parents: []};
18         chart[offset].ADD (item);
19
20     IF parent not in item.Parents THEN
21         item.Parents.ADD (parent);
22         FOR each x in [parent] UNION parent.Inherited DO
23             IF x.Index + 1 < x.Sequence.LENGTH THEN
24                 FOR each y in [item] UNION item.Inheritors DO
25                     IF y.Index + 1 == y.Sequence.LENGTH OR y is terminal THEN
26                         IF (x.Sequence, x.Index) not in y.Inherited THEN
27                             x.Inheritors.ADD (y);
28                             y.Inherited.ADD (x);
29                             
30                             IF y is terminal succeeded in text at offset THEN
31                                 FOR each z in x.Parents DO
32                                     MergeItem (offset + y.LENGTH, x.Sequence, x.Index + 1, z);

Secondly, Javascript implementation is simplified to the bones. I removed all unnecessary grammar tree processing, and the implementation now takes about 170 lines of code, taking a kind of context free grammar as a part of an input. Also, documentation finally started to shape up in some decent manner.

In incoming weeks I plan to extend the parser first to a "syntax logic" parser, then to a full Turing complete parser, but not incorporating unrestricted grammar, as proposed in mainstream textbooks. Because of their nature, with unrestricted grammar it is mostly impossible to build up an abstract syntax tree that grows from only one parent trunk, so the output is pretty messy and hard to cope with. Instead, I decided to use my own thought-child which I call "grammar templates". "Grammar templates" will behave somewhat like classic functions (we can trigger parametric productions) and will have strict parent-child branching from only one parent trunk, while providing Turing completeness wired into grammars. The final version will serve as a jumping board for meta-theory "Logos" language which I have written about here a while ago.

The first whole of Esperas is already on Github. One down, two more to go, wish me luck :)

Read about it here, and test the algorithm here.
« Last Edit: August 26, 2019, 02:24:57 PM by ivan.moony »
There exist some rules interwoven within this world. As much as it is a blessing, so much it is a curse.

*

ivan.moony

  • Trusty Member
  • ***********
  • Eve
  • *
  • 1385
    • e-teoria
Re: V-Parser: developing an option towards NLP and some other thingies
« Reply #36 on: January 07, 2020, 05:31:31 PM »
I just finished an algorithm for interpreting phrase logic structure grammar in V-Parse project (former Esperas). Pseudocode took 62 LOC. The idea is to use a sequence of logic propositions and connectives instead of flat context free grammar. There are indices that this parser could be used for theorem proving, and so everything else due to Curry-Howard correspondence. I still need to test it.
There exist some rules interwoven within this world. As much as it is a blessing, so much it is a curse.

*

infurl

  • Trusty Member
  • *********
  • Terminator
  • *
  • 765
  • Humans will disappoint you.
    • Home Page
Re: V-Parser: developing an option towards NLP and some other thingies
« Reply #37 on: January 08, 2020, 01:12:34 AM »
I'm following your project with great interest Ivan and I'm looking forward to your updates.

Is "sequential normal form" your own invention?

*

ivan.moony

  • Trusty Member
  • ***********
  • Eve
  • *
  • 1385
    • e-teoria
Re: V-Parser: developing an option towards NLP and some other thingies
« Reply #38 on: January 08, 2020, 06:06:38 AM »
I'm following your project with great interest Ivan and I'm looking forward to your updates.

Thank you Infurl :)

Is "sequential normal form" your own invention?

The part dealing with sequences is. But sequences are merely nameless predicates that anchor to the outer world by constants at arbitrary positions instead of by predicate names.
There exist some rules interwoven within this world. As much as it is a blessing, so much it is a curse.

*

ivan.moony

  • Trusty Member
  • ***********
  • Eve
  • *
  • 1385
    • e-teoria
Re: V-Parser: developing an option towards NLP and some other thingies
« Reply #39 on: January 22, 2020, 08:42:40 PM »
Another algorithm concept is done. This time it's extending context free grammar to standardized unrestricted grammar as "phrase pair structure grammar". Really simple addon - only 31 LOC. There is also a short article about Turing machines.

But "phrase logic structure grammar" algo is gone now. Will return soon as "generic logic structure grammar".
There exist some rules interwoven within this world. As much as it is a blessing, so much it is a curse.

*

infurl

  • Trusty Member
  • *********
  • Terminator
  • *
  • 765
  • Humans will disappoint you.
    • Home Page
Re: V-Parser: developing an option towards NLP and some other thingies
« Reply #40 on: January 22, 2020, 11:04:07 PM »
From your github documentation:
Quote
Definition of the grammars and their ability to develop rules towards input string is similar to the context-free grammars definition, except that the left side of rules may also be sequences of symbols.

That seems like the same thing as a context sensitive grammar which sits between context free grammars and recursively enumerable grammars in the Chomsky hierarchy. What's different about yours?

The trouble with context sensitive grammars is that parsing them is an intractable problem. Even human brains can't handle context sensitive grammars and very few natural languages have syntactic elements that could be considered context sensitive. (This shouldn't be confused with the semantics of natural languages which is another story altogether.) On the other hand, context free grammars can be parsed in polynomial time, within N^3 to be specific, and modern parsers such as GLL and GLR can achieve this quite comfortably.

The semantic parser that I've developed closely links semantics to syntax in such a way that parsing is accomplished through the definition of a context free grammar and interpretation is controlled by a bag of feature restrictions that direct the process efficiently. It has the expressive power of a restricted context sensitive grammar and the efficiency and speed of a context free grammar.

Compare this to things like the LR(k) family of grammars which define most programming languages. They aren't unrestricted context free grammars but they are more powerful than regular expressions. It's a compromise that made them practical with the more limited computer power that was available back then because you could parse them on computers that had mere kilobytes of memory.

One day I imagine there will be quantum computers programming each other using unrestricted context sensitive grammars but they will be beyond our comprehension and they are certainly beyond the capabilities of any of our current machines.

Keep up the good work Ivan, I believe you're on the right track there. At the very least, you're following the same line of reasoning that I did.

*

ivan.moony

  • Trusty Member
  • ***********
  • Eve
  • *
  • 1385
    • e-teoria
Re: V-Parser: developing an option towards NLP and some other thingies
« Reply #41 on: January 22, 2020, 11:54:05 PM »
From your github documentation:
Quote
Definition of the grammars and their ability to develop rules towards input string is similar to the context-free grammars definition, except that the left side of rules may also be sequences of symbols.

That seems like the same thing as a context sensitive grammar which sits between context free grammars and recursively enumerable grammars in the Chomsky hierarchy. What's different about yours?

Actually, at this point, it should represent exactly "recursively enumerable grammar" (or "Type-0" or "unrestricted grammar" in other words). Nothing new and different, I'm only surprised how minor was the algorithm change comparing to context free grammar one.

The trouble with context sensitive grammars is that parsing them is an intractable problem. Even human brains can't handle context sensitive grammars and very few natural languages have syntactic elements that could be considered context sensitive. (This shouldn't be confused with the semantics of natural languages which is another story altogether.)

Even programming in unrestricted grammar makes me dizzy as much as programming in assembler does. You might be interested in the future planned extensions:

  • 3.4. about the extensions introduced to context free grammars

    We may conclude that unrestricted grammars are somewhat lower-level positioned on a grammar types scale, just like assembler is very much lower-level positioned in a programming language types scale. With introduction of generic variables and moreover logical reasoning, we try to climb up the grammar types scale to achieve the similar effect that higher-level programming languages achieved comparing to assembler. But unlike assembler, which we don't have opportunity to upgrade from outside because it is based on strict hardware, unrestricted grammar may be easily extended from the outside by new properties because it represents a flexible software, which we have a chance to carefully adjust according to our requirements. The adjustments that we chose to realize finally lead us to *generic logic structure grammar* language, aiming for relative simplicity of use in a way similar to constructive theorem proving.

One day I imagine there will be quantum computers programming each other using unrestricted context sensitive grammars but they will be beyond our comprehension and they are certainly beyond the capabilities of any of our current machines.

Cool vision :)
There exist some rules interwoven within this world. As much as it is a blessing, so much it is a curse.

*

ivan.moony

  • Trusty Member
  • ***********
  • Eve
  • *
  • 1385
    • e-teoria
Re: V-Parser: developing an option towards NLP and some other thingies
« Reply #42 on: January 26, 2020, 09:05:02 AM »
Interested in parsing theory?

I conceptualized a parsing library for implementing Logos. Context free grammar is done (I built a novel algorithm for parsing it), but I want to extend it beyond unrestricted grammars. The extensions would be done over two axes: generic and logic, resulting in type (phrase: generic, structure:logic) grammars. But as I said, it's still merely a conceptual idea and still I need to implement it.

https://github.com/e-teoria/Esperas
There exist some rules interwoven within this world. As much as it is a blessing, so much it is a curse.

 


Users Online

12 Guests, 0 Users

Most Online Today: 22. Most Online Ever: 340 (March 26, 2019, 09:47:57 PM)

Articles