Ahead Network Parser: developing an option towards NLP and some other thingies

  • 32 Replies
  • 2346 Views
*

ivan.moony

  • Trusty Member
  • *********
  • Terminator
  • *
  • 778
  • look, a star is falling
    • Structured Type System
Really liking your work Ivan…
Thank you, but I should take these preliminary results with reserve. Looks cool for now, but it's not fully tested yet. :-\

Search speed is finite (GHz) storage is plentiful (TB).
For writing each byte, some time is spent. More there are bytes, more time is needed for their writing.
Wherever you see a nice spot, plant another knowledge tree :favicon:

*

Zero

  • Trusty Member
  • *******
  • Starship Trooper
  • *
  • 433
  • Fictional character
    • SYN CON DEV LOG
Ok I see the point, thanks.
I think Korrelan is talking about memoization.

Hey, thinking about it, chatbots are like big memoized things... but let's not drift away from parsers!

*

ivan.moony

  • Trusty Member
  • *********
  • Terminator
  • *
  • 778
  • look, a star is falling
    • Structured Type System
Well, this is something that works too (tested in practice), but it is not linear. The speed is about the same as in graphs already published in this thread:
Code: [Select]
PROCEDURE Parse (grammar, text)
    DECLARE chart: [];
    Populate (chart[0], grammar.TOP_RULE, 0, []);
    FOR each nonempty column in chart  // chart can expand during this loop
        FOR each new item in column  // column can expand during this loop
            IF item.Rule is not terminal
                FOR each alternation of item.rule
                    Populate (column, alternation.sequence, 0, [item]);
            ELSE
                IF item.Rule.SCAN_TERMINAL (column.index, text)
                    item.Aheads.ADD_IF_NOT_EXIST (chart[item.RIGHT_EXTENT]);
                    FOR each occurence in item.Occurences
                        Populate (chart[item.RIGHT_EXTENT], occurence.sequence, occurence.index + 1, occurence.Parents);

    RETURN chart;

PROCEDURE Populate (array, sequence, index, parents)
    rule ← sequence[Index];
    item ← array.FIND (rule);
    IF not found item
        item ← NEW {Rule: rule, Occurences: [], Inheritors: [], Aheads:[]};
        array.ADD (item);

    IF Sequence.LENGTH > Index + 1
        AddOccurence (item, sequence, index, parents);

    ELSE
        FOR each parent in parents
            parent.Inheritors.ADD (item);
            FOR each toInherit in parent.Occurences
                AddOccurence (item, toInherit.Sequence, toInherit.Index, toInherit.Parents);

PROCEDURE AddOccurence (item, sequence, index, parents)
    occurence ← item.Occurences.FIND (sequence, index);
    IF not found occurence
        occurence ← NEW {Sequence: sequence, Index: index, Parents: []};
        item.Occurences.ADD (occurence);

    FOR each parent in parents
        occurence.Parents.ADD_IF_NOT_EXIST (parent);

    FOR each inheritor in item.Inheritors
        AddOccurence (inheritor, occurence.Sequence, occurence.Index, occurence.Parents);

    FOR each column in item.Aheads
        Populate (column, occurence.Sequence, occurence.Index + 1, occurence.Parents);

I thought of just adding a few lines to merge parents of the same right extent to make the parsing linear, but I'm onto implementing new (left-align)<->(right-align) philosophy because I don't like combining several different philosophies, using only fragments of them where appropriate, it doesn't feel right. I have at least two different algorithms by now (both cubic, I guess), and I wonder how many parsing algorithms are even possible, and which of them could be linear...
Wherever you see a nice spot, plant another knowledge tree :favicon: