... Returning to distinguishing the <= and <- operators: there is less mess to think through when using them. Also returning to defining new symbols without a prefix and referencing existing symbols with the "@" prefix: it looks cleaner and more readable.
I had an itch when using Regexp for parsing in JavaScript: I found no way to efficiently check whether a regexp matches the value at a specific offset in a string. Slicing the string on every check was not an option for me, so I decided to program my own Regexp library. I followed some documentation and got this implementation:
RegExp <= (
Union <= (@SimpleRE, '|', @RegExp) |
SimpleRE <= (
Concatenation <= (@BasicRE, @SimpleRE) |
BasicRE <= (
OneOrMore <= (@ElementaryRE, ('+?' | '+')) |
ZeroOrMore <= (@ElementaryRE, ('*?' | '*')) |
ZeroOrOne <= (@ElementaryRE, '?') |
NumberedTimes <= (
'{',
In <= (
Exactly <= @Integer |
AtLeast <= (@Integer, ',') |
AtLeastNotMore <= (@Integer, ',', @Integer)
),
('}?' | '}')
) |
ElementaryRE <= (
Group <= ('(', @RegExp, ')') |
Any <= '.' |
Eos <= '$' |
Bos <= '^' |
Char <= (
@NonMetaCharacter |
'\\', (
@MetaCharacter |
't' | 'n' | 'r' | 'f' | 'd' | 'D' | 's' | 'S' | 'w' | 'W' |
@Digit, @Digit, @Digit
)
) |
Set <= (
PositiveSet <= ('[', @SetItems, ']') |
NegativeSet <= ('[^', @SetItems, ']')
) <~ (
SetItems <= (
SetItem <= (
Range <= (@Char, '-', @Char) |
@Char
) |
@SetItem, @SetItems
)
)
)
)
)
)
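As an aside on the offset check that motivated this library, here is a minimal sketch of what "does the regexp match at this offset" could look like. It assumes an engine that supports the ES2015 sticky flag ("y"), which may not have been an option in the environment this was written for; matchAt is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: does `pattern` match `text` starting exactly at `offset`?
// Returns the matched substring, or null when there is no match at that offset.
// Assumption: the engine supports the ES2015 sticky flag ("y").
function matchAt(pattern, text, offset) {
  // Build a copy with the sticky flag so the caller's regexp object is untouched.
  const flags = pattern.flags.includes('y') ? pattern.flags : pattern.flags + 'y';
  const sticky = new RegExp(pattern.source, flags);
  sticky.lastIndex = offset; // a sticky regexp only tries to match at lastIndex
  const result = sticky.exec(text);
  return result === null ? null : result[0];
}

console.log(matchAt(/\d+/, 'abc123def', 3)); // "123"
console.log(matchAt(/\d+/, 'abc123def', 0)); // null
```

No substring is created for the check itself, which is the property the string-slicing approach lacks.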
The definition above would work with some JavaScript back-end, but when I compared "union" to "set" in the Regexp definition, I concluded they are about the same thing: a choice between values, resolved at parse time. I didn't like this redundancy, so I decided to slightly change the definition of Regexp and develop my own version of it, which looks like this:
ChExp <= (
Choice <= (@ConExp, '|', @ChExp) |
ConExp <= (
Concatenation <= (@WExp, @ConExp) |
WExp <= (
Without <= (@QExp, '!', @WExp) |
QExp <= (
OneOrMore <= (@GExp, '+') |
ZeroOrMore <= (@GExp, '*') |
ZeroOrOne <= (@GExp, '?') |
NumberedTimes <= (@GExp, '{', @Integer, '}') |
GExp <= (
Group <= ('(', @ChExp, ')') |
Exp <= (
Any <= '.' |
Range <= (@Char, '-', @Char) |
Char <= (
@NonMetaCharacter |
'\\', (
@MetaCharacter |
't' | 'n' | 'r' | 'f' |
'0x', @HEXDigit, @HEXDigit, @HEXDigit, @HEXDigit, @HEXDigit, @HEXDigit
)
)
)
)
)
)
)
)
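The observation that a set is just a choice can be checked against the built-in JavaScript RegExp. This is only an illustration with an arbitrary sample list, not a proof: a character set and a union of the same single characters accept exactly the same inputs.

```javascript
// A character set and a union of single characters, both anchored to the whole input.
const asSet = /^[abc]$/;
const asUnion = /^(?:a|b|c)$/;

// Arbitrary sample inputs: three that should match and three that should not.
const samples = ['a', 'b', 'c', 'd', '', 'ab'];

// True when both patterns give the same answer for every sample.
const agree = samples.every((s) => asSet.test(s) === asUnion.test(s));
console.log(agree); // true
```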
By implementing an extra "without" operator to cover the negative set from the original Regexp, the new version becomes more expressive than the original one. For example, the expression "(.*)!((keyword1)|(keyword2))" matches a string of any length that is different from "keyword1" and "keyword2". In regular Regexp it is only possible to exclude specific characters from matching, while here I get exclusion of whole strings. The new definition looks cleaner and does not suffer from choice redundancy.
I have to say, the structured way of defining grammars in Metafigure probably saved me from pitfalls that the original authors of Regexp ran into in the seventies, when they probably used unstructured BNF. I'm kind of proud of Metafigure: it has already got me a better version of Regexp, and it is not even finished yet.
Otherwise, I have already started to program Metafigure in JavaScript, and I plan to release the crippled version 0.2 soon, which should be sufficient to parse English texts (yes, it is the very NLP, among other stuff, that I'm working on).
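One way to read the "without" operator is sketched below with small matcher functions. This is an assumption about its semantics, not the Metafigure implementation: "A!B" is taken to accept a span that A accepts and B does not. All names (lit, any, choice, star, without, matchesWhole) are hypothetical.

```javascript
// Each matcher is a function (str, pos) -> array of end offsets it can reach.
const lit = (text) => (str, pos) =>
  str.startsWith(text, pos) ? [pos + text.length] : [];

// "." : any single character.
const any = (str, pos) => (pos < str.length ? [pos + 1] : []);

// "a|b" : every end offset reachable by any alternative.
const choice = (...ms) => (str, pos) =>
  [...new Set(ms.flatMap((m) => m(str, pos)))];

// "a*" : zero or more repetitions, collecting every reachable end offset.
const star = (m) => (str, pos) => {
  const seen = new Set([pos]);
  const pending = [pos];
  while (pending.length > 0) {
    const p = pending.pop();
    for (const end of m(str, p)) {
      if (!seen.has(end)) {
        seen.add(end);
        pending.push(end);
      }
    }
  }
  return [...seen];
};

// "a!b" (assumed semantics): end offsets reachable by a but not by b.
const without = (a, b) => (str, pos) => {
  const excluded = new Set(b(str, pos));
  return a(str, pos).filter((end) => !excluded.has(end));
};

// Whole-string match: some end offset equals the string length.
const matchesWhole = (m, str) => m(str, 0).includes(str.length);

// (.*)!((keyword1)|(keyword2))
const notKeyword = without(star(any), choice(lit('keyword1'), lit('keyword2')));

console.log(matchesWhole(notKeyword, 'keyword1')); // false
console.log(matchesWhole(notKeyword, 'keyword3')); // true
```

Returning all reachable end offsets, rather than only the first or longest one, is what lets the exclusion be computed as a plain set difference.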