GLR Parser

  • 22 Replies
  • 8575 Views
*

infurl

  • Administrator
  • ***********
  • Eve
  • *
  • 1298
  • Humans will disappoint you.
    • Home Page
Re: GLR Parser
« Reply #15 on: August 18, 2011, 05:44:19 pm »
I've been doing some stress testing of my GLR parser library. One of the files that I'm using is the Project Gutenberg catalog in RDF format. It is about 210 megabytes decompressed, and it takes my parser about 8 minutes to process it using a single core on my workstation. That compares favorably with the time that it takes xsltproc to process it with an XSLT script (about 7 minutes) but falls far short of the time that it takes a compiled flex program to scan it (about 7 seconds).

Lexical scanners like flex only accept regular expressions, which are much less expressive than context-free grammars; however, they have the advantage that they take relatively little processing power to handle. It would be neat if it were possible to recognise grammars, or parts of grammars, that could be handled by a regular expression engine instead of a full context-free grammar parser. We could get the best of both worlds then.

*

mendicott

  • Bumblebee
  • **
  • 28
    • Meta-Guide.com
Re: GLR Parser
« Reply #16 on: August 18, 2011, 06:15:31 pm »
> http://en.wikipedia.org/wiki/Parsing_expression_grammar

How would "Parsing expression grammar" relate (or not relate) to this?

So, what is needed is a cloud-based API to translate between (<=>) Regex and "Phrase structure grammar" ?!?

If so, what would that look like, can you give a braindead simple example??

*

Bragi

  • Trusty Member
  • ********
  • Replicant
  • *
  • 564
    • Neural network design blog
Re: GLR Parser
« Reply #17 on: August 19, 2011, 07:20:21 am »
Quote
Lexical scanners like flex only accept regular expressions which are much less expressive than context free grammars
I think you are a bit confused. Flex doesn't use regular expressions; it's a lexical scanner. It's fast because it doesn't do the full parse, not because it uses regular expressions; that would make it a lot slower.
Flex is what you use to create the data for bison or yacc.

*

infurl
Re: GLR Parser
« Reply #18 on: August 19, 2011, 07:33:38 am »
Quote
Lexical scanners like flex only accept regular expressions which are much less expressive than context free grammars
I think you are a bit confused. Flex doesn't use regular expressions; it's a lexical scanner. It's fast because it doesn't do the full parse, not because it uses regular expressions; that would make it a lot slower.
Flex is what you use to create the data for bison or yacc.

From the flex lexical scanner manual http://flex.sourceforge.net/manual/Introduction.html#Introduction

Quote
flex is a tool for generating scanners. A scanner is a program which recognizes lexical patterns in text. The flex program reads the given input files, or its standard input if no file names are given, for a description of a scanner to generate. The description is in the form of pairs of regular expressions and C code, called rules. flex generates as output a C source file, lex.yy.c by default, which defines a routine yylex(). This file can be compiled and linked with the flex runtime library to produce an executable. When the executable is run, it analyzes its input for occurrences of the regular expressions. Whenever it finds one, it executes the corresponding C code.
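The "pairs of regular expressions and C code" the manual mentions look like this in practice. This is a made-up toy scanner, not taken from the manual, that just counts the numbers and words in its input:

```lex
%{
/* Hypothetical example: count numbers and words on standard input. */
#include <stdio.h>
int numbers = 0, words = 0;
%}
%option noyywrap
%%
[0-9]+      { numbers++; }     /* regular expression, then its C action */
[A-Za-z]+   { words++; }
.|\n        { /* ignore anything else */ }
%%
int main(void)
{
    yylex();    /* the routine flex generates in lex.yy.c */
    printf("%d numbers, %d words\n", numbers, words);
    return 0;
}
```

Running flex on that description and compiling the generated lex.yy.c produces exactly the kind of executable the quote describes: it reads its input and fires the matching C action every time one of those regular expressions matches.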

Jan, you did actually say something intelligent last week, I remember because it was so out of character for you, but mostly you just talk rubbish. However I'll give you some credit this time for at least realising that I wasn't talking about the product that Adobe has which is also called Flex.

*

Bragi
Re: GLR Parser
« Reply #19 on: August 19, 2011, 08:23:57 am »
My bad, I had the flex def confused with that of coco. No need to be insulting though.

*

infurl
Re: GLR Parser
« Reply #20 on: August 19, 2011, 08:30:29 am »
Quote
My bad, I had the flex def confused with that of coco. No need to be insulting though.

Accusing me of being confused isn't insulting? Maybe the word has different connotations in different cultures, but more often it is a way of insulting someone in such a way as to be able to feign innocence when challenged. Even if your command of English isn't good enough for you to know that (and I believe it is), it wouldn't be the first time that you had tinged your comments with a little bit of spite, would it? :)

*

Bragi
Re: GLR Parser
« Reply #21 on: August 19, 2011, 08:51:32 am »
Ah, I think there we have the cultural difference. It's a statement that clearly bothers you; my apologies for that. It certainly wasn't meant in the way that you took it. Maybe you haven't noticed before, but I am often very confused (to the point that I sometimes wonder what the hell I am doing in room x when I need to be in room y). People say it to me all the time (it's not considered offensive for us, especially in my family, since we all share the trait). I simply, in this case incorrectly, assumed that you were also confused. Turned out it was me again.

*

infurl
Re: GLR Parser
« Reply #22 on: August 23, 2011, 11:33:21 am »
I've begun the task of translating my GLR parser generator from Common Lisp to C. So far I've rewritten the code that parses the grammar definition file and builds the data structures for it. The Common Lisp code that does this is entirely contained in the following three function definitions, while the C version is more than 400 lines just for the data structure definitions and functions. Strictly speaking, I should also include the 1800 lines of C for the parser driver and the 500 lines for the compiled parser definitions, although this part could probably be rewritten in a few hundred lines as a simple recursive descent parser instead.

Still, that's 23 lines of Common Lisp versus more than 3000 lines of C. The performance, portability, and scalability gains, if any, had better be worth the effort in the long run.  :uglystupid2:

Code
(defun read-grammar-name (istream)
  ;; Read the next item from the stream: numbers and keywords are
  ;; returned as-is, any other symbol as its case-preserved name.
  (let ((item (read istream nil)))
    (when item
      (etypecase item
        (number item)
        (keyword item)
        (symbol (symbol-name item))))))

(defun read-grammar-rule (istring)
  ;; Collect the non-keyword tokens from one line of the grammar file.
  (when istring
    (with-input-from-string (istream istring)
      (loop :for token = (read-grammar-name istream)
            :while token :unless (keywordp token) :collect token))))

(defun read-grammar-file (file-name)
  ;; Read the grammar file line by line with symbol case preserved,
  ;; collecting one rule per non-empty line.
  (let ((*readtable* (copy-readtable nil)))
    (setf (readtable-case *readtable*) :preserve)
    (with-open-file (istream file-name :direction :input)
      (loop :for line = (read-line istream nil)
            :for rule = (read-grammar-rule line)
            :while line :when rule :collect rule))))