Context Free Grammar

  • 10 Replies
  • 5266 Views
*

infurl

  • Administrator
  • ***********
  • Eve
  • *
  • 1298
  • Humans will disappoint you.
    • Home Page
Context Free Grammar
« on: July 30, 2011, 02:51:43 am »
This is the grammar definition for the grammar definitions that I am using in my parsing software. It is called "Chomsky Normal Form" and as simple as it is, it is capable of describing most formal and natural languages. Although it might look a bit cumbersome when used for trivial definitions, it's power rapidly becomes apparent as you start trying to describe more complex languages.

There are a number of short-hand variations being widely used, such as "Backus-Naur Form" or BNF which everyone reading this would have seen somewhere already, even if you didn't know what it was; "Augmented Backus-Naur Form" or ABNF which is used in all the RFC documents that define the internet protocols; and "Extended Backus-Naur Form" or EBNF which is used in compiler compilers such as bison and ANTLR. Mostly all these do is provide a means of expressing repeating and optional elements more succinctly, various kinds of literal strings, and a means of associating executable code with parsed elements.

The following file is not a complete grammar. I normally use it in conjunction with another file which defines names for all the different ASCII and UNICODE characters. It is long, highly repetitive, and easily constructed from standard character tables but I can post it if anyone feels that they need to see it.


; CONTEXT FREE GRAMMAR ---------------------------------------------------------

Grammar := rule_list rule_separator
Grammar := rule_separator rule_list rule_separator

rule_list := Rule
rule_list := rule_list rule_separator Rule

Rule := RuleName defined_as RuleBody

defined_as := char_colon char_equals
defined_as := char_colon char_equals white_space
defined_as := white_space char_colon char_equals
defined_as := white_space char_colon char_equals white_space

RuleName := letter
RuleName := letter rule_name_char_list

rule_name_char_list := rule_name_char
rule_name_char_list := rule_name_char_list rule_name_char

rule_name_char := letter
rule_name_char := digit
rule_name_char := char_underscore

RuleBody := rule_item_list

rule_item_list := rule_item
rule_item_list := rule_item_list white_space rule_item

rule_item := RuleName
rule_item := CharCode

CharCode := digit_list

digit_list := digit
digit_list := digit_list digit

rule_separator := end_of_line
rule_separator := line_comment
rule_separator := rule_separator end_of_line
rule_separator := rule_separator line_comment

line_comment := char_semicolon end_of_line
line_comment := char_semicolon line_comment_char_list end_of_line

line_comment_char_list := line_comment_char
line_comment_char_list := line_comment_char_list line_comment_char

line_comment_char := alphanumeric
line_comment_char := punctuation
line_comment_char := white_space


;-------------------------------------------------------------------------------

*

Bragi

  • Trusty Member
  • ********
  • Replicant
  • *
  • 564
    • Neural network design blog
Re: Context Free Grammar
« Reply #1 on: July 30, 2011, 09:42:43 am »
I did something similar a long time ago. I started out with coco/r, which is basically an EBNF. from there, I made my own, general purpose (OO)/compiler generator/query language (called gene: http://users.telenet.be/GeneCompiler/ ).
Do yo plan to write a parser for the complete english language with this?

*

infurl

  • Administrator
  • ***********
  • Eve
  • *
  • 1298
  • Humans will disappoint you.
    • Home Page
Re: Context Free Grammar
« Reply #2 on: July 30, 2011, 10:30:25 am »
Do you plan to write a parser for the complete english language with this?

Yes, that is my intention. There have already been three open source projects (that I know of) which have published English grammars in various forms and levels of completeness:

Link Grammar
The Xtag Project
English Resource Grammar and Lexicon

ERG is fairly complete and broad coverage but is lacking an efficient parser implementation. It has been written using Typed Feature Structures which I believe I will be able to translate into a simpler form (e.g. the one that I described earlier). It currently comprises almost 47,000 grammar rules so it will be interesting to see how that turns out.

The typed feature structures in ERG are expressed in a dialect known as Type Definition Language (TDL) and I have already implemented a complete context free grammar definition for that language which I can compile for my GLR parser. That's the software that I mentioned that takes about 10 seconds to translate the entire 7 megabytes of ERG text files into 3 million lines of XML.

Did I mention that I have been working on this software for about four years? Well that's all I've got to show for it so far.

*

Bragi

  • Trusty Member
  • ********
  • Replicant
  • *
  • 564
    • Neural network design blog
Re: Context Free Grammar
« Reply #3 on: July 30, 2011, 03:51:30 pm »
4 years, that's no joke. Hope you enjoy the work.

*

mendicott

  • Bumblebee
  • **
  • 28
    • Meta-Guide.com
Re: Context Free Grammar
« Reply #4 on: July 30, 2011, 04:07:48 pm »
Andrew, thanks for this I'm finding it interesting and helpful.

Could you briefly explain the bigger picture to me?  For example, what comes before your GLR parser, and what comes after in terms of workflow??  In fact, how do you foresee it being used?  Are you working on any conversational system yourself?
« Last Edit: July 31, 2011, 08:41:31 pm by mendicott »

*

Bragi

  • Trusty Member
  • ********
  • Replicant
  • *
  • 564
    • Neural network design blog
Re: Context Free Grammar
« Reply #5 on: July 30, 2011, 04:12:03 pm »
I have another question as well, a bit in the same lines as Marcus:
to parse the contents of the sentences, how are you planning on doing this: at the level of 'verb, noun, adj, article,...' or the actual words, 'a, be, do, letter,..'?

*

infurl

  • Administrator
  • ***********
  • Eve
  • *
  • 1298
  • Humans will disappoint you.
    • Home Page
Re: Context Free Grammar
« Reply #6 on: July 31, 2011, 04:08:50 am »
This morning I finished developing the first draft of a context free grammar for sentence splitting and rudimentary discourse analysis.

The grammar is here http://asmith.id.au/files/eng.grammar
The test input file is here http://asmith.id.au/files/the-country-of-the-blind.text
The marked up output file is here http://asmith.id.au/files/the-country-of-the-blind.xml

*

mendicott

  • Bumblebee
  • **
  • 28
    • Meta-Guide.com
Re: Context Free Grammar
« Reply #7 on: July 31, 2011, 08:56:35 pm »
The result looks beautiful to me!

Will it be able to go into or out of ePub format http://en.wikipedia.org/wiki/EPUB ?

And, what about RDFa compatibility http://en.wikipedia.org/wiki/RDFa ?

I would still like to know where its all going?  ;^)

*

infurl

  • Administrator
  • ***********
  • Eve
  • *
  • 1298
  • Humans will disappoint you.
    • Home Page
Re: Context Free Grammar
« Reply #8 on: August 01, 2011, 02:51:46 am »
@Jan, @Marcus

The conversation that I've been having with AndyHo on ChatBots.Org under the topic "Intelligent Behavior" should answer at least some of your questions.

http://www.chatbots.org/ai_zone/viewreply/6271/

Like me, Andy has sought to achieve the "state of the art" according to the published literature before branching out into new territory, and he has a few years head start on me as well, however I expect that from this point on we'll be heading in different directions with our work. I have some ideas of my own which I am now in a position to test, but I have been in this business too long to be willing to talk about vaporware. ;)
« Last Edit: August 01, 2011, 02:58:15 am by infurl »

*

infurl

  • Administrator
  • ***********
  • Eve
  • *
  • 1298
  • Humans will disappoint you.
    • Home Page
Re: Context Free Grammar
« Reply #9 on: August 01, 2011, 03:44:04 am »
The result looks beautiful to me!
Will it be able to go into or out of ePub format http://en.wikipedia.org/wiki/EPUB ?
And, what about RDFa compatibility http://en.wikipedia.org/wiki/RDFa ?
I would still like to know where its all going?  ;^)

By the way Marcus, in case it wasn't obvious to you, if you combine the files that I provided, common.grammar and eng.grammer, you have the complete definition for the parser that I used to produce the-country-of-the-blind.xml and it should work equally well with any plain text file which has been formatted similarly to the-country-of-the-blind.text (i.e. most of the documents on Project Gutenberg).

Modifying the syntax of my grammar definition files so they will work with commonly available GLR parsing capable tools like bison or DParser should be trivial, though as I haven't tested this myself yet, I don't know if they will work better or worse than my own parser software. If you try this, please let me know how it goes, and if you do gain any fame or fortune out of it, I hope you will be willing to share. ;)

*

infurl

  • Administrator
  • ***********
  • Eve
  • *
  • 1298
  • Humans will disappoint you.
    • Home Page
Re: Context Free Grammar
« Reply #10 on: August 11, 2011, 03:49:36 am »
I have another question as well, a bit in the same lines as Marcus:
to parse the contents of the sentences, how are you planning on doing this: at the level of 'verb, noun, adj, article,...' or the actual words, 'a, be, do, letter,..'?

I posted a more detailed answer to this question in the chatbots forum this morning, since there still seem to be a few people who aren't "getting it" yet.

http://www.chatbots.org/ai_zone/viewreply/6467/