
Member's Experiments & Projects => General Project Discussion => Topic started by: Zero on October 18, 2015, 03:52:55 pm

Title: Typed and Named Trees
Post by: Zero on October 18, 2015, 03:52:55 pm

Hi all,

I was looking for a file format that would be somewhere between INI files (too simple) and XML (too... XMLish). Here is what I came up with. I call it TNT, short for Typed and Named Trees. It's heavily inspired by BBCode.

Code

  [a type = a label]

    types and labels can contain spaces
    labels can be identifiers or glob patterns, or anything
	
    [type only]
	
      labels are optional
      nesting is allowed and [inline] indentation doesn't matter [/inline]
	  
    [/type only]

    and we also have [empty tags/]
	
  [/a type]

It is less expressive than XML since we get rid of attributes. I know, TNT should not exist: we're supposed to follow standards, but I can't help it. :idiot2: :)

EDIT:

These .tnt files could be the building blocks of a simple thinkbot system.
- From AIML I want to keep the trigger/template idea with glob patterns and star-tags.
- Unlike chatbots, a thinkbot won't generate speech directly; it will generate thoughts instead. Thoughts call other thoughts. Eventually, thoughts call actions, like speech.
- For scripting, I'll use my-basic (https://github.com/paladin-t/my_basic), which has no square brackets in its syntax (and which makes me feel like I'm 10 again), so the user can insert star-tags inside code.
- The whole thing should be mounted on libuv (https://nikhilm.github.io/uvbook/introduction.html) for async and good IO.
- There should be a focus system, but I still don't know how it would work. It has to be straightforward.
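
The trigger/template idea from the list above could look roughly like this. It is only a sketch: the function names are hypothetical, and the `[star/]` and `[star = 2/]` spellings are an assumption about how star-tags might be written as TNT empty tags.

```javascript
// Hypothetical helper: turn a glob-style trigger such as "my name is *"
// into a regular expression whose capture groups hold the star values.
function globToRegExp(pattern) {
  const body = pattern
    .trim()
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join("(.*?)");
  return new RegExp("^" + body + "$", "i");
}

// Try a trigger against an input; return the captured stars, or null.
function matchTrigger(pattern, input) {
  const m = globToRegExp(pattern).exec(input.trim());
  return m ? m.slice(1).map((s) => s.trim()) : null;
}

// Fill a template: [star/] is the first capture, [star = 2/] the second.
// The star-tag spelling is assumed, not taken from any TNT specification.
function fillTemplate(template, stars) {
  return template.replace(/\[star(?:\s*=\s*(\d+))?\s*\/\]/g, (_, n) => {
    const index = n ? Number(n) - 1 : 0;
    return stars[index] !== undefined ? stars[index] : "";
  });
}

const captured = matchTrigger("my name is *", "My name is Zero");
console.log(fillTemplate("remember name [star/]", captured));
// prints "remember name Zero"
```

The glob is case-insensitive and each star is non-greedy; both are design choices for the sketch, not requirements.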

Title: Re: Typed and Named Trees
Post by: infurl on October 18, 2015, 10:56:38 pm

Quote from: Zero on October 18, 2015, 03:52:55 pm

I was looking for a file format that would be somewhere between INI files (too simple) and XML (too... XMLish). Here is what I came up with. I call it TNT, short for Typed and Named Trees. It's heavily inspired by BBCode.

I don't actually mind XML and do a lot of work with it, but if you want comparable power and a simple syntax, shouldn't you be looking at JSON, which has achieved very widespread use?

Title: Re: Typed and Named Trees
Post by: Art on October 19, 2015, 12:49:37 am

Wow! For a moment there your Subject Title had me going back through my memory trying to recall the various Tree names like: Ash, Aspen, Basswood, Bayberry, Beech, Birch, Cedar, Chestnut, Dogwood, Elm, etc., etc.

Actually quite an exhaustive list...nothing to "bark" about! :2funny: Never mind...I'll Leaf you alone! ;)

Title: Re: Typed and Named Trees
Post by: Zero on October 19, 2015, 07:53:52 am

;D That's also what my search engine (https://startpage.com/) told me first!! God, it's hard to create English names when you're not English! ::)

@infurl
Yes, there are a lot of formats already. In pure JSON, the user would have to escape double quotes (which are part of my-basic's syntax) and newlines. It wouldn't feel very fluent. There's also TOML (https://github.com/toml-lang/toml), which is very nice, but I don't like its double-square-brackets thing.
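
To make the escaping point concrete, here is a small sketch. The two-line script is a made-up BASIC-style snippet used purely for illustration.

```javascript
// A two-line BASIC-style snippet (illustrative only), as the user would type it.
const script = 'print "Hello";\nprint "World";';

// Stored as a JSON string, every double quote and newline has to be escaped.
const asJson = JSON.stringify({ script: script });
console.log(asJson);
// prints {"script":"print \"Hello\";\nprint \"World\";"}
```

In a bracket-based format the same two lines could sit between an opening and a closing tag unchanged, since neither line contains a square bracket.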

I wanted something simple... What I like here is that you can describe TNT in 10 lines!

EDIT:

Thanks for your input, infurl. You gave me this idea: to store the TNT once I've parsed it, I'll use a JSON library (https://github.com/netmail-open/wjelement)! So I'll have a ready-made data structure, and I can easily export it as JSON!

Title: Re: Typed and Named Trees
Post by: Zero on October 20, 2015, 07:31:50 am

The previous TNT sample would translate to the following JSON:

Code

  [
    {
      "istag" : true,
      "type" : "a type",
      "label" : "a label",
      "content" :
      [
        {
          "istag" : false,
          "text" : "types and labels can contain spaces \n labels can be identifiers or glob patterns, or anything"
        },
        {
          "istag" : true,
          "type" : "type only",
          "label" : false,
          "content" :
          [
            {
              "istag" : false,
              "text" : "labels are optional \n nesting is allowed and"
            },
            {
              "istag" : true,
              "type" : "inline",
              "label" : false,
              "content" :
              [
                {
                  "istag" : false,
                  "text" : "indentation doesn't matter"
                }
              ]
            }
          ]
        },
        {
          "istag" : false,
          "text" : "and we also have"
        },
        {
          "istag" : true,
          "type" : "empty tags",
          "label" : false,
          "content" : false
        }
      ]
    }
  ]

EDIT: There should be support for a few useful specific syntaxes inside certain tags, such as string lists, n-ary relations, an RDF syntax, etc.
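
With the tree stored in the JSON layout shown above, finding every tag of a given type is a short recursive walk. A minimal sketch, assuming only the `istag`, `type`, `label`, and `content` fields from that layout; `findByType` is a hypothetical helper name.

```javascript
// Collect every tag node of the given type, depth-first, in document order.
function findByType(nodes, type) {
  const found = [];
  for (const node of nodes || []) {
    if (!node.istag) continue; // skip plain text nodes
    if (node.type === type) found.push(node);
    // "content" may be an array, false, or missing (empty tags).
    if (Array.isArray(node.content)) {
      found.push(...findByType(node.content, type));
    }
  }
  return found;
}
```

For the sample tree, `findByType(tree, "inline")` would return the single `inline` node nested inside `type only`.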

Title: Re: Typed and Named Trees
Post by: Zero on October 23, 2015, 09:10:20 am

Hi!

In PEGjs (http://pegjs.org/online), the following grammar...

Code


Content
= (Element / Text)*

Element
= startTag:sTag content:Content endTag:eTag {
    if (startTag.type !== endTag.type) {
      throw new Error(
        "Expected [/" + startTag.type + "] but [/" + endTag.type + "] found."
      );
    }
    return {
      istag:   startTag.istag,
      type:    startTag.type,
      label:   startTag.label,
      content: content
    };
  }
/ startTag:selfTag {
    return {
      istag:   startTag.istag,
      type:    startTag.type,
      label:   startTag.label,
    };
  }

sTag 
= [ \n\t]* "[" type:TagType "]" [ \n\t]* { return {istag:true,type:type,label:false}; }
/ [ \n\t]* "[" type:TagType "=" lab:label "]" [ \n\t]* { return {istag:true,type:type,label:lab}; }

label
= chars:[^/\[\]=]*  {return chars.join("").trim();}

selfTag
= [ \n\t]* "[" type:TagType "/]" [ \n\t]* { return {istag:true,type:type,label:false}; }
/ [ \n\t]* "[" type:TagType "=" lab:label "/]" [ \n\t]* { return {istag:true,type:type,label:lab}; }

eTag
= [ \n\t]* "[/" type:TagType "]" [ \n\t]* { return {type:type}; }

TagType
= chars:[a-zA-Z0-9 ]+ { return chars.join("").trim(); }

Text
= chars:([^\[\]])+ { return {istag:false,text:chars.join("").replace(/[ \t]+/g, ' ').replace(/\n+/g,'\n').trim()};}

...will translate this...

Code


  [a type = a label]

    types and labels can contain spaces
	labels can be identifiers or glob patterns, or anything
	
    [type only]
	
	  labels are optional
	  nesting is allowed and [inline] indentation doesn't matter [/inline]
	  
    [/type only]

	and we also have [empty tags/]
	
  [/a type]

...to this...

Code

[
   {
      "istag": true,
      "type": "a type",
      "label": "a label",
      "content": [
         {
            "istag": false,
            "text": "types and labels can contain spaces\n labels can be identifiers or glob patterns, or anything"
         },
         {
            "istag": true,
            "type": "type only",
            "label": false,
            "content": [
               {
                  "istag": false,
                  "text": "labels are optional\n nesting is allowed and"
               },
               {
                  "istag": true,
                  "type": "inline",
                  "label": false,
                  "content": [
                     {
                        "istag": false,
                        "text": "indentation doesn't matter"
                     }
                  ]
               }
            ]
         },
         {
            "istag": false,
            "text": "and we also have"
         },
         {
            "istag": true,
            "type": "empty tags",
            "label": false
         }
      ]
   }
]
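
Going the other way, a tree in this shape can be written back out as TNT text. This is a rough sketch, not part of the grammar above: `tntLines` and `toTnt` are hypothetical names, the two-space indentation is an arbitrary choice, and a node whose `content` is not an array is assumed to be an empty tag.

```javascript
// Turn a list of nodes into indented TNT lines.
function tntLines(nodes, depth) {
  const pad = "  ".repeat(depth);
  const lines = [];
  for (const node of nodes) {
    if (!node.istag) {
      // Text nodes may hold several lines; re-indent each one.
      for (const line of node.text.split("\n")) lines.push(pad + line.trim());
      continue;
    }
    const head = node.label ? node.type + " = " + node.label : node.type;
    if (!Array.isArray(node.content)) {
      lines.push(pad + "[" + head + "/]"); // empty tag
    } else {
      lines.push(pad + "[" + head + "]");
      lines.push(...tntLines(node.content, depth + 1));
      lines.push(pad + "[/" + node.type + "]");
    }
  }
  return lines;
}

// Serialize a whole tree to a single TNT string.
function toTnt(nodes) {
  return tntLines(nodes, 0).join("\n");
}
```

Since the format ignores indentation, the output does not need to match the original file byte for byte; parsing it again should give back the same tree.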

Ai Dreams Forum
