Ai Dreams Forum

Member's Experiments & Projects => AI Programming => Topic started by: Freddy on January 03, 2017, 04:51:53 am

Title: Looking for simple word type lists
Post by: Freddy on January 03, 2017, 04:51:53 am
I'm working on something that takes a sentence and figures out what type each word is, i.e. verb, noun, preposition etc.

I managed to find a few sources online, got about 5000 words to play with at the moment. All I need is a straight list, like a list of verbs or nouns. I store them separately at the moment.

I was using the Wordnik API to find out what type each word was, but it's a bit slow doing it that way. If anyone knows of anything like I describe, could they share it please?

Thanks. :)
Title: Re: Looking for simple word type lists
Post by: infurl on January 03, 2017, 08:11:29 am
There are two resources online that have what you are looking for.

MOBY is in a fairly simple format and would be the easiest one for you to use. The part-of-speech file has 230,000 words in it. The files use some strange character encodings but if you don't care about foreign words and diacritical marks you can treat them as ASCII.

http://icon.shef.ac.uk/Moby/

Another good (better) resource is WordNet. The data files are in a much more complicated format but there are plenty of utilities for using them.

https://wordnet.princeton.edu/

I've performed in-depth analyses on both these resources (among many others) and converted them into relational database formats, so if you tell me what would be the most convenient format for you I could generate it in a jiffy.

Of course I hope you realise that trying to tag words in sentences from lists is futile because the same word can be a different part of speech depending on where it is in the sentence. This is my favorite topic and I could go on about it for hours, but I'll stop right there.
Title: Re: Looking for simple word type lists
Post by: infurl on January 03, 2017, 08:16:49 am
Here's a summary of the current contents of my lexical database.
 
Code
Adjective     |  70409
Adverb        |  15745
Conjunction   |    138
Determinative |    103
Interjection  |    641
Noun          | 288753
Preposition   |    268
Verb          |  57245
Title: Re: Looking for simple word type lists
Post by: Korrelan on January 03, 2017, 08:52:30 am
Woah… That’s an excellent resource.

I've parsed the WordNet ANSI version into my own format and I’m currently linking/cross-referencing it with a large phoneme database (Sphinx). The word descriptions will come in very handy too.

Weird… that ‘Battle of Britain’ is listed as a noun though…

I was considering writing another simple Chatbot engine as a side project… this will come in very handy… Cheers.

 :)
Title: Re: Looking for simple word type lists
Post by: Freddy on January 03, 2017, 06:01:41 pm
Thanks Infurl, I'll take a look at those.

Yes, I did realise it was somewhat futile, but I'm just experimenting at the moment and what I am doing is pretty simple. I've actually learned more about language in the past 24 hours than I think I ever did at school.

Will get back to you if I need anything :)

Quote
Weird… that ‘Battle of Britain’ is listed as a noun though…

Yes, this was the trouble I ran into with some other online resources. The way I have been doing it is not to look at the phrase as a whole, but rather as individual words. So that would be:

noun + preposition + noun

I think this is why a lot of those lists I found are so long - because they include things like that. If I look at the words separately it is enough for me to decide what to do with them and probably quicker.
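The word-by-word approach described above could be sketched roughly as follows. This is only an illustration and not the code used in the thread: the tiny word lists, the `PRIORITY` order and the `tag_phrase` name are all assumptions made up for the example.

```python
# Hypothetical word lists, one set per word type, kept separately
# in the way the first post describes.
WORD_LISTS = {
    "noun": {"battle", "britain", "box"},
    "preposition": {"of", "from", "to"},
    "verb": {"shove", "battle"},
}

# Order in which the lists are consulted when a word appears in more
# than one of them. This order is an arbitrary choice for the sketch.
PRIORITY = ["preposition", "noun", "verb"]


def tag_phrase(phrase):
    """Tag each word on its own and return a pattern such as 'noun + verb'."""
    tags = []
    for word in phrase.lower().split():
        tag = next((t for t in PRIORITY if word in WORD_LISTS[t]), "unknown")
        tags.append(tag)
    return " + ".join(tags)


print(tag_phrase("Battle of Britain"))  # noun + preposition + noun
```

Note that "battle" sits in both the noun and verb lists here, so the result depends entirely on the priority order, which is the weakness infurl points out earlier in the thread.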
Title: Re: Looking for simple word type lists
Post by: Art on January 03, 2017, 07:57:23 pm
http://www.sequencepublishing.com/1/thesage.html

Freddy,

Give it a try. I keep mine on the taskbar for those "how was that again?" moments. ;)

Free or $10 for Pro version.
Title: Re: Looking for simple word type lists
Post by: Freddy on January 04, 2017, 05:20:53 am
Thanks for the tip Art, but I was after something for programming purposes rather than an accessory.

I used WordNet in the end and built a parser in PHP so I can load their files into a MySQL database. Playing with strings and parsing are some of my favourite things.
Title: Re: Looking for simple word type lists
Post by: Freddy on January 04, 2017, 11:28:32 pm
I got it all into MySQL. For anyone interested in how WordNet breaks down, this is what I pulled from their database files.

Adjectives 16340
Adverbs 572
Nouns 82190
Prepositions 148
Verbs 13789

The prepositions are my addition from another source.

Over 100,000 words should be enough for me to play with.

The PHP parser I built to extract it all processes everything and inserts the data into the database in under 10 seconds  8)
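For readers who want to try something similar, here is a minimal sketch of loading word/type pairs from a WordNet-style index file into a database. It is not Freddy's PHP parser: it uses Python with an in-memory SQLite database instead of MySQL, the sample lines only imitate the layout of the `index.*` files (lemma first, part-of-speech letter second, header lines starting with two spaces) with made-up numbers, and the table and function names are assumptions.

```python
import sqlite3

# Illustrative lines in the layout of a WordNet index.* file.
# The counts and offsets are made up for this sketch.
SAMPLE_INDEX = """\
  1 This line stands in for the licence header, which starts with two spaces.
battle n 3 2 @ ~ 3 1 00000001 00000002 00000003
battle_of_britain n 1 1 @i 1 0 00000004
shove v 3 2 @ ~ 3 1 00000005 00000006 00000007
"""

POS_NAMES = {"n": "noun", "v": "verb", "a": "adjective", "r": "adverb"}


def parse_index(text):
    """Yield (lemma, part of speech) pairs from index-style text."""
    for line in text.splitlines():
        if not line or line.startswith("  "):
            continue  # skip blank lines and the header
        fields = line.split()
        lemma, pos = fields[0], fields[1]
        # Multi-word lemmas use underscores in place of spaces.
        yield lemma.replace("_", " "), POS_NAMES.get(pos, pos)


def load(conn, text):
    """Insert every parsed pair inside a single transaction."""
    conn.execute("CREATE TABLE IF NOT EXISTS words (lemma TEXT, pos TEXT)")
    with conn:
        conn.executemany("INSERT INTO words VALUES (?, ?)", parse_index(text))


conn = sqlite3.connect(":memory:")
load(conn, SAMPLE_INDEX)
query = "SELECT pos, COUNT(*) FROM words GROUP BY pos ORDER BY pos"
for pos, count in conn.execute(query):
    print(pos, count)
```

Batching all inserts into one transaction, rather than committing row by row, is usually what keeps a bulk load like this fast.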
Title: Re: Looking for simple word type lists
Post by: infurl on January 04, 2017, 11:54:08 pm
That seems a little slow but... PHP

Presumably you are preserving all the hierarchical relationships between the different senses and sets of synonyms and have included the glossaries and verb frames as well. The actual word lists don't include any inflections either, but no doubt you found a clever way to generate all your comparative and superlative adjectives, plural nouns, and gerund participles, past participles, preterites and third person singular verbs using other means.  O0
Title: Re: Looking for simple word type lists
Post by: Freddy on January 05, 2017, 12:01:26 am
PHP was just the path of least resistance as I've done a lot of coding in it. It also has a lot of useful string handling functions.

I haven't preserved the relationships at the moment; for now my needs are simple. I did preserve synonyms though. I had already written some routines to make singulars and plurals, so I can use them with this.
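A routine for making plurals might look something like the sketch below. This is not the routine mentioned in the post, which was never shown; it is a deliberately naive rule set, and the exception table and the `pluralize` name are assumptions for the example.

```python
import re

# Hypothetical exception table. Real coverage needs a much longer list.
IRREGULAR_PLURALS = {"child": "children", "mouse": "mice", "sheep": "sheep"}


def pluralize(noun):
    """Return a plural form using a few common English spelling rules."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    if re.search(r"(s|x|z|ch|sh)$", noun):
        return noun + "es"          # box -> boxes, church -> churches
    if re.search(r"[^aeiou]y$", noun):
        return noun[:-1] + "ies"    # city -> cities
    return noun + "s"               # day -> days, word -> words


for word in ["box", "city", "day", "child", "word"]:
    print(word, "->", pluralize(word))
```

Rules like these cover the regular cases only, which is why inflected forms for verbs and adjectives tend to need either a lookup list or a larger rule set.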
Title: Re: Looking for simple word type lists
Post by: infurl on January 05, 2017, 12:04:13 am
You might find some of these word lists useful too.

http://wordlist.aspell.net/other/

You won't get far without inflections for verbs and adjectives.
Title: Re: Looking for simple word type lists
Post by: Don Patrick on January 05, 2017, 09:50:11 am
Thanks for the Moby list, Infurl. I can use the list of intransitive verbs for my output (asking "What" questions with verbs that don't take an object gets awkward).

I think most word lists are bloated with verb tenses and compound words. I do find part of speech categories somewhat useful in combination with syntactical restrictions. A lot of words like "program" can be a noun or verb, but only one of those when it's preceded by "the". Of course programming all those restrictions is a downright mess and probably better delegated to already existing parsers.
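The "program" example can be illustrated with a single syntactical restriction layered on top of a word list. This is a minimal sketch, not Don Patrick's implementation: the lexicon entries, the tag names and the one rule shown are assumptions chosen to mirror the example in the post.

```python
# Hypothetical mini-lexicon mapping each word to every type it can have.
LEXICON = {
    "the": {"determinative"},
    "i": {"pronoun"},
    "program": {"noun", "verb"},
    "robots": {"noun"},
    "crashed": {"verb"},
}


def tag(sentence):
    """Tag words from the lexicon, applying one context rule."""
    tagged = []
    previous = None
    for word in sentence.lower().split():
        candidates = set(LEXICON.get(word, {"unknown"}))
        # Rule: directly after a determinative such as "the", a word that
        # could be a noun or a verb is taken to be a noun.
        if previous == "determinative" and {"noun", "verb"} <= candidates:
            candidates = {"noun"}
        label = "/".join(sorted(candidates))
        tagged.append((word, label))
        previous = label
    return tagged


print(tag("The program crashed"))
print(tag("I program robots"))
```

In the second sentence "program" stays ambiguous because no rule in this sketch covers it. Each extra case needs another rule, which is the mess the post refers to.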
Title: Re: Looking for simple word type lists
Post by: infurl on January 05, 2017, 07:58:26 pm
That's great @Don. I hope those resources will help everybody.

Unfortunately the distinction between transitive and intransitive verbs isn't really sufficient as there are five types altogether (intransitive, complex intransitive, monotransitive, complex transitive and ditransitive). The complex variants allow for optional adjuncts (modifiers e.g. "on Wednesday") as distinct from the mandatory complements (subject, object and indirect object). Luckily VerbNet has enough really detailed information to fill in all the blanks but it is very messy.

I've converted the entire VerbNet XML database into a very convenient relational database and with a bit more effort will have rendered the whole thing into a very nice grammar definition. With the right "grammar language" (one which supports feature constraints) matching up verbs and prepositions isn't at all messy but it's a necessary step towards figuring out which syntactic items (subject, object, indirect object) become which thematic roles (agent, patient, instrument etc) which in turn is a requirement for semantic (deep) parsing.
Title: Re: Looking for simple word type lists
Post by: Don Patrick on January 06, 2017, 10:20:00 am
My syntactical restraints are lists of if-then rules instead of a neat grammar language because there seemed to be too many exceptions for a consistent template (grammar may be consistent but people aren't). I also designed them for learning new words on the fly instead of relying on a database that already has all the answers. It isn't hard to machine-learn verbs that have direct objects in texts, but conversely, the absence of direct objects doesn't automatically mean the verb is intransitive, so I am better off using a list for those cases.

Ideally I would use VerbNet if I could make heads or tails of it. In VerbNet's format, how can I tell whether ARG1(?) indicates a direct object? There seem to be more roles than "Patient". For now I only need to distinguish verbs that don't fit the question "What/who do you verb?".
Quote
nw/wsj/01/wsj_0105.parse 18 37 gold rob-v 10.6 Robbery rob.01 2 ----- 35:1-ARG1=Source;Victim 37:0-rel

nw/wsj/01/wsj_0105.parse 18 39 gold murder-v 42.1 Killing murder.01 1 ----- 35:1*40:1-ARG1=Patient;Victim 39:0-rel
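As a starting point for experimenting with lines like the two quoted above, the sketch below pulls out the verb and the role labels attached to each ARG token. The field positions are read off these two lines only and are an assumption about the format in general. The code does not answer the question of whether ARG1 is a direct object; it only makes the labels easier to inspect.

```python
def parse_annotation(line):
    """Return (verb, {argument label: [role names]}) for one annotation line."""
    fields = line.split()
    # Assumption: the fifth field holds the verb as "lemma-v".
    verb = fields[4].rsplit("-", 1)[0]
    arguments = {}
    for token in fields[5:]:
        if "=" not in token:
            continue  # class ids, frame names and the "rel" token have no "="
        # Token shape: "<position>-<label>=<role>;<role>"
        _, labelled = token.split("-", 1)
        label, roles = labelled.split("=", 1)
        arguments[label] = roles.split(";")
    return verb, arguments


LINE_ROB = ("nw/wsj/01/wsj_0105.parse 18 37 gold rob-v 10.6 Robbery rob.01 2 "
            "----- 35:1-ARG1=Source;Victim 37:0-rel")
LINE_MURDER = ("nw/wsj/01/wsj_0105.parse 18 39 gold murder-v 42.1 Killing "
               "murder.01 1 ----- 35:1*40:1-ARG1=Patient;Victim 39:0-rel")

print(parse_annotation(LINE_ROB))
print(parse_annotation(LINE_MURDER))
```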
Title: Re: Looking for simple word type lists
Post by: infurl on January 07, 2017, 09:17:46 am
VerbNet is very complicated and it took quite a bit of effort to unravel it, but I think it was worth it. They have organised it to be as concise as possible, but that makes it a lot more difficult to decode. Verb classes can have subclasses which add more members, roles and frames to them. Roles in subclasses can override roles in base classes. Roles can be restricted by selection criteria and syntax elements can also have restrictions placed on them. Converting it all into a relational database made it a lot easier to understand and use, and while I was at it I converted all the logical restrictions to conjunctive normal form, which means they can be used directly in grammar rules.

Once you put it all back together you get about 2500 different frames like the following examples, from which it is comparatively easy to pinpoint the sense of the verb, and which noun phrases become which thematic role. Yes, there are a lot of different thematic roles and they are also organised in an inheritance hierarchy. The excitement never ends.

Code
-[ RECORD 1 ]--------------------------------------------
example | Amanda shoved the box.
item1   | {NP,Agent,+int_control}
item2   | {VERB}
item3   | {NP,Theme,+concrete}

-[ RECORD 2 ]--------------------------------------------
example | Amanda shoved the box from the corner.
item1   | {NP,Agent,+int_control}
item2   | {VERB}
item3   | {NP,Theme,+concrete}
item4   | {PREP,+src}
item5   | {NP,Initial_Location,+location}

-[ RECORD 3 ]--------------------------------------------
example | Amanda shoved the box to John.
item1   | {NP,Agent,+int_control}
item2   | {VERB}
item3   | {NP,Theme,+concrete}
item4   | {PREP,"to towards"}
item5   | {NP,Destination}

-[ RECORD 4 ]--------------------------------------------
example | Amanda shoved the box from the corner to John.
item1   | {NP,Agent,+int_control}
item2   | {VERB}
item3   | {NP,Theme,+concrete}
item4   | {PREP,+src}
item5   | {NP,Initial_Location,+location}
item6   | {PREP,"to towards"}
item7   | {NP,Destination}

-[ RECORD 5 ]--------------------------------------------
example | Amanda shoved the box to John from the corner.
item1   | {NP,Agent,+int_control}
item2   | {VERB}
item3   | {NP,Theme,+concrete}
item4   | {PREP,"to towards"}
item5   | {NP,Destination}
item6   | {PREP,+src}
item7   | {NP,Initial_Location,+location}
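To show how frames like the five records above could be used to decide which noun phrase becomes which thematic role, here is a minimal sketch. It is not infurl's grammar: it assumes the sentence has already been split into NP, VERB and PREP chunks, it ignores the selection restrictions such as +concrete and +location, and the set of prepositions standing in for the +src feature is an assumption.

```python
# Assumption: prepositions carrying the +src feature. VerbNet defines the real set.
SRC_PREPS = {"from"}
TO_PREPS = {"to", "towards"}

AGENT_VERB_THEME = [("NP", "Agent"), ("VERB", None), ("NP", "Theme")]
SOURCE = [("PREP", SRC_PREPS), ("NP", "Initial_Location")]
DESTINATION = [("PREP", TO_PREPS), ("NP", "Destination")]

# The five records above, reduced to (category, role or allowed prepositions).
FRAMES = [
    AGENT_VERB_THEME,
    AGENT_VERB_THEME + SOURCE,
    AGENT_VERB_THEME + DESTINATION,
    AGENT_VERB_THEME + SOURCE + DESTINATION,
    AGENT_VERB_THEME + DESTINATION + SOURCE,
]


def match(chunks):
    """Return {role: text} for the first frame that fits, or None."""
    for frame in FRAMES:
        if len(frame) != len(chunks):
            continue
        roles = {}
        for (category, detail), (chunk_category, text) in zip(frame, chunks):
            if category != chunk_category:
                break
            if category == "PREP" and text not in detail:
                break
            if category == "NP":
                roles[detail] = text
        else:
            return roles  # every item in the frame matched
    return None


chunks = [("NP", "Amanda"), ("VERB", "shoved"), ("NP", "the box"),
          ("PREP", "to"), ("NP", "John"),
          ("PREP", "from"), ("NP", "the corner")]
print(match(chunks))
```

The preposition is what separates record 4 from record 5 here, since both have the same sequence of categories.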
Title: Re: Looking for simple word type lists
Post by: infurl on January 07, 2017, 11:09:38 pm
Here's a version of VerbNet which is completely transformed into a much easier to understand format. The entire class hierarchy is flattened out so that properties that were shared among subclasses are now duplicated into them, and the structure is much simpler and I hope easier to understand. Each item should be self-evident and not require cross-referencing things all over the place. Also, everything is in one file. Please feel free to ask if you need further explanation, maybe I'll be able to make it even easier to use.
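The flattening described above can be pictured with a small sketch: each class inherits the roles and frames of its parent, and a role defined in a subclass overrides the one from the base class. The class names, roles and frames below are invented for illustration and are not taken from VerbNet or from the flattened file.

```python
# Hypothetical class hierarchy: a base class and one subclass.
CLASSES = {
    "example-1": {
        "parent": None,
        "roles": {"Agent": "+int_control", "Theme": ""},
        "frames": ["NP V NP"],
    },
    "example-1-1": {
        "parent": "example-1",
        "roles": {"Theme": "+concrete"},  # overrides the base class Theme
        "frames": ["NP V NP PP"],
    },
}


def flatten(name):
    """Return the roles and frames of a class with inherited ones copied in."""
    entry = CLASSES[name]
    if entry["parent"] is None:
        inherited = {"roles": {}, "frames": []}
    else:
        inherited = flatten(entry["parent"])
    return {
        # Later keys win, so subclass roles replace base class roles.
        "roles": {**inherited["roles"], **entry["roles"]},
        "frames": inherited["frames"] + entry["frames"],
    }


print(flatten("example-1-1"))
```

After flattening, every class can be read on its own without walking up the hierarchy, at the cost of duplicated data.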
Title: Re: Looking for simple word type lists
Post by: Don Patrick on January 08, 2017, 08:59:26 am
Thank you! This is much clearer with the "NP V" information in it.
Title: Re: Looking for simple word type lists
Post by: Freddy on January 08, 2017, 03:11:40 pm
Thanks very much :)
Title: Re: Looking for simple word type lists
Post by: Freddy on July 09, 2017, 12:27:41 am
Here's a good list of dictionaries and things for anyone else.

I'm going to play with WordsAPI (2,500 free requests per day) and also Oxford, which allows 3,000 requests a month for free.

https://www.programmableweb.com/category/dictionary/api
Title: Re: Looking for simple word type lists
Post by: Freddy on July 09, 2017, 02:07:01 am
Hmm, well WordsAPI just seems to be WordNet. It is useful if you can't host I suppose.

I'm trying to expand the dictionary the bot uses, so I had attempted to parse all the WordNet database files myself, but it was far easier to just use this:

http://wnsql.sourceforge.net/

I probably saved a week by using that.

Now I see all the other gubbins in there I think it will do just fine.