Looking for simple word type lists

*

Freddy

  • Administrator
  • **********************
  • Colossus
  • *
  • 6114
  • Mostly Harmless
Looking for simple word type lists
« on: January 03, 2017, 04:51:53 am »
I'm working on something that takes a sentence and figures out what type each word is, e.g. verb, noun, preposition, etc.

I managed to find a few sources online, got about 5000 words to play with at the moment. All I need is a straight list, like a list of verbs or nouns. I store them separately at the moment.

I was using the Wordnik API to find out what type of words they were, but it's a bit slow doing it that way. If anyone knows of anything like I describe, could they share it please?

Thanks. :)

*

infurl

  • Trusty Member
  • *******
  • Starship Trooper
  • *
  • 273
  • Humans will disappoint you.
    • Home Page
Re: Looking for simple word type lists
« Reply #1 on: January 03, 2017, 08:11:29 am »
There are two resources online that have what you are looking for.

MOBY is in a fairly simple format and would be the easiest one for you to use. The part-of-speech file has 230,000 words in it. The files use some strange character encodings but if you don't care about foreign words and diacritical marks you can treat them as ASCII.

http://icon.shef.ac.uk/Moby/

Another good (better) resource is WordNet. The data files are in a much more complicated format but there are plenty of utilities for using them.

https://wordnet.princeton.edu/

I've performed in-depth analyses on both these resources (among many others) and converted them into relational database formats, so if you tell me what would be the most convenient format for you I could generate it in a jiffy.

Of course I hope you realise that trying to tag words in sentences from lists is futile because the same word can be a different part of speech depending on where it is in the sentence. This is my favorite topic and I could go on about it for hours, but I'll stop right there.

*

infurl

« Reply #2 on: January 03, 2017, 08:16:49 am »
Here's a summary of the current contents of my lexical database.
 
Code:
Adjective     |  70409
Adverb        |  15745
Conjunction   |    138
Determinative |    103
Interjection  |    641
Noun          | 288753
Preposition   |    268
Verb          |  57245

*

korrelan

  • Trusty Member
  • ********
  • Replicant
  • *
  • 693
  • Look into my eyes! WOAH!
    • Google +
Re: Looking for simple word type lists
« Reply #3 on: January 03, 2017, 08:52:30 am »
Woah… That’s an excellent resource.

I've parsed the WordNet ANSI version into my own format and I’m currently linking/cross-referencing it with a large phoneme database (Sphinx). The word descriptions will come in very handy too.

Weird… that ‘Battle of Britain’ is listed as a noun though…

I was considering writing another simple Chatbot engine as a side project… this will come in very handy… Cheers.

 :)
It thunk... therefore it is!

*

Freddy

« Reply #4 on: January 03, 2017, 06:01:41 pm »
Thanks Infurl, I'll take a look at those.

Yes, I did realise it was somewhat futile, but I'm just experimenting at the moment and what I am doing is pretty simple. I've actually learned more about language in the past 24 hours than I think I ever did when at school.

Will get back to you if I need anything :)

Quote
Weird… that ‘Battle of Britain’ is listed as a noun though…

Yes, this was the trouble I ran into with some other online resources. The way I have been doing it is not to look at the phrase as a whole, but rather as individual words. So that would be:

noun + preposition + noun

I think this is why a lot of those lists I found are so long - because they include things like that. If I look at the words separately it is enough for me to decide what to do with them and probably quicker.
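A minimal Python sketch of that word-by-word lookup, not the actual code used here. The tiny `WORD_LISTS` dictionary and the `tag_words` helper are made up for illustration; real lists would be far larger. Each word is looked up on its own, so a phrase such as "Battle of Britain" comes out as noun + preposition + noun, and a word that sits in more than one list keeps all of its possible types.

```python
# Hypothetical miniature word lists; a real run would load much larger
# lists kept separately per word type.
WORD_LISTS = {
    "noun": {"battle", "britain", "box", "program"},
    "verb": {"shove", "program"},
    "preposition": {"of", "from", "to"},
}


def tag_words(sentence):
    """Return (word, sorted list of possible types) for each word."""
    tagged = []
    for raw in sentence.split():
        # Strip surrounding punctuation and ignore case for the lookup.
        word = raw.strip(".,;:!?\"'").lower()
        if not word:
            continue
        types = sorted(t for t, words in WORD_LISTS.items() if word in words)
        tagged.append((word, types or ["unknown"]))
    return tagged


for word, types in tag_words("Battle of Britain"):
    print(word, "/".join(types))
```

A word like "program" comes back with both "noun" and "verb", which is exactly the ambiguity a plain list cannot resolve on its own.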

*

Art

  • At the end of the game, the King and Pawn go into the same box.
  • Global Moderator
  • ******************
  • Hal 4000
  • *
  • 4445
Re: Looking for simple word type lists
« Reply #5 on: January 03, 2017, 07:57:23 pm »
http://www.sequencepublishing.com/1/thesage.html

Freddy,

Give it a try. I keep mine on the taskbar for those "how was that again?" moments. ;)

Free or $10 for Pro version.
In the world of AI, it's the thought that counts!

*

Freddy

« Reply #6 on: January 04, 2017, 05:20:53 am »
Thanks for the tip Art, but I was after something for programming purposes rather than an accessory.

I used WordNet in the end and built a parser in PHP so I can load their files into a MySQL database. Playing with strings and parsing are some of my favourite things.

*

Freddy

« Reply #7 on: January 04, 2017, 11:28:32 pm »
I got it all into MySQL. For anyone interested in how WordNet breaks down, this is what I pulled from their database files.

Adjectives 16340
Adverbs 572
Nouns 82190
Prepositions 148   
Verbs 13789

The prepositions are my addition from another source.

Over 100,000 words should be enough for me to play with.

The PHP parser I built to extract it all processes everything and inserts the data into the database in under 10 seconds  8)
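For readers who want to try something similar, here is a rough sketch in Python with SQLite rather than the PHP and MySQL used above, so it is a substitute technique and not the parser described in this post. It assumes WordNet-style index lines where header lines start with whitespace, the first two fields are the lemma and a part-of-speech letter, and multi-word lemmas use underscores. The `SAMPLE_LINES` are made-up, truncated stand-ins for real file contents, and the table and function names are hypothetical.

```python
import sqlite3


def parse_index_lines(lines):
    """Yield (lemma, pos) pairs from WordNet-style index lines."""
    for line in lines:
        # Header lines start with whitespace; skip them and blank lines.
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        lemma, pos = fields[0], fields[1]
        # Multi-word lemmas are assumed to be joined with underscores.
        yield lemma.replace("_", " "), pos


def load_words(conn, lines):
    """Insert all parsed words in a single transaction."""
    conn.execute("CREATE TABLE IF NOT EXISTS words (lemma TEXT, pos TEXT)")
    with conn:
        conn.executemany(
            "INSERT INTO words (lemma, pos) VALUES (?, ?)",
            parse_index_lines(lines),
        )


# Made-up, truncated sample lines; real index lines carry more fields.
SAMPLE_LINES = [
    "  1 header text that should be skipped",
    "battle_of_britain n 1 0",
    "shove v 1 0",
]

conn = sqlite3.connect(":memory:")
load_words(conn, SAMPLE_LINES)
print(conn.execute("SELECT pos, COUNT(*) FROM words GROUP BY pos ORDER BY pos").fetchall())
```

Doing all the inserts inside one transaction, instead of committing row by row, is usually what keeps a bulk load like this fast.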

*

infurl

« Reply #8 on: January 04, 2017, 11:54:08 pm »
That seems a little slow but... PHP

Presumably you are preserving all the hierarchical relationships between the different senses and sets of synonyms, and including the glossaries and verb frames as well. The actual word lists don't include any inflections either, but no doubt you found a clever way to generate all your comparative and superlative adjectives, plural nouns, and gerund participles, past participles, preterites and third person singular verbs using other means.

*

Freddy

« Reply #9 on: January 05, 2017, 12:01:26 am »
PHP was just the path of least resistance as I've done a lot of coding in it. It also has a lot of useful string handling functions.

I haven't preserved the relationships at the moment; for now my needs are simple. I did preserve synonyms though. I had already written some routines to make singulars and plurals, so I can use them with this.
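As an illustration of what such a routine can look like, here is a small Python sketch of rule-based pluralisation. It is not the routine mentioned above (that one is in PHP and isn't shown); the `pluralize` function and the `IRREGULAR_PLURALS` table are hypothetical and cover only a few common English patterns.

```python
# Hypothetical table of irregular forms; a real one would be much longer.
IRREGULAR_PLURALS = {"man": "men", "child": "children", "mouse": "mice"}


def pluralize(noun):
    """Return a plural form for a singular English noun (rough rules only)."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    # Sibilant endings take "es": box -> boxes, church -> churches.
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    # Consonant + y becomes "ies": city -> cities (but day -> days).
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"


for word in ("box", "city", "day", "child", "word"):
    print(word, "->", pluralize(word))
```

Rules like these miss plenty of cases (for example "leaf" or "potato"), which is why an exception table or a proper inflection list is still needed alongside them.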

*

infurl

« Reply #10 on: January 05, 2017, 12:04:13 am »
You might find some of these word lists useful too.

http://wordlist.aspell.net/other/

You won't get far without inflections for verbs and adjectives.

*

Don Patrick

  • Trusty Member
  • *******
  • Starship Trooper
  • *
  • 388
    • Artificial Detective
Re: Looking for simple word type lists
« Reply #11 on: January 05, 2017, 09:50:11 am »
Thanks for the Moby list, Infurl. I can use the list of intransitive verbs for my output (asking "What" questions with verbs that don't take an object gets awkward).

I think most word lists are bloated with verb tenses and compound words. I do find part of speech categories somewhat useful in combination with syntactical restrictions. A lot of words like "program" can be a noun or verb, but only one of those when it's preceded by "the". Of course programming all those restrictions is a downright mess and probably better delegated to already existing parsers.
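To make the "program" example concrete, here is a minimal Python sketch of one such syntactical restriction. The `LEXICON` entries and the `tag_with_context` function are invented for illustration, and the single rule (no verb reading directly after a determiner) is only one of the many restrictions a real system would need.

```python
# Hypothetical lexicon mapping words to their possible parts of speech.
LEXICON = {
    "the": {"determiner"},
    "i": {"pronoun"},
    "program": {"noun", "verb"},
    "computers": {"noun"},
    "crashed": {"verb"},
}


def tag_with_context(words):
    """Tag each word, dropping the verb reading directly after a determiner."""
    tags = []
    previous = set()
    for word in words:
        candidates = set(LEXICON.get(word.lower(), {"unknown"}))
        # Restriction: an ambiguous word right after a determiner is not a verb.
        if previous == {"determiner"} and len(candidates) > 1:
            candidates.discard("verb")
        tags.append(sorted(candidates))
        previous = candidates
    return tags


print(tag_with_context(["the", "program", "crashed"]))
print(tag_with_context(["I", "program", "computers"]))
```

In the first sentence "program" is narrowed to a noun; in the second it stays ambiguous because this one rule has nothing to say about pronouns, which hints at how quickly the rule set grows.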
Personal project: NLP -> learning -> knowledge -> logical inference -> A.I.

*

infurl

« Reply #12 on: January 05, 2017, 07:58:26 pm »
That's great @Don. I hope those resources will help everybody.

Unfortunately the distinction between transitive and intransitive verbs isn't really sufficient as there are five types altogether (intransitive, complex intransitive, monotransitive, complex transitive and ditransitive). The complex variants allow for optional adjuncts (modifiers e.g. "on Wednesday") as distinct from the mandatory complements (subject, object and indirect object). Luckily VerbNet has enough really detailed information to fill in all the blanks but it is very messy.

I've converted the entire VerbNet XML database into a very convenient relational database and with a bit more effort will have rendered the whole thing into a very nice grammar definition. With the right "grammar language" (one which supports feature constraints) matching up verbs and prepositions isn't at all messy, but it's a necessary step towards figuring out which syntactic items (subject, object, indirect object) become which thematic roles (agent, patient, instrument, etc.), which in turn is a requirement for semantic (deep) parsing.
« Last Edit: January 05, 2017, 09:14:15 pm by infurl »

*

Don Patrick

« Reply #13 on: January 06, 2017, 10:20:00 am »
My syntactical constraints are lists of if-then rules instead of a neat grammar language because there seemed to be too many exceptions for a consistent template (grammar may be consistent but people aren't). I also designed them for learning new words on the fly instead of relying on a database that already has all the answers. It isn't hard to machine-learn verbs that have direct objects in texts, but conversely, the absence of direct objects doesn't automatically mean the verb is intransitive, so I am better off using a list for those cases.

Ideally I would use VerbNet if I could make heads or tails of it. In VerbNet's format, how can I tell whether ARG1(?) indicates a direct object? There seem to be more roles than "Patient". For now I only need to distinguish verbs that don't fit the question "What/who do you verb?".
Quote
nw/wsj/01/wsj_0105.parse 18 37 gold rob-v 10.6 Robbery rob.01 2 ----- 35:1-ARG1=Source;Victim 37:0-rel

nw/wsj/01/wsj_0105.parse 18 39 gold murder-v 42.1 Killing murder.01 1 ----- 35:1*40:1-ARG1=Patient;Victim 39:0-rel

*

infurl

  • Trusty Member
  • *******
  • Starship Trooper
  • *
  • 273
  • Humans will disappoint you.
    • Home Page
Re: Looking for simple word type lists
« Reply #14 on: January 07, 2017, 09:17:46 am »
VerbNet is very complicated and it took quite a bit of effort to unravel it, but I think it was worth it. They have organised it to be as concise as possible, but that makes it a lot more difficult to decode. Verb classes can have subclasses which add more members, roles and frames to them. Roles in subclasses can override roles in base classes. Roles can be restricted by selection criteria and syntax elements can also have restrictions placed on them. Converting it all into a relational database made it a lot easier to understand and use, and while I was at it, I converted all the logical restrictions to conjunctive normal form, which means they can be used directly in grammar rules.
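The inheritance and override behaviour described above can be pictured with a short Python sketch. This is only an illustration under assumed, heavily simplified data: the `CLASSES` records, the class names and the `resolve` function are hypothetical and do not reflect VerbNet's actual XML layout or the relational schema mentioned here.

```python
# Hypothetical, simplified class records (real VerbNet data is XML).
CLASSES = {
    "example-1": {
        "parent": None,
        "roles": {"Agent": ["+int_control"], "Theme": ["+concrete"]},
        "frames": ["NP V NP"],
    },
    "example-1.1": {
        "parent": "example-1",
        # Redefining Theme here overrides the base class restriction.
        "roles": {"Theme": [], "Destination": []},
        "frames": ["NP V NP PREP NP"],
    },
}


def resolve(class_id):
    """Return (roles, frames) for a class, including inherited ones."""
    record = CLASSES[class_id]
    if record["parent"] is None:
        roles, frames = {}, []
    else:
        roles, frames = resolve(record["parent"])
    # Subclass roles replace base roles of the same name; frames accumulate.
    roles = {**roles, **record["roles"]}
    frames = frames + record["frames"]
    return roles, frames


roles, frames = resolve("example-1.1")
print(roles)
print(frames)
```

Flattening each class this way is one route to the "put it all back together" step: every class ends up with a complete set of roles and frames without having to walk the hierarchy at lookup time.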

Once you put it all back together you get about 2500 different frames like the following examples, from which it is comparatively easy to pinpoint the sense of the verb, and which noun phrases become which thematic role. Yes, there are a lot of different thematic roles and they are also organised in an inheritance hierarchy. The excitement never ends.

Code:
-[ RECORD 1 ]--------------------------------------------
example | Amanda shoved the box.
item1   | {NP,Agent,+int_control}
item2   | {VERB}
item3   | {NP,Theme,+concrete}

-[ RECORD 2 ]--------------------------------------------
example | Amanda shoved the box from the corner.
item1   | {NP,Agent,+int_control}
item2   | {VERB}
item3   | {NP,Theme,+concrete}
item4   | {PREP,+src}
item5   | {NP,Initial_Location,+location}

-[ RECORD 3 ]--------------------------------------------
example | Amanda shoved the box to John.
item1   | {NP,Agent,+int_control}
item2   | {VERB}
item3   | {NP,Theme,+concrete}
item4   | {PREP,"to towards"}
item5   | {NP,Destination}

-[ RECORD 4 ]--------------------------------------------
example | Amanda shoved the box from the corner to John.
item1   | {NP,Agent,+int_control}
item2   | {VERB}
item3   | {NP,Theme,+concrete}
item4   | {PREP,+src}
item5   | {NP,Initial_Location,+location}
item6   | {PREP,"to towards"}
item7   | {NP,Destination}

-[ RECORD 5 ]--------------------------------------------
example | Amanda shoved the box to John from the corner.
item1   | {NP,Agent,+int_control}
item2   | {VERB}
item3   | {NP,Theme,+concrete}
item4   | {PREP,"to towards"}
item5   | {NP,Destination}
item6   | {PREP,+src}
item7   | {NP,Initial_Location,+location}
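As a rough illustration of how a frame like Record 2 could be used, the Python sketch below lines a frame up against a sentence that has already been split into phrases and reads off the thematic roles. The `FRAME` structure, the pre-chunked `CHUNKS` input and the `assign_roles` function are assumptions made for this example; the restrictions such as `+int_control` and `+src` are deliberately ignored.

```python
# Record 2 as (category, role) pairs; restrictions are left out.
FRAME = [
    ("NP", "Agent"),
    ("VERB", None),
    ("NP", "Theme"),
    ("PREP", None),
    ("NP", "Initial_Location"),
]

# Hypothetical output of an earlier chunking step for the example sentence.
CHUNKS = [
    ("NP", "Amanda"),
    ("VERB", "shoved"),
    ("NP", "the box"),
    ("PREP", "from"),
    ("NP", "the corner"),
]


def assign_roles(frame, chunks):
    """Return {role: text} if the chunks fit the frame, otherwise None."""
    if len(frame) != len(chunks):
        return None
    roles = {}
    for (category, role), (chunk_category, text) in zip(frame, chunks):
        if category != chunk_category:
            return None
        if role is not None:
            roles[role] = text
    return roles


print(assign_roles(FRAME, CHUNKS))
```

Trying each candidate frame in turn and keeping the ones that return a result is one simple way to narrow down both the verb sense and the role of each noun phrase.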

 

