What's wrong with WordNet, Celex2 and Dante, for example?
Is there something missing from those that you would either choose to add or do differently?
Just curious as I know some normally don't wish to reinvent the wheel.
Those are all great resources and I am already making heavy use of WordNet because I can download the entire database and reprocess it into any format that I require. Dante is only available through a web interface as far as I can see, but I would be able to use it to cross check information from other sources. At least they are no longer charging an exorbitant fee to access words that don't start with the letter R. I would gladly pay the fee and download Celex2 except that the license is very restrictive. It only allows pure research, no commercial applications, and nothing can be distributed outside my immediate research group. Not sure if the members of aidreams would qualify as my immediate research group.
The real problem is that every one of these resources is incomplete. I know of dozens and they all have useful bits of information that the others lack. That's why I've been working with resources that are open source and free to use, and which can be downloaded in their entirety and reprocessed into other formats. The lexical resources that I've already integrated into my database are drawn from AGID, VarCon, Moby, WordNet, VerbNet and some others that I'm not sure I'm still allowed to use. I obtained them at a time when they were freely available on the web but they've since been withdrawn. I'll probably use them the same way that I would use Dante, for cross-checking. Others that I'm still working on assimilating are ConceptNet, SCOWL, SUMO, ERG and WikiData.
I convert each resource into a relational database based on the original data format, all normalised to UTF-8. Once I'm satisfied that I've rendered the data faithfully, I extract all the facts from the database in the form of tuples grounded in a common ontology. That way I can seamlessly merge all the tuples, and hence all the data, into a single body of knowledge, complete with provenance and confidence levels. Everything is completely scripted, so whenever updates are released I can reprocess and regenerate everything in a matter of minutes.
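A minimal sketch of what the merge step described above might look like, not the actual scripts. The `Fact` layout, the relation names, the confidence values and the noisy-OR rule for combining confidence are all assumptions made for illustration; the post does not say how confidence levels are computed.

```python
from collections import defaultdict
from typing import NamedTuple


class Fact(NamedTuple):
    """One extracted fact. Field names are hypothetical."""
    subject: str
    relation: str
    obj: str
    source: str        # provenance: the resource that asserted the fact
    confidence: float  # 0.0 to 1.0, illustrative values only


def merge_facts(facts):
    """Merge identical (subject, relation, obj) triples from several resources."""
    grouped = defaultdict(dict)
    for f in facts:
        key = (f.subject, f.relation, f.obj)
        # If one source repeats a fact, keep its highest confidence.
        grouped[key][f.source] = max(f.confidence, grouped[key].get(f.source, 0.0))

    merged = {}
    for key, by_source in grouped.items():
        # Assumed combination rule (noisy-OR): agreement between
        # independent sources raises the overall confidence.
        doubt = 1.0
        for c in by_source.values():
            doubt *= 1.0 - c
        merged[key] = {"sources": sorted(by_source), "confidence": 1.0 - doubt}
    return merged


facts = [
    Fact("dog", "is_a", "canine", "WordNet", 0.9),
    Fact("dog", "is_a", "canine", "ConceptNet", 0.5),
    Fact("dogs", "plural_of", "dog", "AGID", 0.8),
]
merged = merge_facts(facts)
for triple, info in merged.items():
    print(triple, info)
```

Keeping the per-source confidences until the final step means a fact can always be traced back to the resources that asserted it, and a withdrawn resource can be dropped by regenerating the merge without it.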
So, as they say in the Celex2 documentation, such a database is still "the Holy Grail" but I am working towards it, and when I am done I will be able to release it for everyone to use. That's why it is number one on my wish list.