Library for checking the grammar of text in natural languages

Description

The purpose of this project is to write a grammar checker for natural languages.

Note that, while natural language understanding and translation are very hard problems, grammar checking is much easier. For instance, the fact that a sentence might be ambiguous is not a problem for a grammar checker. Nor is it a catastrophe if a grammar error occasionally goes undetected, especially a rare one.

It might be a good idea to have the parser return the best parse tree, as defined by some kind of cost function. One might for instance associate a cost with each grammar production and compute the cost of a parse tree as the sum of the costs of the productions used to obtain it. Doing it this way avoids creating an infinite number of parse trees when there are cycles in the productions: provided every production has a positive cost, going around a cycle only makes a tree more expensive, so the best tree never does so. It should be possible to adapt most existing parsing techniques this way.
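The cost-based idea above can be sketched as a small chart parser. This is only an illustration under stated assumptions, not the design of the library: the toy grammar, the cost values, and the name best_parse are all hypothetical, productions are restricted to one terminal, one nonterminal, or two nonterminals, and every cost is assumed to be positive. The grammar deliberately contains the cycle VP -> V -> VP to show that the search still terminates.

```python
# Hypothetical toy grammar: (lhs, rhs, cost). Each rhs is one terminal,
# one nonterminal (a unit production), or two nonterminals.
# All costs are assumed to be positive.
RULES = [
    ("S", ("NP", "VP"), 1.0),
    ("VP", ("V", "NP"), 1.0),
    ("VP", ("V",), 1.0),    # unit production
    ("V", ("VP",), 5.0),    # closes the cycle VP -> V -> VP
    ("NP", ("she",), 1.0),
    ("NP", ("stars",), 1.0),
    ("V", ("sees",), 1.0),
]


def relax(cell, lhs, cost, tree):
    """Record (cost, tree) for lhs if it beats what the cell already holds."""
    if lhs not in cell or cost < cell[lhs][0]:
        cell[lhs] = (cost, tree)
        return True
    return False


def close_units(cell):
    """Apply unit productions until no entry gets cheaper.

    With positive costs a trip around a cycle can never lower a cost,
    so this loop terminates even though the grammar is cyclic.
    """
    changed = True
    while changed:
        changed = False
        for lhs, rhs, cost in RULES:
            if len(rhs) == 1 and rhs[0] in cell:
                sub_cost, sub_tree = cell[rhs[0]]
                if relax(cell, lhs, cost + sub_cost, (lhs, sub_tree)):
                    changed = True


def best_parse(words, start="S"):
    """Return (cost, tree) for the cheapest parse of words, or None."""
    n = len(words)
    chart = {}  # chart[(i, j)] maps nonterminal -> (cost, tree) for words[i:j]
    for i, word in enumerate(words):
        cell = {}
        for lhs, rhs, cost in RULES:
            if rhs == (word,):
                relax(cell, lhs, cost, (lhs, word))
        close_units(cell)
        chart[(i, i + 1)] = cell
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = {}
            for k in range(i + 1, j):
                left, right = chart[(i, k)], chart[(k, j)]
                for lhs, rhs, cost in RULES:
                    if len(rhs) == 2 and rhs[0] in left and rhs[1] in right:
                        left_cost, left_tree = left[rhs[0]]
                        right_cost, right_tree = right[rhs[1]]
                        total = cost + left_cost + right_cost
                        relax(cell, lhs, total, (lhs, left_tree, right_tree))
            close_units(cell)
            chart[(i, j)] = cell
    return chart.get((0, n), {}).get(start)
```

Calling best_parse("she sees stars".split()) returns a (cost, tree) pair, or None when the words have no parse. Because each chart cell keeps only the cheapest entry per nonterminal, the number of stored trees stays finite regardless of cycles.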

Another idea for this project is to let the user configure the library to look for common mistakes. A speaker of French might for instance want to detect situations where the infinitive form of a verb has been used instead of the perfect participle. The library could contain a collection of modules for such common mistakes, each of which the user can choose to include or exclude.
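One way such selectable modules might look is sketched below. Everything here is hypothetical and chosen only for illustration: the names Checker, Diagnostic, and infinitive_after_avoir, the word-level tokenizer, and above all the detection rule itself, which is a crude heuristic (a word ending in "er" directly after a form of "avoir") rather than a real grammatical analysis.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Diagnostic:
    position: int  # index of the offending token
    message: str


# A mistake module is just a function from tokens to diagnostics.
Check = Callable[[List[str]], List[Diagnostic]]


class Checker:
    """Hypothetical registry of mistake modules that can be switched on and off."""

    def __init__(self) -> None:
        self._modules: Dict[str, Check] = {}
        self._enabled: Set[str] = set()

    def register(self, name: str, check: Check, enabled: bool = True) -> None:
        self._modules[name] = check
        if enabled:
            self._enabled.add(name)

    def enable(self, name: str) -> None:
        if name not in self._modules:
            raise KeyError(name)
        self._enabled.add(name)

    def disable(self, name: str) -> None:
        self._enabled.discard(name)

    def check(self, text: str) -> List[Diagnostic]:
        tokens = re.findall(r"\w+", text.lower())
        found: List[Diagnostic] = []
        for name in sorted(self._enabled):
            found.extend(self._modules[name](tokens))
        return found


# Present-tense forms of the French auxiliary "avoir".
AVOIR_FORMS = {"ai", "as", "a", "avons", "avez", "ont"}


def infinitive_after_avoir(tokens: List[str]) -> List[Diagnostic]:
    """Crude heuristic: flag a word ending in 'er' right after 'avoir'."""
    found = []
    for i in range(1, len(tokens)):
        if tokens[i - 1] in AVOIR_FORMS and tokens[i].endswith("er"):
            message = f"'{tokens[i]}' looks like an infinitive; a participle may be intended"
            found.append(Diagnostic(i, message))
    return found


checker = Checker()
checker.register("fr-infinitive-for-participle", infinitive_after_avoir)
```

With the module enabled, checker.check("J'ai manger une pomme") reports the token "manger", while "J'ai mangé une pomme" produces no diagnostics. Calling checker.disable("fr-infinitive-for-participle") excludes the module again without touching the others.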


robert.strandh@gmail.com