a tagging program whose labels indicate a word's part of speech


<!-- "another words," instead of "in other words," at the beginning of a sentence --> <pattern> <token postag="SENT_START"></token> <token>another</token> <token>words</token> <token>, </token> </pattern> <message>Did you mean 'in other words'?</message> A part-of-speech tagger determines every word's part of speech, helping the user find tokens that belong to a certain class:
<!-- Example 2: "a/an" should not be used with plural nouns --> <token>a | an</token> <token pos="NNS|NNPS" parent="1"></token> This mal-rule finds the articles a/an when they are linked to plural nouns (marked as NNS or NNPS by a part-of-speech tagger).
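The a/an mal-rule above can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the input is already tagged with Penn Treebank-style tags, the function name `find_article_plural` is hypothetical, and "linked to" is simplified to "immediately followed by", whereas the original rule relies on a syntactic parent relation (`parent="1"`).

```python
from typing import List, Tuple

Tagged = Tuple[str, str]  # (word, POS tag); tags are assumed to be Penn Treebank-style

ARTICLES = {"a", "an"}
PLURAL_TAGS = {"NNS", "NNPS"}  # plural common noun, plural proper noun


def find_article_plural(tokens: List[Tagged]) -> List[int]:
    """Return the indices of a/an articles directly followed by a plural noun.

    Simplification (assumption): adjacency stands in for the syntactic
    link that the original rule expresses with parent="1".
    """
    hits = []
    for i in range(len(tokens) - 1):
        word, _ = tokens[i]
        _, next_tag = tokens[i + 1]
        if word.lower() in ARTICLES and next_tag in PLURAL_TAGS:
            hits.append(i)
    return hits


sentence = [("She", "PRP"), ("has", "VBZ"), ("a", "DT"), ("dogs", "NNS")]
flagged = find_article_plural(sentence)  # index of the offending article
```

A real checker would match on the parse tree rather than on adjacency, so that intervening adjectives ("a big dogs") are also caught.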
If their expressive power is insufficient to describe a certain rule, one can make use of additional syntactic elements powered by natural language processing and backed by a sentence splitter and a part-of-speech tagger.
However, our experiments have revealed weak points of the language tools we use (mainly the parser and the part-of-speech tagger).
For example, a part-of-speech tagger cannot reliably determine a tag for the word "like" in the phrase "he like dogs", since such a pattern never appears in the training collection.
The methodology used results in a language model with syntactic relations between words, as well as senses, synonyms and hyponyms from WordNet and part-of-speech taggers.
MEDLINE data format parser with a complete representation of citations as Java objects, MEDLINE-tuned sentence detection, named entity detectors trained on GENIA and BioCreative data, and biomedical text part-of-speech taggers trained on GENIA and MedPost data.
It is possible to train very accurate part-of-speech taggers from manually tagged data [2].
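To illustrate what training a tagger from manually tagged data means in the simplest case, here is a sketch of a most-frequent-tag baseline. It is not the method of the cited work and is far less accurate than a modern tagger; the function name `train_unigram_tagger` and the fallback tag `NN` for unseen words are assumptions made for this example.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Tagged = Tuple[str, str]  # (word, POS tag)


def train_unigram_tagger(
    tagged_sentences: Iterable[List[Tagged]],
    default_tag: str = "NN",  # assumption: unseen words are guessed to be nouns
) -> Callable[[List[str]], List[Tagged]]:
    """Learn each word's most frequent tag and return a tagging function."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1

    # Keep only the single most frequent tag observed for each word.
    model = {word: c.most_common(1)[0][0] for word, c in counts.items()}

    def tag(words: List[str]) -> List[Tagged]:
        return [(w, model.get(w.lower(), default_tag)) for w in words]

    return tag


# Tiny hand-tagged corpus, invented for illustration only.
corpus = [
    [("he", "PRP"), ("likes", "VBZ"), ("dogs", "NNS")],
    [("dogs", "NNS"), ("bark", "VBP")],
]
tagger = train_unigram_tagger(corpus)
result = tagger(["he", "likes", "cats"])
```

Because such a baseline ignores context, it cannot resolve words whose tag depends on their neighbours; statistical taggers add contextual features for that reason.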