Unicode Text Segmentation

Type of document:
Standard
Organization:
Unicode Consortium
Year:
2013
Description:
This standard (Unicode Standard Annex #29) describes guidelines for determining default boundaries between certain significant text elements: grapheme clusters (user-perceived characters), words, and sentences.
Workflow stage:
1.9, 2.3
Business value:
This standard is relevant for natural language processing applications that need a standard, interoperable method for segmenting text into words and user-perceived characters.
Web site:
http://unicode.org/reports/tr29/
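Example:
The sketch below illustrates what "user-perceived characters" means in practice. It is a deliberately simplified approximation, not an implementation of the full rule set in the standard: it only keeps CR LF together and attaches combining marks to the character before them. The function name rough_grapheme_clusters is hypothetical and chosen for illustration; applications that need conformant behavior should use a library that implements the standard.

```python
import unicodedata


def rough_grapheme_clusters(text):
    """Approximate split of text into user-perceived characters.

    Simplifying assumptions (NOT the full UAX #29 rules):
      * CR followed by LF stays in one cluster.
      * A combining mark (general category Mn, Mc, or Me) is attached
        to the preceding character, unless that character is a line break.
    Emoji sequences, Hangul syllables, and other cases are not handled.
    """
    clusters = []
    for ch in text:
        prev = clusters[-1] if clusters else ""
        if prev == "\r" and ch == "\n":
            # Keep the CR LF pair together.
            clusters[-1] += ch
        elif (
            prev
            and prev not in ("\r", "\n", "\r\n")
            and unicodedata.category(ch).startswith("M")
        ):
            # Attach a combining mark to the cluster it modifies.
            clusters[-1] += ch
        else:
            # Anything else starts a new cluster.
            clusters.append(ch)
    return clusters


# "e" + COMBINING ACUTE ACCENT is two code points but one cluster.
sample = "e\u0301a"
print(len(sample), len(rough_grapheme_clusters(sample)))
```

Counting clusters rather than code points is what keeps operations such as cursor movement or truncation from splitting an accented letter in two.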
