approximate segmentation of a string of English text into sentences
Sentences( s )
The Sentences(s) command attempts to split a string, presumed to be composed of English language text, into its constituent sentences. It does this by recognizing sentence boundaries. The beginning and the end of the input string are regarded as sentence boundaries in all cases. Internal sentence boundaries are recognized by the presence of a sentence terminator, which is one of the following:
A small number of built-in patterns are used to recognize some exceptions.
Note that you can also use the RegSplit command with the fixed string "\n\n" as the splitting pattern to segment English text into paragraphs.
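As a minimal sketch of the paragraph-splitting idea above, the following splits a text into paragraphs and then segments each paragraph into sentences. The sample string and the names txt, paragraphs, and sentencesByParagraph are illustrative only; the sketch assumes RegSplit takes the pattern first and the text second and returns an expression sequence of strings.

```maple
with(StringTools):

# Illustrative input: two paragraphs separated by a blank line.
txt := "First point. Is it clear?\n\nSecond paragraph here. It has two sentences.":

# Split into paragraphs at each blank line (assumes RegSplit(pattern, text)).
paragraphs := [RegSplit("\n\n", txt)]:

# Segment each paragraph separately, giving one list of sentences per paragraph.
sentencesByParagraph := map(p -> [Sentences(p)], paragraphs);
```

Wrapping each result in a list keeps the sentences of different paragraphs apart, since both commands return expression sequences that would otherwise be flattened together.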
All of the StringTools package commands treat strings as (null-terminated) sequences of 8-bit (ASCII) characters. Thus, there is no support for multibyte character encodings, such as Unicode encodings.
> Sentences("This is a\nsentence. Can we have another? Yes, here's one more.");
"This is a
sentence.", "Can we have another?", "Yes, here's one more."