StringTools - Maple Programming Help

StringTools

 Search
 search for an occurrence of a string in another string
 SearchAll
 search for all occurrences of a string in another string

Calling Sequence

 Search( pattern, text )
 Search( patlist, text )
 Search( pattern, textlist )
 SearchAll( pattern, text )
 SearchAll( patlist, text )
 SearchAll( pattern, textlist )

Parameters

 pattern  - string
 text     - string
 patlist  - list of strings
 textlist - list of strings

Description

 • The Search(pattern, text) function searches for the string pattern in the string text. If pattern does not occur as a substring of text, then 0 is returned. Otherwise, the index of the first character of the first occurrence of pattern in text is returned.
 • Either the first or second argument of Search, but not both, can be a list of strings.
 • If the first argument to Search is a list of strings patlist, then Search searches the string text for occurrences of any of the patterns in patlist. It returns a pair [offset, id], where offset is the position in text of the first occurrence of any of the patterns, and id is the index into patlist of the pattern that matches there. If none of the patterns occurs in text, then [0, 0] is returned.
 • When a list of strings textlist is passed as the second argument, the cost of preprocessing the single pattern is amortized over all the strings in the list. The result is functionally equivalent to computing map2( Search, pattern, textlist ), that is, a list containing the result of Search for each string in textlist.
 • The SearchAll(pattern, text) function finds all occurrences of the string pattern in text. It returns an expression sequence of the indices of the first characters of occurrences of pattern in text. This expression sequence is NULL if pattern does not occur in text.
 • The procedure SearchAll also accepts a list of strings for either its first or its second argument, but not both.
 • When presented with a list of strings patlist as the first argument, SearchAll searches for all occurrences of the strings in patlist in the string text. The result of such a search is an expression sequence of pairs (lists of length two) of the form [offset, id], where offset is the offset into the string text at which a match occurs, and id is the index into patlist of the matching string.
 Note: You can specify a set of strings instead of a list for either patlist or textlist. However, because matches are identified by their position, and the order of the elements of a set is determined by Maple rather than by the order in which you wrote them, it is recommended that you use a list, whose element positions are fixed, rather than a set for patlist or textlist.
 • Passing a list textlist of strings as the second argument to SearchAll is more efficient than, but otherwise equivalent to, computing map2( [SearchAll], pattern, textlist ). SearchAll is enclosed in a list here so that each result, which is an expression sequence, is collected into its own list instead of being merged with the results for the other strings.
 • The procedure Search is similar to the built-in procedure SearchText.
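The relationship between Search and SearchAll for a single pattern can be illustrated with a short sketch. FirstOrZero is a hypothetical helper written for illustration, not part of StringTools: it takes the smallest index reported by SearchAll, or 0 when SearchAll returns NULL, and on the inputs used in the examples below it is expected to agree with Search.

```maple
# Hypothetical helper (not part of StringTools): the smallest starting
# index reported by SearchAll, or 0 when SearchAll returns NULL.
FirstOrZero := proc( pattern::string, text::string )
    local hits;
    # wrap the expression sequence in a list so that NULL becomes []
    hits := [ StringTools:-SearchAll( pattern, text ) ];
    if hits = [] then
        0
    else
        min( op( hits ) )
    end if
end proc:

# each comparison is expected to return true
evalb( FirstOrZero( "uv", "auvb" ) = StringTools:-Search( "uv", "auvb" ) );
evalb( FirstOrZero( "uv", "abc" ) = StringTools:-Search( "uv", "abc" ) );
```

Wrapping the call in [ ... ] is the usual way to handle a result that may be NULL, because an empty list can be tested directly while an empty expression sequence cannot be passed around as a single value.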

Examples

 > with( StringTools ):
 > Search( "uv", "auvb" )
 2    (1)
 > Search( "uv", "abc" )
 0    (2)
 > SearchAll( "aba", "abababababababababab" )
 1, 3, 5, 7, 9, 11, 13, 15, 17    (3)
 > SearchAll( "aba", "uvw" )
 > Search( [ "ab", "bac" ], "abcde" )
 [1, 1]    (4)
 > Search( [ "ab", "bac" ], "uvaw" )
 [0, 0]    (5)
 > L := [ "bac", "ab" ]:
 > Search( L, "abcdababacef" )
 [1, 2]    (6)

This result indicates that a match was found at position 1, and that the pattern matching at that position is the second pattern, L[2] = "ab".
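The meaning of the pair returned by Search for a list of patterns can be spelled out with a naive reference version that calls the single-pattern Search once per pattern and keeps the smallest nonzero offset. FirstOfAny is hypothetical and is not meant to reflect how Search works internally. The documentation does not say which index is reported when two patterns start at the same offset, so the lowest-index rule used here is an assumption.

```maple
# Hypothetical reference version (not how StringTools implements Search):
# run the single-pattern Search once per pattern and keep the smallest
# nonzero offset together with the index of the pattern that produced it.
# Ties go to the lower index; that rule is an assumption, not documented.
FirstOfAny := proc( patlist::list(string), text::string )
    local best, id, k, pos;
    best := 0;
    id := 0;
    for k to nops( patlist ) do
        pos := StringTools:-Search( patlist[ k ], text );
        if pos > 0 and ( best = 0 or pos < best ) then
            best := pos;
            id := k
        end if
    end do;
    [ best, id ]
end proc:

FirstOfAny( [ "bac", "ab" ], "abcdababacef" );   # [1, 2], as in (6)
FirstOfAny( [ "ab", "bac" ], "uvaw" );           # [0, 0], as in (5)
```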

 > SearchAll( L, "abcdababacef" )
 [1, 2], [5, 2], [7, 2], [8, 1]    (7)

The result above indicates that there are matches at offsets 1, 5, 7, and 8 in the text; the matching string is L[2] = "ab" for the first three matches, and L[1] = "bac" for the match at offset 8.
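When the matching strings themselves are more useful than their indices, each pair can be decoded by looking up its id in the pattern list. MatchedStrings is a hypothetical helper written for illustration; applied to L and the same text, it should report the four matches of (7) as [1, "ab"], [5, "ab"], [7, "ab"], [8, "bac"].

```maple
# Hypothetical helper (not part of StringTools): report each match of
# SearchAll( patlist, text ) as [offset, matching string] instead of
# [offset, index into patlist].
MatchedStrings := proc( patlist::list(string), text::string )
    local m;
    # each m has the form [offset, id]; replace id by patlist[id]
    seq( [ m[ 1 ], patlist[ m[ 2 ] ] ],
         m = [ StringTools:-SearchAll( patlist, text ) ] )
end proc:

L := [ "bac", "ab" ]:
MatchedStrings( L, "abcdababacef" );
```

If the patterns are held in a set, converting it once with [ op( S ) ] and passing that list both to SearchAll and to the lookup keeps the indices consistent.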

 > SearchAll( [ "ab", "bac" ], "uvbaw" )

You can identify all substrings of a specified string that are in a dictionary as follows. (Many systems have such a dictionary in a file such as "/usr/share/dict/words". You can use any word list with one word per line. It does not need to be sorted.)

 > ReadWordList := fname -> remove( type, StringTools:-Split( readbytes( fname, 'TEXT', infinity ) ), "" ):
 > dictionary := ReadWordList( FileTools:-JoinPath( [ kernelopts( ':-datadir' ), "help", "StringTools", "words.dat" ] ) ):
 > SearchAll( dictionary, "antidisestablishmentarianism" )
 [1, 11], [2, 15394], [1, 880], [3, 22301], [1, 1040], [3, 22900], [4, 11373], [1, 1069], [5, 5963], [3, 22911], [6, 11373], [6, 12357], [7, 19563], [8, 7328], [9, 19563], [10, 22301], [11, 11], [9, 21466], [12, 1778], [10, 22303], [13, 13037], [14, 11373], [14, 12357], [15, 19563], [16, 10296], [8, 7992], [17, 13933], [18, 7328], [17, 14481], [18, 7722], [19, 15394], [17, 14587], [20, 22301], [21, 11], [22, 18525], [20, 22421], [23, 11373], [24, 11], [25, 15394], [24, 880], [26, 11373], [24, 979], [26, 12357], [27, 19563], [28, 13933]    (8)
 > Subwords := proc( dict, s )
       local i;
       use StringTools in
           # the second entry of each [offset, id] pair is an index into dict
           seq( dict[ i ], i = map2( op, 2, [ SearchAll( dict, s ) ] ) )
       end use
   end proc:
 > Subwords( dictionary, "antidisestablishmentarianism" )
 "a", "n", "an", "t", "ant", "ti", "i", "anti", "d", "tid", "i", "is", "s", "e", "s", "t", "a", "stab", "b", "tab", "l", "i", "is", "s", "h", "establish", "m", "e", "me", "en", "n", "men", "t", "a", "r", "tar", "i", "a", "n", "an", "i", "ani", "is", "s", "m"    (9)

Multiple searches using a single pattern can be done more efficiently by passing the strings to be searched as a list in a single call to Search or SearchAll. The timings below are CPU times in seconds and will vary from machine to machine.

 > Pattern := Random( 1000, 'lower' ):
 > Texts := [seq]( Random( 5000, 'lower' ), i = 1 .. 5000 ):
 > evalb( Search( Pattern, Texts ) = map2( Search, Pattern, Texts ) )
 true    (10)
 > time( Search( Pattern, Texts ) )
 0.007    (11)
 > time( map2( Search, Pattern, Texts ) )
 0.016    (12)
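The same amortization applies to SearchAll. The following sketch, which uses arbitrarily chosen sizes and the hypothetical names pat and txts, checks the equivalence with map2( [SearchAll], pattern, textlist ) stated in the description. The evalb call is expected to return true; the timings depend on the machine, so no particular speedup is implied.

```maple
with( StringTools ):

# a short pattern that is likely to occur, so many results are not NULL
pat := "ab":
# 1000 random strings of 200 lowercase letters (sizes chosen arbitrarily)
txts := [ seq( Random( 200, 'lower' ), i = 1 .. 1000 ) ]:

# list-argument form versus one SearchAll call per string; map2 is given
# [SearchAll] so that each expression sequence is collected into its own list
evalb( SearchAll( pat, txts ) = map2( [ SearchAll ], pat, txts ) );

time( SearchAll( pat, txts ) );
time( map2( [ SearchAll ], pat, txts ) );
```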