SimilarityScore - Maple Help

EssayTools

 SimilarityScore
 compares essays and rates their similarity

Calling Sequence

 SimilarityScore( essay1, essay2 )
 SimilarityScore( essays )

Parameters

 essay1, essay2, essays - strings, or lists of strings: the essays to be compared
 binarycount - (optional) truefalse
 methods - (optional) list of procedures
 tobaseline - (optional) truefalse
 filter - (optional) procedure or literal name lemma, stem, or none
 symmetric - (optional) truefalse

Description

 • The SimilarityScore command compares the word use in two or more essays, and returns a matrix of scores such that location [i,j] rates the similarity of essay i and essay j. A score of 0 indicates that there is no overlap between the two essays. A score of 1 indicates that there is complete overlap between the two essays (which does not necessarily mean the essays are identical).
 • If essay1 and essay2 are lists or arrays of essays, each essay in the first list is compared to each essay in the second list. If essay2 is not given, the essays in the essay1 list are compared with one another, including each essay with itself.
 • Several methods are available for computing the word-use similarity score. The methods option can be used to select a custom procedure, or one or more of the built-in procedures: CosineCoefficient, JaccardCoefficient, DiceCoefficient, or their binary counterparts.
 • When the binarycount option is set to true, the word count vectors for the compared essays contain only 1's and 0's, indicating the presence or absence of each word rather than the number of times it occurs.
 • The filter option specifies the definition of a "word". The default is filter=lemma, which reduces each word to its lemma (base form), so that different forms of the same word are counted together. Other options are filter=stem, which applies a stemming algorithm, and filter=none. Custom filters can also be applied, such as filter=StringTools:-LowerCase.
 • When comparing multiple essays, the tobaseline option controls the pool of words used in each comparison. When tobaseline=true, the pool of words consists of all words from all essays. When tobaseline=false, the pool of words consists of just the words found in the two essays being compared at each step.
 • The symmetric option can be set to true as an optimization if you know that the scoring method f satisfies f(essay[i],essay[j]) = f(essay[j],essay[i]) in all cases. With this option, only the results where $i\le j$ are computed, and each result is stored in both locations [i,j] and [j,i].
 • This function is part of the EssayTools package, so it can be used in the short form SimilarityScore(..) only after executing the command with(EssayTools). However, it can always be accessed through the long form of the command by using EssayTools[SimilarityScore](..).
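
 The scoring ideas described above can be illustrated with a minimal Python sketch. This is not Maple code and not the EssayTools implementation. It assumes whitespace tokenization with lowercasing as a stand-in for the filter option, and it assumes the common vector forms of the cosine, Jaccard (Tanimoto), and Dice coefficients; this page does not state the exact formulas Maple uses. All helper names are hypothetical.

```python
from collections import Counter
from math import sqrt


def count_vectors(essay_a, essay_b, binary=False):
    """Build aligned word-count vectors over the words found in the two essays.

    Assumption: words are lowercased whitespace-separated tokens
    (a stand-in for a filter, not Maple's lemma filter).
    """
    counts_a = Counter(essay_a.lower().split())
    counts_b = Counter(essay_b.lower().split())
    vocab = sorted(set(counts_a) | set(counts_b))
    vec_a = [counts_a[w] for w in vocab]
    vec_b = [counts_b[w] for w in vocab]
    if binary:
        # Mimics the idea of binarycount=true: presence/absence only.
        vec_a = [1 if c else 0 for c in vec_a]
        vec_b = [1 if c else 0 for c in vec_b]
    return vec_a, vec_b


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def cosine(x, y):
    # Assumed form: x.y / (|x| |y|)
    denom = sqrt(dot(x, x)) * sqrt(dot(y, y))
    return dot(x, y) / denom if denom else 0.0


def jaccard(x, y):
    # Assumed form (Tanimoto): x.y / (|x|^2 + |y|^2 - x.y)
    denom = dot(x, x) + dot(y, y) - dot(x, y)
    return dot(x, y) / denom if denom else 0.0


def dice(x, y):
    # Assumed form: 2 x.y / (|x|^2 + |y|^2)
    denom = dot(x, x) + dot(y, y)
    return 2 * dot(x, y) / denom if denom else 0.0


if __name__ == "__main__":
    x, y = count_vectors("a a b", "a b b")
    print(cosine(x, y), jaccard(x, y), dice(x, y))
    bx, by = count_vectors("a a b", "a b b", binary=True)
    print(cosine(bx, by), jaccard(bx, by), dice(bx, by))
```

 With count vectors, "a a b" and "a b b" score about 0.8 under the assumed cosine form. With binary=True both vectors become [1, 1] and every coefficient is 1 (up to rounding), which shows how a score of 1 can arise for essays that are not identical.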

Examples

 > Hemingway := "Nothing happened. The fish just moved away slowly and the old man could not raise him an inch. His line was strong and made for heavy fish and he held it against his hack until it was so taut that beads of water were jumping from it. Then it began to make a slow hissing sound in the water and he still held it, bracing himself against the thwart and leaning back against the pull. The boat began to move slowly off toward the north-west.":
 > Melville := "I see in him outrageous strength, with an inscrutable malice sinewing it. That inscrutable thing is chiefly what I hate; and be the white whale agent, or be the white whale principal, I will wreak that hate upon him.":
 > Anderson := "They were six beautiful children; but the youngest was the prettiest of them all; her skin was as clear and delicate as a rose-leaf, and her eyes as blue as the deepest sea; but, like all the others, she had no feet, and her body ended in a fish's tail.":
 > with(EssayTools):
 > SimilarityScore(Hemingway, Melville)
 $\left[\begin{array}{c}{0.1025641026}\end{array}\right]$ (1)
 > SimilarityScore([Hemingway, Melville, Anderson])
 $\left[\begin{array}{ccc}{1.}& {0.1025641026}& {0.08139534884}\\ {0.1025641026}& {1.}& {0.08333333333}\\ {0.08139534884}& {0.08333333333}& {1.}\end{array}\right]$ (2)
 > SimilarityScore([Hemingway, Melville, Anderson], methods = [EssayTools:-CosineCoefficient])
 $\left[\begin{array}{ccc}{1.000000000}& {0.3583566001}& {0.3811595642}\\ {0.3583566001}& {1.000000000}& {0.2912325843}\\ {0.3811595642}& {0.2912325843}& {1.000000000}\end{array}\right]$ (3)
 > SimilarityScore(["a b c"], ["a b c", "a a a", "b c d", "x y z"])
 $\left[\begin{array}{cccc}{1.}& {0.3333333333}& {0.5000000000}& {0.}\end{array}\right]$ (4)
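
 The shape of these results (one row per essay in the first list, one column per essay in the second, or a square matrix for a single list) can be sketched in Python. This is an illustration only, not Maple code: the helper names are hypothetical, words are assumed to be lowercased whitespace-separated tokens, and the score is assumed to be a set-based Jaccard overlap. The values it produces for the short strings happen to match output (4), but that agreement does not establish which formula Maple applies by default.

```python
def word_set(essay):
    # Assumption: lowercased whitespace tokens stand in for filtered words.
    return set(essay.lower().split())


def set_jaccard(essay_a, essay_b):
    """Shared words divided by all words appearing in either essay."""
    words_a, words_b = word_set(essay_a), word_set(essay_b)
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def cross_scores(first, second, score):
    """Entry [i][j] compares first[i] with second[j]."""
    return [[score(a, b) for b in second] for a in first]


def score_matrix(essays, score, symmetric=False):
    """Square matrix comparing every essay in one list with every other.

    With symmetric=True only entries with i <= j are computed and each
    result is stored in both [i][j] and [j][i].
    """
    n = len(essays)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        start = i if symmetric else 0
        for j in range(start, n):
            matrix[i][j] = score(essays[i], essays[j])
            if symmetric:
                matrix[j][i] = matrix[i][j]
    return matrix


if __name__ == "__main__":
    print(cross_scores(["a b c"], ["a b c", "a a a", "b c d", "x y z"], set_jaccard))
    print(score_matrix(["a b c", "b c d", "x y z"], set_jaccard, symmetric=True))
```

 Because the assumed score is symmetric in its two arguments, the symmetric=True path returns the same matrix as the full computation while calling the scoring function roughly half as often.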

Compatibility

 • The EssayTools[SimilarityScore] command was introduced in Maple 17.