The IndependenceModel Package
Valentina Triolo
Università degli Studi di Torino
Dipartimento di Matematica Giuseppe Peano
Italy
valentina.triolo@gmail.com
Introduction
The main purpose of this work was to write a procedure implementing an algorithm, based on the Diaconis-Sturmfels algorithm, that computes the Monte Carlo p-value of a given independence model. The package also contains some preliminary commands that can be useful to anyone studying an independence model.
The fact that independence models are algebraic toric models allows us to write general procedures to compute, for example, the parametric equations or the maximum likelihood estimation of the model, without knowing the set of random variables in advance.
The following commands are available in the package:
TableT LabelT NumberOfValues
ListOfValues TableLabel PossibleObs
Freq Parametric Equatn
ModelMatrix MaxLikeEst PearsonCoeff
DiaconisSturmfels
Data can be imported directly from an Excel file, as explained in the TableT command instructions.
Description of commands
TableT
Calling Sequence
TableT("file.xls",sheet);
Parameters
file - the name of the Excel file which contains the data to be imported.
sheet - (optional) the name or index of the sheet containing the data (default: 1).
Description
The TableT command imports an Excel file using the ExcelTools package and converts it into the form required by the other commands of the IndependenceModel package, i.e. a list of lists in which every list corresponds to one row of the Excel table.
The Excel table must be constructed in this form: every row corresponds to one variable and contains the name of such variable in the first column followed by all the observations. In this way, every column corresponds to one of the subjects considered in the analysis.
If the data are stored with the names of the variables in the first row and the corresponding values in the columns below, use the TableT command followed by the command
ListTools[Transpose](T).
The Excel table cannot have empty cells, because Maple would read them as zeros. So, when the value of an observation is unknown, fill the cell with the value nc.
Example
with(IndependenceModel):
T:=TableT("exemfile.xls");
T := [["Var 1", 0., 1.0, 0., "nc", 0., 0., 1.0, 1.0, 1.0, 0., 0.,
0., 1.0, 1.0, 1.0, 0., 0., 0., 1.0, 1.0], ["Var 2", "x", "y",
"y", "x", "y", "y", "y", "x", "x", "x", "y", "y", "y", "y",
"x", "y", "x", "x", "x", "y"], ["Var 3", 0., 2.0, 2.0, 2.0,
0., 0., 2.0, 0., 2.0, 2.0, 2.0, 0., 0., 2.0, 2.0, 0., 0., 2.0,
0., "nc"], ["Var 4", 1.0, 1.0, 0., 1.0, 0., 0., 1.0, 0., 0.,
1.0, 1.0, 1.0, 1.0, 1.0, 0., 0., 1.0, 0., 1.0, 1.0], [
"Var 5", 0., 2.0, 1.0, 1.0, 1.0, 2.0, 2.0, 0., 2.0, 2.0, 0.,
0., 1.0, 1.0, 0., 0., 2.0, 0., 1.0, 2.0], ["Var 6", "A", "A",
"B", "A", "B", "A", "B", "A", "B", "B", "A", "B", "B", "B",
"A", "A", "A", "B", "B", "A"], ["Var 7", 0., 0., 0., 0., 1.0,
1.0, 0., 0., 1.0, 1.0, 1.0, 1.0, 0., 0., 0., 0., 1.0, 1.0,
1.0, 0.], ["Var 8", 1.0, 1.0, 0., 1.0, 1.0, 0., 0., 0., 1.0,
0., 0., 0., "nc", 1.0, 1.0, 1.0, 0., 0., 1.0, 0.], ["Var 9",
"B", "B", "A", "B", "A", "A", "A", "B", "A", "B", "B", "B",
"A", "A", "B", "A", "A", "B", "B", "B"], ["Var 10", 0., 2.0,
1.0, 1.0, 1.0, 2.0, 2.0, 0., 2.0, 2.0, 0., 0., 1.0, 1.0, 0.,
0., 2.0, 0., 1.0, 2.0]]
LabelT
LabelT(table);
table - the list of lists containing all the data.
The LabelT command builds a list with the names of the variables, reading them from the table considered. This list corresponds to the first column (or the first row) of the Excel table.
The parameter 'table' can be obtained from an Excel file using the TableT command. If the table is written directly in Maple by the user, please read the TableT instructions to write it in the correct form.
T:=TableT("exemfile.xls"):
LabelT(T);
["Var 1", "Var 2", "Var 3", "Var 4", "Var 5", "Var 6", "Var 7", "Var 8", "Var 9", "Var 10"]
NumberOfValues
NumberOfValues("var", table);
var - the name of the variable considered
The NumberOfValues command tells how many different values of the variable 'var' occur in the table considered.
NumberOfValues("Var 3",T);
2
ListOfValues
ListOfValues("var", table);
The ListOfValues command returns the list of the values that the variable 'var' can take, reading them from the table considered.
The parameter 'table' can be obtained from an Excel file using the TableT command. If the table is written directly in Maple by the user, please read the TableT instructions to write it in the correct form.
ListOfValues("Var 3",T);
[0., 2.0]
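The behaviour of ListOfValues and NumberOfValues described above can be illustrated outside Maple with a minimal Python sketch. It is not the package's implementation: the helper names `list_of_values` and `number_of_values` are hypothetical, and the sketch assumes that the marker "nc" is skipped, which is consistent with the output [0., 2.0] for "Var 3" above.

```python
# Hypothetical helpers (not part of the IndependenceModel package).
# Assumption: the "nc" marker for unknown observations is ignored.

def list_of_values(var, table):
    """Return the sorted distinct values of `var`, skipping "nc"."""
    for row in table:
        if row[0] == var:  # the first entry of each row is the variable name
            return sorted({v for v in row[1:] if v != "nc"})
    raise KeyError(var)

def number_of_values(var, table):
    """Return how many distinct values of `var` occur in the table."""
    return len(list_of_values(var, table))

# Shortened stand-in for the table T used in the examples.
table = [
    ["Var 2", "x", "y", "y", "x"],
    ["Var 3", 0.0, 2.0, 2.0, "nc"],
]
print(list_of_values("Var 3", table))    # [0.0, 2.0]
print(number_of_values("Var 3", table))  # 2
```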
TableLabel
TableLabel(table, list);
list - the list ["var 1", ... , "var n"] with the names of the variables involved in the independence model.
The TableLabel command extracts from the table the rows corresponding to the variables "var 1", ..., "var n" in the list. The output is therefore still a list of lists, but it contains only n lists.
L:=["Var 2","Var 3","Var 6"]:
TT:=TableLabel(T,L);
TT := [["Var 2", "x", "y", "y", "x", "y", "y", "y", "x", "x", "x",
"y", "y", "y", "y", "x", "y", "x", "x", "x", "y"],
["Var 3", 0., 2.0, 2.0, 2.0, 0., 0., 2.0, 0., 2.0, 2.0, 2.0, 0., 0.,
2.0, 2.0, 0., 0., 2.0, 0., "nc"],
["Var 6", "A", "A", "B", "A", "B", "A", "B", "A", "B", "B", "A", "B", "B", "B", "A",
"A", "A", "B", "B", "A"]]
PossibleObs
PossibleObs(table, list);
The PossibleObs command builds a list of lists containing the possible observations of the variables listed in the parameter 'list'.
For example, considering two different variables with values a or b for the first one and c or d for the second one, the possible observations are (a,c), (a,d), (b,c) and (b,d).
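The possible observations are the Cartesian product of the value lists of the variables. The following Python sketch illustrates this for the two-variable example above; the function name `possible_obs` is hypothetical and the ordering (last variable varying fastest) is an assumption, chosen to agree with the row order shown in the Freq example.

```python
from itertools import product

def possible_obs(value_lists):
    """Cartesian product of the value lists, one list per variable."""
    return [list(obs) for obs in product(*value_lists)]

print(possible_obs([["a", "b"], ["c", "d"]]))
# [['a', 'c'], ['a', 'd'], ['b', 'c'], ['b', 'd']]
```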
Freq
Freq(table, list);
The Freq command tells how many times each of the possible observations occurs in the table. The output is a 2-column matrix containing a possible observation in the first column and its frequency in the second one.
L:=["Var 2","Var 3","Var 6"]:
Freq(T,L);
[["x", 0., "A"] 3]
[["x", 0., "B"] 1]
[["x", 2.0, "A"] 2]
[["x", 2.0, "B"] 3]
[["y", 0., "A"] 2]
[["y", 0., "B"] 3]
[["y", 2.0, "A"] 2]
[["y", 2.0, "B"] 3]
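The counts above can be reproduced outside Maple with a short Python sketch. It is an illustration and not the package's code: the function name `freq` is hypothetical, and the sketch assumes that a subject is discarded when any of the selected variables has the value "nc". That assumption is consistent with the frequencies above, which sum to 19 although the table has 20 subjects.

```python
from collections import Counter
from itertools import product

# Rows "Var 2", "Var 3" and "Var 6" of the example table T, without the names.
var2 = ["x", "y", "y", "x", "y", "y", "y", "x", "x", "x",
        "y", "y", "y", "y", "x", "y", "x", "x", "x", "y"]
var3 = [0.0, 2.0, 2.0, 2.0, 0.0, 0.0, 2.0, 0.0, 2.0, 2.0,
        2.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 2.0, 0.0, "nc"]
var6 = ["A", "A", "B", "A", "B", "A", "B", "A", "B", "B",
        "A", "B", "B", "B", "A", "A", "A", "B", "B", "A"]

def freq(rows):
    """Count each possible observation, skipping subjects with an "nc" value."""
    complete = [obs for obs in zip(*rows) if "nc" not in obs]
    counts = Counter(complete)
    levels = [sorted(set(column)) for column in zip(*complete)]
    return [(obs, counts[obs]) for obs in product(*levels)]

for obs, count in freq([var2, var3, var6]):
    print(list(obs), count)
```

The printed counts are 3, 1, 2, 3, 2, 3, 2, 3, in the same order as the Freq output.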
Parametric
Parametric(table, list, sigma);
sigma - the list of lists [ ["var 1" , ... , "var h"] , ... , ["var k" , ... , "var n"] ] containing the groups of variables of which the user wants to study the independence.
The Parametric command gives the parametric equations of the independence model considered.
The independence model is expressed by the parameters 'list' and 'sigma'.
For example, if we want to study the independence of two variables from a third one, the parameters would be:
list:=["var1" , "var2" , "var3"]; sigma:=[ ["var1", "var2"] , ["var3"] ];
S:=[["Var 2","Var 3"], ["Var 6"] ];
S := [["Var 2", "Var 3"], ["Var 6"]]
Parametric(T,L,S);
<p[0, 0, 0] - t[1, 2, 0, 0] t[3, 0],
p[0, 0, 1] - t[1, 2, 0, 0] t[3, 1],
p[0, 1, 0] - t[1, 2, 0, 1] t[3, 0],
p[0, 1, 1] - t[1, 2, 0, 1] t[3, 1],
p[1, 0, 0] - t[1, 2, 1, 0] t[3, 0],
p[1, 0, 1] - t[1, 2, 1, 0] t[3, 1],
p[1, 1, 0] - t[1, 2, 1, 1] t[3, 0],
p[1, 1, 1] - t[1, 2, 1, 1] t[3, 1]>
Equatn
Equatn(table, list, sigma);
sigma - the list of lists [ ["var 1" , ... , "var h"] , ... , ["var k" , ... , "var n"] ] containing the groups of variables of which the user wants to study the independence.
The Equatn command gives the implicit equations of the independence model, obtained by applying the EliminationIdeal command to the parametric equations built with the Parametric command.
The parameter 'table' can be obtained from an Excel file using the command 'TableT'. If the table is written directly in Maple by the user, please read the TableT instructions to write it in the correct form.
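When sigma contains two groups of variables, the probabilities can be arranged in a matrix with one row per joint value of the first group and one column per joint value of the second group; the equations of the model are then the 2x2 minors of that matrix. The Python sketch below checks this on an arbitrary product distribution. It is only an illustration: the function name `minors_2x2` and the marginal values are hypothetical and are not taken from the package.

```python
from fractions import Fraction
from itertools import combinations

def minors_2x2(p):
    """All 2x2 minors of a matrix given as a list of rows."""
    return [p[r1][c1] * p[r2][c2] - p[r1][c2] * p[r2][c1]
            for r1, r2 in combinations(range(len(p)), 2)
            for c1, c2 in combinations(range(len(p[0])), 2)]

# Arbitrary marginals: 4 joint values for the first group, 2 for the second.
a = [Fraction(n, 10) for n in (1, 2, 3, 4)]
b = [Fraction(1, 3), Fraction(2, 3)]
product_table = [[ai * bk for bk in b] for ai in a]

print(len(minors_2x2(product_table)))                  # 6
print(all(m == 0 for m in minors_2x2(product_table)))  # True
```

The six minors correspond to the six binomials returned by Equatn in the example for three binary variables with sigma = [["Var 2","Var 3"], ["Var 6"]].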
S:=[["Var 2","Var 3"], ["Var 6"] ]:
Equatn(T,L,S);
<p[0,1,0]*p[0,0,1]-p[0,1,1]*p[0,0,0], p[0,1,1]*p[1,0,0]-p[1,0,1]*p[0,1,0], p[1,0,0]*p[0,0,1]-p[1,0,1]*p[0,0,0], p[1,1,0]*p[0,0,1]-p[1,1,1]*p[0,0,0], p[1,1,0]*p[0,1,1]-p[1,1,1]*p[0,1,0], p[1,1,0]*p[1,0,1]-p[1,1,1]*p[1,0,0]>
ModelMatrix
ModelMatrix(table, list, sigma);
sigma - the list of lists [ ["var 1" , ... , "var h"] , ... , ["var k" , ... , "var n"] ] containing the groups of variables of which the user wants to study the independence.
The ModelMatrix command builds the matrix of the independence model considered.
The independence model is expressed by the parameters 'list' and 'sigma'.
The parameter 'table' can be obtained from an Excel file using the command 'TableT'. If the table is written directly in Maple by the user, please read the TableT instructions to write it in the correct form.
ModelMatrix(T,L,S);
[1 1 0 0 0 0 0 0]
[0 0 1 1 0 0 0 0]
[0 0 0 0 1 1 0 0]
[0 0 0 0 0 0 1 1]
[1 0 1 0 1 0 1 0]
[0 1 0 1 0 1 0 1]
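The structure of this matrix can be sketched in Python: each column is a possible observation, and each row is the indicator of one joint value of one group of variables in sigma. This is an illustration under assumptions, not the package's code; the function name `model_matrix` is hypothetical, and variables and values are encoded as 0-based indices.

```python
from itertools import product

def model_matrix(levels, sigma):
    """levels: number of values of each variable; sigma: groups of variable indices."""
    cells = list(product(*[range(n) for n in levels]))  # columns of the matrix
    rows = []
    for block in sigma:
        # One row per joint value of the variables in this group.
        for combo in product(*[range(levels[v]) for v in block]):
            rows.append([int(all(cell[v] == c for v, c in zip(block, combo)))
                         for cell in cells])
    return rows

# Three binary variables, with sigma = [[first, second], [third]].
for row in model_matrix([2, 2, 2], [[0, 1], [2]]):
    print(row)
```

For this input the sketch prints the same six rows as the ModelMatrix output above.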
MaxLikeEst
MaxLikeEst(table,list,sigma);
sigma - the list of lists [ ["var 1" , ... , "var h"] , ... , ["var k" , ... , "var n"] ] containing the groups of variables of which the user wants to study the independence.
The MaxLikeEst command computes the maximum likelihood estimate of the independence model considered.
MaxLikeEst(T,L,S);
{p[0,0,0] = 36/361, p[0,0,1] = 40/361, p[0,1,0] = 45/361, p[0,1,1] = 50/361, p[1,0,0] = 45/361, p[1,0,1] = 50/361, p[1,1,0] = 45/361, p[1,1,1] = 50/361}
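For an independence model, the maximum likelihood estimate of each cell probability is the product of the observed marginal relative frequencies of the groups in sigma. The Python sketch below applies this to the frequencies from the Freq example and recovers the values above. It is an illustration only: the function name `mle` and the 0-based encoding of values are assumptions, and the package may compute the estimate differently.

```python
from fractions import Fraction

# Frequencies from the Freq example, with values encoded as indices:
# x=0, y=1; 0.=0, 2.0=1; A=0, B=1.
counts = {(0, 0, 0): 3, (0, 0, 1): 1, (0, 1, 0): 2, (0, 1, 1): 3,
          (1, 0, 0): 2, (1, 0, 1): 3, (1, 1, 0): 2, (1, 1, 1): 3}
sigma = [[0, 1], [2]]  # groups of variable indices

def mle(counts, sigma):
    """Product of marginal relative frequencies, one factor per group in sigma."""
    n = sum(counts.values())
    estimate = {}
    for cell in counts:
        prob = Fraction(1)
        for block in sigma:
            marginal = sum(k for other, k in counts.items()
                           if all(other[v] == cell[v] for v in block))
            prob *= Fraction(marginal, n)
        estimate[cell] = prob
    return estimate

for cell, prob in mle(counts, sigma).items():
    print(cell, prob)
```

For example, the cell (0, 0, 0) gets (4/19)*(9/19) = 36/361, as in the MaxLikeEst output.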
PearsonCoeff
PearsonCoeff(p,K);
p - the maximum likelihood estimate, expressed as a list
K - the list of frequencies
The PearsonCoeff command computes the Pearson coefficient (Pearson's chi-square statistic) for the vector of frequencies K and the MLE p.
p:=[36/361, 40/361, 45/361, 50/361, 45/361, 50/361, 45/361, 50/361]:
k:=[3,1,2,3,2,3,2,3]:
PearsonCoeff(p,k);
931/600
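The value 931/600 agrees with Pearson's chi-square statistic, i.e. the sum over all cells of (k_i - N p_i)^2 / (N p_i), where N is the total number of observations. A minimal Python check with exact rational arithmetic follows; the function name `pearson` is hypothetical, and the formula is inferred from the example values rather than taken from the package source.

```python
from fractions import Fraction

def pearson(p, k):
    """Pearson chi-square statistic for frequencies k and cell probabilities p."""
    n = sum(k)
    return sum((ki - n * pi) ** 2 / (n * pi) for pi, ki in zip(p, k))

p = [Fraction(x, 361) for x in (36, 40, 45, 50, 45, 50, 45, 50)]
k = [3, 1, 2, 3, 2, 3, 2, 3]
print(pearson(p, k))  # 931/600
```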
DiaconisSturmfels
DiaconisSturmfels(table, list, sigma, n);
n - the number of iterations
This command applies the Diaconis-Sturmfels algorithm to the independence model considered in order to compute an estimate of the p-value, called the Monte Carlo p-value.
The independence model is expressed by the parameters 'list' and 'sigma'.
The parameter 'table' can be obtained from an Excel file using the command 'TableT'. If the table is written directly in Maple by the user, please read the TableT instructions to write it in the correct form.
DiaconisSturmfels(T,L,S,10000);
0.8096000000
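The idea of the algorithm can be sketched in Python for the case where sigma has two groups, so that the frequencies form a two-way table. The chain moves between tables with the same row and column sums by adding +1 and -1 on the corners of a 2x2 minor, accepts each move with the Metropolis rule for the hypergeometric distribution, and counts how often the chi-square statistic is at least as large as the observed one. This is a sketch under assumptions and not the package's implementation: the function names, the absence of a burn-in period and the seed handling are all hypothetical choices, and all row and column sums are assumed to be positive.

```python
import random

def chi_square(table, expected):
    """Pearson chi-square statistic of a two-way table."""
    return sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(len(table)) for j in range(len(table[0])))

def monte_carlo_p_value(observed, steps, seed=0):
    rng = random.Random(seed)
    n_rows, n_cols = len(observed), len(observed[0])
    row_sums = [sum(r) for r in observed]
    col_sums = [sum(c) for c in zip(*observed)]
    total = sum(row_sums)
    # Expected counts depend only on the margins, which the moves preserve.
    expected = [[r * c / total for c in col_sums] for r in row_sums]
    threshold = chi_square(observed, expected) - 1e-9  # tolerance for float ties
    table = [row[:] for row in observed]
    hits = 0
    for _ in range(steps):
        r1, r2 = rng.sample(range(n_rows), 2)
        c1, c2 = rng.sample(range(n_cols), 2)
        # Proposed move: +1 at (r1,c1) and (r2,c2), -1 at (r1,c2) and (r2,c1).
        # Hypergeometric ratio; it is 0 when a decremented cell is already 0.
        down = table[r1][c2] * table[r2][c1]
        up = (table[r1][c1] + 1) * (table[r2][c2] + 1)
        if rng.random() < down / up:
            table[r1][c1] += 1
            table[r2][c2] += 1
            table[r1][c2] -= 1
            table[r2][c1] -= 1
        if chi_square(table, expected) >= threshold:
            hits += 1
    return hits / steps

# Rows: joint values of ("Var 2", "Var 3"); columns: values of "Var 6".
observed = [[3, 1], [2, 3], [2, 3], [2, 3]]
print(monte_carlo_p_value(observed, 10000))
```

Because the method is random, the printed value changes with the seed and the number of steps, and it need not coincide with the value returned by the package.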
References
- P. Diaconis, B. Sturmfels
Algebraic algorithms for sampling from conditional distributions
The Annals of Statistics, Vol. 26, No. 1, 1998, p. 363-397
- L. Pachter, B. Sturmfels
Algebraic Statistics for Computational Biology
Cambridge University Press, 2005
- F. Ricceri, C. Fassino, G. Matullo, M. Roggero, M.L. Torrente, P. Vineis, L. Terracini
Algebraic methods for studying interaction between epidemiological variables
Math. Model. Nat. Phenom., Vol. 7, No. 3, 2012, p. 227-252
- B. Sturmfels
Algebra and geometry of statistical models
John von Neumann Lectures, Technical University, München, 2003
Disclaimer
Legal Notice: © Maplesoft, a division of Waterloo Maple Inc. 2009. Maplesoft and Maple are trademarks of Waterloo Maple Inc. Neither Maplesoft nor the authors are responsible for any errors contained within and are not liable for any damages resulting from the use of this material. This application is intended for non-commercial, non-profit use only. Contact the authors for permission if you wish to use this application in for-profit activities.