Title: | Estimating Finite State Machine Models from Data |
---|---|
Description: | Automatic generation of finite state machine models of dynamic decision-making that both have strong predictive power and are interpretable in human terms. We use an efficient model representation and a genetic algorithm-based estimation process to generate simple deterministic approximations that explain most of the structure of complex stochastic processes. We have applied the software to empirical data, and demonstrated its ability to recover known data-generating processes by simulating data with agent-based models and correctly deriving the underlying decision models for multiple agent models and degrees of stochasticity. |
Authors: | John J. Nay [aut], Jonathan M. Gilligan [cre, aut] |
Maintainer: | Jonathan M. Gilligan <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.4 |
Built: | 2024-11-14 04:15:28 UTC |
Source: | https://github.com/jonathan-g/datafsm |
Extracts slot of action_vec
action_vec(x)
x |
S4 ga_fsm object |
add_interact_num
takes in data and returns a vector of interactions
add_interact_num(d)
d |
data.frame of panel data |
Returns a vector specifying interactions
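As a rough illustration of what a vector of interactions can look like, the sketch below uses base R's interaction() to collapse two binary predictors into one integer code per row. This is an assumption about the general idea, not the package's implementation; the helper name interaction_code is hypothetical and the column names are borrowed from the evolve_model example.

```r
# Hypothetical helper (not part of datafsm): give each distinct combination
# of predictor values its own integer code.
interaction_code <- function(d, predictors) {
  # interaction() builds one factor level per combination of the inputs;
  # as.integer() turns that factor into codes 1, 2, 3, ...
  as.integer(interaction(d[predictors], drop = FALSE))
}

d <- data.frame(period = 1:4,
                my.decision1 = c(0, 1, 0, 1),
                other.decision1 = c(0, 0, 1, 1))
codes <- interaction_code(d, c("my.decision1", "other.decision1"))
print(codes)
```

With two binary predictors there are four possible combinations, so each row here receives one of four codes.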
Extracts performance
best_performance(x)
x |
S4 ga_fsm object |
build_bitstring
creates a bitstring from an action vector, state
matrix, and number of actions.
build_bitstring(action_vec, state_mat, actions)
action_vec |
Numeric vector indicating what action to take for each state. |
state_mat |
Numeric matrix with rows as states and columns as predictors. |
actions |
Numeric vector length one with the number of actions. |
Returns numeric vector bitstring.
compare_fsm
uses a specified distance measure to compare FSMs.
compare_fsm(users, gas, comparison = "manhattan")
users |
Numeric vector or numeric matrix with a predefined FSM |
gas |
Numeric vector or numeric matrix with an evolved FSM |
comparison |
Character string of length one with either "manhattan", "euclidean", or "binary". |
Compares a user-defined FSM to a decoded estimated FSM. If your FSMs may contain values that are not all simple integers, you can use whichever distance metric is most appropriate. Euclidean computes sqrt(sum((x_i - y_i)^2)), the L2 norm. Manhattan computes sum(abs(x_i - y_i)), the L1 norm. Binary treats non-zero elements as "on" and zero elements as "off"; the distance is the proportion of bits in which only one is on among those in which at least one is on.
Numeric vector of length one for the distance between the two supplied FSMs, calculated according to the comparison argument.
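The three distance measures described above can be worked through by hand in base R. The sketch below uses two made-up vectors (not real FSMs) and cross-checks the results against stats::dist(); it does not call compare_fsm itself.

```r
# Two small made-up encodings, flattened to numeric vectors.
x <- c(1, 2, 1, 0, 2, 0)
y <- c(1, 1, 0, 0, 2, 2)

manhattan <- sum(abs(x - y))        # L1 norm
euclidean <- sqrt(sum((x - y)^2))   # L2 norm

# Binary: among positions where at least one vector is non-zero,
# the share where exactly one of them is non-zero.
on_x <- x != 0
on_y <- y != 0
binary <- sum(xor(on_x, on_y)) / sum(on_x | on_y)

# The same three values from stats::dist(), for cross-checking.
d <- vapply(c("manhattan", "euclidean", "binary"),
            function(m) as.numeric(dist(rbind(x, y), method = m)),
            numeric(1))
print(rbind(by_hand = c(manhattan, euclidean, binary), dist = d))
```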
It relies on the GA package: Luca Scrucca (2013). GA: A Package for Genetic Algorithms in R. Journal of Statistical Software, 53 (4), 1-37. URL https://www.jstatsoft.org/v53/i04/.
datafsm's main function for estimating a fsm decision model:
datafsm's helper functions:
Maintainer: Jonathan M. Gilligan [email protected] (ORCID)
Authors:
John J. Nay [email protected]
Useful links:
Report bugs at https://github.com/jonathan-g/datafsm/issues
decode_action_vec
decodes action vector.
decode_action_vec(string, states, inputs, actions)
string |
Numeric (integer) vector of only 1's and 0's. |
states |
Numeric vector with the number of states, which is the number of rows. |
inputs |
Numeric vector length one, with the number of columns. |
actions |
Numeric vector with the number of actions. Actions (and states) determine how many binary elements we need to represent an element of the action (or state) matrix. |
This function takes a solution string of binary values in Gray representation, transforms it to a decimal representation, then puts it in matrix form with the correct sized matrices, given the specified numbers of states, inputs, and actions.
Returns numeric (integer) vector.
decode_state_mat
decodes state matrix.
decode_state_mat(string, states, inputs, actions)
string |
Numeric vector. |
states |
Numeric vector with the number of states, which is the number of rows. |
inputs |
Numeric vector length one, with the number of columns. |
actions |
Numeric vector with the number of actions. Actions (and states) determine how many binary elements we need to represent an element of the action (or state) matrix. |
This function takes a solution string of binary values in Gray representation, transforms it to a decimal representation, then puts it in matrix form with the correct sized matrices, given the specified numbers of states, inputs, and actions.
Returns numeric (integer) matrix.
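The Gray-to-decimal step described above can be sketched in base R. The helper name gray_to_decimal, the two-bit group size, the row-wise matrix layout, and the zero-based values are all assumptions made for illustration; the package's decoders may group bits and offset values differently.

```r
# Hypothetical helper: decode one Gray-coded group of bits
# (most significant bit first) to its decimal value.
gray_to_decimal <- function(gray) {
  gray <- as.integer(gray)
  binary <- integer(length(gray))
  binary[1] <- gray[1]
  for (i in seq_along(gray)[-1]) {
    # each binary bit is the XOR of the previous binary bit and the Gray bit
    binary[i] <- bitwXor(binary[i - 1], gray[i])
  }
  sum(binary * 2^(rev(seq_along(binary)) - 1))
}

# Decode a bit string as four consecutive 2-bit groups and arrange the
# values as a 2 x 2 matrix (rows as states, columns as inputs).
bits <- c(0, 0,  0, 1,  1, 1,  1, 0)
groups <- split(bits, rep(1:4, each = 2))
values <- vapply(groups, gray_to_decimal, numeric(1))
mat <- matrix(values, nrow = 2, ncol = 2, byrow = TRUE)
print(mat)
```

Gray coding is convenient for a genetic algorithm because neighbouring integers differ in a single bit, so a one-bit mutation can move a value to an adjacent integer.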
degeneracy_check
finds indices for non-identifiable elements of state
matrix and then flips values for those elements and checks changes in
resulting fitness.
degeneracy_check(state_mat, action_vec, cols, data, outcome)
state_mat |
Numeric matrix with rows as states and columns as predictors. |
action_vec |
Numeric vector indicating what action to take for each state. |
cols |
Optional numeric vector same length as number of columns of the
state matrix ( |
data |
Numeric matrix whose first column is period and whose remaining columns are predictors. |
outcome |
Numeric vector with the same length as the number of rows in data. |
degeneracy_check finds indices for non-identifiable elements of the state matrix, then flips the values of those elements and checks for changes in the resulting fitness. Being in state/row k (e.g. 2) corresponds to taking action j (e.g. D). For row k, all entries in the matrix that correspond to taking action j last period (e.g. columns 2 and 4 for D) are identifiable; however, entries in columns that correspond to not taking action j last period (e.g. columns 1 and 3 for D) are not identifiable for row k under deterministic play of the strategy. For every element of the matrix that is not identifiable, the value of the element can be any integer from 1 to the number of rows of the matrix (e.g. 1 or 2). With empirical data, where the probability that a single deterministic model generated the data is effectively zero, it is useful to find every entry in the matrix that would be unidentifiable if the strategy were played deterministically and then, for each such element, flip it to its opposite value and test for any change in fitness of the strategy on the data. This function implements this idea. If there is no change, a sparse matrix is returned in which the elements with a 0 are unidentifiable because their value makes no difference to the fit of the strategy to the provided data. If, for each element in the matrix, switching its value led to a decrease in fitness, the following message is displayed: "Your strategy is a deterministic approximation of a stochastic process and all of the elements of the state matrix can be identified."
Returns a list with the sparse and corrected state matrices. If the model is fine, then sparse_state_mat and corrected_state_mat should be equal to state_mat.
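The identifiability rule above can be sketched in a few lines of base R. The state matrix, action vector, and column mapping are taken from the find_wildcards example in this manual; reading cols as "the own action that each input column implies for last period" is an assumption, and the code below is not the package's implementation.

```r
# Tit-for-tat example: 2 states, 4 input columns.
state_mat  <- matrix(c(1, 1, 1, 1, 2, 2, 2, 2), 2, 4)
action_vec <- c(1, 2)
# cols[c]: the own action that input column c implies for last period
# (interpretation assumed here).
cols <- c(1, 2, 1, 2)

# Element (k, c) is unidentifiable when column c implies a different own
# action than the action attached to state k.
wild <- which(outer(action_vec, cols, FUN = "!="), arr.ind = TRUE)
wild <- wild[order(wild[, "row"], wild[, "col"]), , drop = FALSE]
print(wild)
```

For row 2 (action 2) this marks columns 1 and 3, which agrees with the worked example in the description.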
Extracts slot relevant to estimating the fsm
estimation_details(x)
x |
S4 ga_fsm object |
evolve_model
uses a genetic algorithm to estimate a finite-state
machine model, primarily for understanding and predicting decision-making.
evolve_model(data, test_data = NULL, drop_nzv = FALSE, measure = c("accuracy", "sens", "spec", "ppv"), states = NULL, cv = FALSE, max_states = NULL, k = 2, actions = NULL, seed = NULL, popSize = 75, pcrossover = 0.8, pmutation = 0.1, maxiter = 50, run = 25, parallel = FALSE, priors = NULL, verbose = TRUE, return_best = TRUE, ntimes = 1)
data |
A |
test_data |
Optional |
drop_nzv |
Optional logical vector length one specifying whether
predictors variables with variance in provided data near zero should be
dropped before model building. Default is |
measure |
Optional length one character vector that is either:
"accuracy", "sens", "spec", or "ppv". This specifies what measure of
predictive performance to use for training and evaluating the model. The
default measure is |
states |
Optional numeric vector with the number of states.
If not provided, will be set to |
cv |
Optional logical vector length one for whether cross-validation
should be conducted on training data to select optimal number of states.
This can drastically increase computation time because if |
max_states |
Optional numeric vector length one only relevant if
|
k |
Optional numeric vector length one only relevant if cv==TRUE, specifying number of folds for cross-validation. |
actions |
Optional numeric vector with the number of actions. If not provided, then actions will be set as the number of unique values in the outcome vector. |
seed |
Optional numeric vector length one. |
popSize |
Optional numeric vector length one specifying the size of the GA population. A larger number will increase the probability of finding a very good solution but will also increase the computation time. This is passed to the GA::ga() function of the GA package. |
pcrossover |
Optional numeric vector length one specifying probability of crossover for GA. This is passed to the GA::ga() function of the GA package. |
pmutation |
Optional numeric vector length one specifying probability of mutation for GA. This is passed to the GA::ga() function of the GA package. |
maxiter |
Optional numeric vector length one specifying max number of
iterations for stopping the GA evolution. A larger number will increase the
probability of finding a very good solution but will also increase the
computation time. This is passed to the GA::ga() function of the GA
package. |
run |
Optional numeric vector length one specifying max number of consecutive iterations without improvement in best fitness score for stopping the GA evolution. A larger number will increase the probability of finding a very good solution but will also increase the computation time. This is passed to the GA::ga() function of the GA package. |
parallel |
Optional logical vector length one. For running the GA evolution in parallel. Depending on the number of cores registered and the memory on your machine, this can make the process much faster, but only works for Unix-based machines that can fork the processes. |
priors |
Optional numeric matrix of solutions strings to be included in the initialization. User needs to use a decoder function to translate prior decision models into bits and then provide them. If this is not specified, then random priors are automatically created. |
verbose |
Optional logical vector length one specifying whether helpful messages should be displayed on the user's console or not. |
return_best |
Optional logical vector length one specifying whether to return just the best model or all models. Only relevant if ntimes > 1. Default is TRUE. |
ntimes |
Optional integer vector length one specifying the number of times to estimate model. Default is 1 time. |
This is the main function of the datafsm package. It relies on the GA package for genetic algorithm optimization. evolve_model takes data on predictors and data on the outcome. It automatically creates a fitness function that takes the data, an action vector evolve_model generates, and a state matrix evolve_model generates as input and returns a numeric vector of the same length as the outcome. evolve_model then computes a fitness score for that potential solution FSM by comparing it to the provided outcome. This is repeated for every FSM in the population, and the probability of selection for the next generation is proportional to the fitness scores. If cv is TRUE, the function also calls itself recursively while varying the number of states inside a cross-validation loop in order to estimate the optimal number of states. If parallel is set to TRUE, then these evaluations are distributed across the available processors of the computer using the doParallel package; otherwise, the evaluations of fitness are conducted sequentially. Because the fitness function that evolve_model creates must loop through all the data every time it is evaluated, and many candidate FSMs must be evaluated, the fitness function is implemented in C++ for speed.
evolve_model uses a stochastic meta-heuristic optimization routine to estimate the parameters that define a FSM model. Generalized simulated annealing or tabu search could work, but they are more difficult to parallelize. The current version uses the GA package's genetic algorithm because GAs perform well in rugged search spaces to solve integer optimization problems, are a natural complement to our binary string representation of FSMs, and are easily parallelized.
This function evolves the models on training data and then, if a test set is provided, uses the best solution to make predictions on test data. Finally, the function returns the GA object and the decoded version of the best string in the population. See ga_fsm for the details of the slots (objects) that this type of object will have.
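To make "probability of selection proportional to fitness" concrete, here is a minimal base R sketch with made-up fitness scores. It only illustrates the proportional-selection idea stated above; the GA package offers several selection operators, and this sketch does not claim to reproduce the one evolve_model ends up using.

```r
set.seed(42)

# Made-up fitness scores (e.g. accuracies) for a population of five FSMs.
fitness <- c(0.50, 0.60, 0.90, 0.55, 0.75)

# Selection probability proportional to fitness.
p_select <- fitness / sum(fitness)

# Draw the parents of the next generation, with replacement.
parents <- sample(seq_along(fitness), size = length(fitness),
                  replace = TRUE, prob = p_select)
print(round(p_select, 3))
print(parents)
```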
Returns an S4 object of class ga_fsm. See ga_fsm for the details of the slots (objects) that this type of object will have and for information on the methods that can be used to summarize the calling and execution of evolve_model(), including summary, print, and plot. Timing measurement is in seconds.
Luca Scrucca (2013). GA: A Package for Genetic Algorithms in R. Journal of Statistical Software, 53(4), 1-37. URL https://www.jstatsoft.org/v53/i04/.
## Not run:
# Create data:
cdata <- data.frame(period = rep(1:10, 1000),
                    outcome = rep(1:2, 5000),
                    my.decision1 = sample(1:0, 10000, TRUE),
                    other.decision1 = sample(1:0, 10000, TRUE))
(res <- evolve_model(cdata, cv = FALSE))
summary(res)
plot(res, action_label = c("C", "D"))
library(GA)
plot(estimation_details(res))
## End(Not run)
# In scripts, it can make sense to set parallel to
# 'as.logical(Sys.info()['sysname'] != 'Windows')'.
evolve_model_cv
calls evolve_model
with varied numbers of
states and compares their performance with cross-validation.
evolve_model_cv(data, measure, k, actions, max_states, seed, popSize, pcrossover, pmutation, maxiter, run, parallel, verbose, ntimes)
data |
A |
measure |
Optional length one character vector that is either:
"accuracy", "sens", "spec", or "ppv". This specifies what measure of
predictive performance to use for training and evaluating the model. The
default measure is |
k |
Optional numeric vector length one only relevant if cv==TRUE, specifying number of folds for cross-validation. |
actions |
Optional numeric vector with the number of actions. If not provided, then actions will be set as the number of unique values in the outcome vector. |
max_states |
Optional numeric vector length one only relevant if
|
seed |
Optional numeric vector length one. |
popSize |
Optional numeric vector length one specifying the size of the GA population. A larger number will increase the probability of finding a very good solution but will also increase the computation time. This is passed to the GA::ga() function of the GA package. |
pcrossover |
Optional numeric vector length one specifying probability of crossover for GA. This is passed to the GA::ga() function of the GA package. |
pmutation |
Optional numeric vector length one specifying probability of mutation for GA. This is passed to the GA::ga() function of the GA package. |
maxiter |
Optional numeric vector length one specifying max number of
iterations for stopping the GA evolution. A larger number will increase the
probability of finding a very good solution but will also increase the
computation time. This is passed to the GA::ga() function of the GA
package. |
run |
Optional numeric vector length one specifying max number of consecutive iterations without improvement in best fitness score for stopping the GA evolution. A larger number will increase the probability of finding a very good solution but will also increase the computation time. This is passed to the GA::ga() function of the GA package. |
parallel |
Optional logical vector length one. For running the GA evolution in parallel. Depending on the number of cores registered and the memory on your machine, this can make the process much faster, but only works for Unix-based machines that can fork the processes. |
verbose |
Optional logical vector length one specifying whether helpful messages should be displayed on the user's console or not. |
ntimes |
Optional integer vector length one specifying the number of times to estimate model. Default is 1 time. |
Returns the number of states that maximizes the measure, e.g. accuracy.
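The selection logic (score each candidate number of states on held-out folds, then keep the best) can be sketched generically in base R. The helper cv_select_states and the stand-in scorer toy_score are hypothetical; a real scorer would fit an FSM on the training rows and evaluate the chosen measure on the test rows.

```r
# Hypothetical helper: pick the candidate number of states with the best
# mean held-out score over k folds.
cv_select_states <- function(n_obs, candidates, k, fit_and_score, seed = 1) {
  set.seed(seed)
  # assign every observation to one of k folds at random
  fold <- sample(rep(seq_len(k), length.out = n_obs))
  mean_score <- vapply(candidates, function(s) {
    mean(vapply(seq_len(k), function(f) {
      fit_and_score(train = which(fold != f),
                    test = which(fold == f),
                    states = s)
    }, numeric(1)))
  }, numeric(1))
  candidates[which.max(mean_score)]
}

# Stand-in scorer so the sketch runs without fitting anything:
# it simply prefers three states.
toy_score <- function(train, test, states) 1 - abs(states - 3) / 10

best_states <- cv_select_states(n_obs = 100, candidates = 2:5, k = 2,
                                fit_and_score = toy_score)
print(best_states)
```

For panel data, a real implementation would probably need to keep each sequence of periods together within a fold rather than splitting rows at random as this sketch does.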
Luca Scrucca (2013). GA: A Package for Genetic Algorithms in R. Journal of Statistical Software, 53(4), 1-37. URL https://www.jstatsoft.org/v53/i04/.
Hastie, T., R. Tibshirani, and J. Friedman. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. 2nd ed. New York, NY: Springer.
evolve_model
uses a genetic algorithm to estimate a finite-state
machine model, primarily for understanding and predicting decision-making.
evolve_model_ntimes(data, test_data = NULL, drop_nzv = FALSE, measure = c("accuracy", "sens", "spec", "ppv"), states = NULL, cv = FALSE, max_states = NULL, k = 2, actions = NULL, seed = NULL, popSize = 75, pcrossover = 0.8, pmutation = 0.1, maxiter = 50, run = 25, parallel = FALSE, priors = NULL, verbose = TRUE, return_best = TRUE, ntimes = 10, cores = NULL)
data |
A |
test_data |
Optional |
drop_nzv |
Optional logical vector length one specifying whether
predictors variables with variance in provided data near zero should be
dropped before model building. Default is |
measure |
Optional length one character vector that is either:
"accuracy", "sens", "spec", or "ppv". This specifies what measure of
predictive performance to use for training and evaluating the model. The
default measure is |
states |
Optional numeric vector with the number of states.
If not provided, will be set to |
cv |
Optional logical vector length one for whether cross-validation
should be conducted on training data to select optimal number of states.
This can drastically increase computation time because if |
max_states |
Optional numeric vector length one only relevant if
|
k |
Optional numeric vector length one only relevant if cv==TRUE, specifying number of folds for cross-validation. |
actions |
Optional numeric vector with the number of actions. If not provided, then actions will be set as the number of unique values in the outcome vector. |
seed |
Optional numeric vector length one. |
popSize |
Optional numeric vector length one specifying the size of the GA population. A larger number will increase the probability of finding a very good solution but will also increase the computation time. This is passed to the GA::ga() function of the GA package. |
pcrossover |
Optional numeric vector length one specifying probability of crossover for GA. This is passed to the GA::ga() function of the GA package. |
pmutation |
Optional numeric vector length one specifying probability of mutation for GA. This is passed to the GA::ga() function of the GA package. |
maxiter |
Optional numeric vector length one specifying max number of
iterations for stopping the GA evolution. A larger number will increase the
probability of finding a very good solution but will also increase the
computation time. This is passed to the GA::ga() function of the GA
package. |
run |
Optional numeric vector length one specifying max number of consecutive iterations without improvement in best fitness score for stopping the GA evolution. A larger number will increase the probability of finding a very good solution but will also increase the computation time. This is passed to the GA::ga() function of the GA package. |
parallel |
Optional logical vector length one. For running the GA evolution in parallel. Depending on the number of cores registered and the memory on your machine, this can make the process much faster, but only works for Unix-based machines that can fork the processes. |
priors |
Optional numeric matrix of solutions strings to be included in the initialization. User needs to use a decoder function to translate prior decision models into bits and then provide them. If this is not specified, then random priors are automatically created. |
verbose |
Optional logical vector length one specifying whether helpful messages should be displayed on the user's console or not. |
return_best |
Optional logical vector length one specifying whether to return just the best model or all models. Only relevant if ntimes > 1. Default is TRUE. |
ntimes |
Optional integer vector length one specifying the number of times to estimate model. Default is 10. |
cores |
integer vector length one specifying number of cores to use if parallel is TRUE. |
This function of the datafsm package applies the evolve_model function multiple times and then returns a list with either all the models or the best one.
evolve_model uses a stochastic meta-heuristic optimization routine to estimate the parameters that define a FSM model. Because this is not guaranteed to return the best result, we run it many times.
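The restart strategy can be illustrated without the package. In the sketch below, run_once is a hypothetical stand-in for one stochastic estimation run (a tiny random search with a toy fitness), and the surrounding code keeps either all runs or only the best one.

```r
# Stand-in for one stochastic estimation run: random search over 8-bit
# strings, scored by a toy fitness (the number of 1 bits).
run_once <- function(seed, n_bits = 8, n_draws = 5) {
  set.seed(seed)
  draws <- replicate(n_draws, sample(0:1, n_bits, replace = TRUE),
                     simplify = FALSE)
  scores <- vapply(draws, sum, numeric(1))
  list(solution = draws[[which.max(scores)]], fitness = max(scores))
}

# Repeat the run with different seeds and keep the best result.
runs <- lapply(1:10, run_once)
fitness <- vapply(runs, function(r) r$fitness, numeric(1))
best <- runs[[which.max(fitness)]]
print(fitness)
print(best$solution)
```

Keeping the full list (runs) corresponds to asking for all models, while best corresponds to asking for only the best one.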
Returns a list where each element is an S4 object of class ga_fsm. See ga_fsm for the details of the slots (objects) that this type of object will have and for information on the methods that can be used to summarize the calling and execution of evolve_model(), including summary, print, and plot.
## Not run:
# Create data:
cdata <- data.frame(period = rep(1:10, 1000),
                    outcome = rep(1:2, 5000),
                    my.decision1 = sample(1:0, 10000, TRUE),
                    other.decision1 = sample(1:0, 10000, TRUE))
(res <- evolve_model_ntimes(cdata, ntimes = 2))
(res <- evolve_model_ntimes(cdata, return_best = FALSE, ntimes = 2))
## End(Not run)
find_wildcards
finds indices for non-identifiable elements of state
matrix.
find_wildcards(state_mat, action_vec, cols)
state_mat |
Numeric matrix with rows as states and columns as predictors. |
action_vec |
Numeric vector indicating what action to take for each state. |
cols |
Numeric vector same length as number of columns of the
state matrix |
This is a helper function for degeneracy_check.
Returns a list of indices (tuples specifying row and column of a matrix).
tft_state <- matrix(c(1, 1, 1, 1, 2, 2, 2, 2), 2, 4)
tft_action <- matrix(c(1, 2))
find_wildcards(tft_state, tft_action, c(1, 2, 1, 2))
A generated action vector and state matrix are input and this function returns a numeric vector of the same length as the outcome. evolve_model then computes a fitness score for that potential solution FSM by comparing it to the provided outcome. This is repeated for every FSM in the population, and the probability of selection for the next generation is set to be proportional to the fitness scores. This function is also used in the predict method for the resulting final model that is returned. The function checks for a user interrupt from R every 1000 iterations and aborts if one is found.
fitnessCPP(action_vec, state_mat, covariates, period)
action_vec |
Integer Vector. |
state_mat |
Integer Matrix. |
covariates |
Integer Matrix. |
period |
Integer Vector. |
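A plain R sketch can show how an action vector and a state matrix turn a sequence of inputs into predicted actions. It rests on two assumptions that the text above does not confirm: every sequence restarts in state 1 when period is 1, and each row's active input is supplied directly as a column index (input_col) rather than as a covariate matrix. The name fsm_predict is hypothetical, and this is not the C++ implementation.

```r
# Hypothetical pure-R stand-in for the prediction loop.
fsm_predict <- function(action_vec, state_mat, input_col, period) {
  state <- 1
  out <- numeric(length(period))
  for (i in seq_along(period)) {
    if (period[i] == 1) {
      state <- 1   # assumed: every sequence starts in state 1
    } else {
      # move to the state listed for the current state and observed input
      state <- state_mat[state, input_col[i]]
    }
    out[i] <- action_vec[state]
  }
  out
}

# Tit-for-tat matrices from the find_wildcards example.
tft_state  <- matrix(c(1, 1, 1, 1, 2, 2, 2, 2), 2, 4)
tft_action <- c(1, 2)
period     <- c(1, 2, 3, 1, 2)
input_col  <- c(1, 3, 1, 4, 2)

pred <- fsm_predict(tft_action, tft_state, input_col, period)
print(pred)

# Fitness as accuracy against a made-up observed outcome.
accuracy <- mean(pred == c(1, 2, 1, 1, 2))
print(accuracy)
```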
An S4 class to return the results of using a GA to estimate a FSM with evolve_model.
Turns ga_fsm S4 object into list of summaries for printing and then prints it.
Plots ga_fsm S4 object's state transition matrix
Plots ga_fsm S4 object's variable importances
Plots ga_fsm S4 object's variable importances
Extracts slot relevant to estimating the fsm
Extracts performance
Extracts slot of variable importances
Extracts slot of action_vec
Extracts number of states
Predicts new data with estimated model
## S4 method for signature 'ga_fsm'
print(x, ...)

## S4 method for signature 'ga_fsm'
show(object)

## S4 method for signature 'ga_fsm'
summary(object, digits = 3)

## S4 method for signature 'ga_fsm,ANY'
plot(x, y, maintitle = "Transition Diagram", action_label = NULL,
  transition_label = NULL, curvature = c(0.3, 0.6, 0.8))

## S4 method for signature 'ga_fsm'
barplot(height, ...)

## S4 method for signature 'ga_fsm'
dotchart(x, labels)

## S4 method for signature 'ga_fsm'
estimation_details(x)

## S4 method for signature 'ga_fsm'
best_performance(x)

## S4 method for signature 'ga_fsm'
varImp(x)

## S4 method for signature 'ga_fsm'
action_vec(x)

## S4 method for signature 'ga_fsm'
states(x)

## S4 method for signature 'ga_fsm'
predict(object, data, type = "prob", na.action = stats::na.omit, ...)
x |
S4 ga_fsm object. |
... |
arguments to be passed to/from other methods. |
object |
S4 ga_fsm object |
digits |
Optional numeric vector length one for how many significant digits to print, default is 3. |
y |
not used. |
maintitle |
optional character vector |
action_label |
optional character vector same length as action vector, where each ith element corresponds to what that ith element in the action vector represents. This will be used to fill in the states (circles) of the state transition matrix to be plotted. |
transition_label |
optional character vector same length as number of columns of state transition matrix. |
curvature |
optional numeric vector specifying the curvature of the lines for a diagram of 2 or more states. |
height |
ga_fsm S4 object |
labels |
vector of labels for each point. For vectors the default is to use names(x) and for matrices the row labels dimnames(x)[[1]]. |
data |
A |
type |
Not currently used. |
na.action |
Optional function. |
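print: An S4 method for printing a ga_fsm S4 object
show: An S4 method for showing a ga_fsm S4 object
summary: An S4 method for summarizing a ga_fsm S4 object
plot: Plots ga_fsm S4 object's state transition matrix
barplot: Plots ga_fsm S4 object's variable importances
dotchart: Plots ga_fsm S4 object's variable importances
estimation_details: Extracts slot relevant to estimating the fsm
best_performance: Extracts performance
varImp: Extracts slot of variable importances
action_vec: Extracts slot of action_vec
states: Extracts number of states
predict: Predicts new data with estimated model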
call
Language from the call of the function evolve_model.
actions
Numeric vector with the number of actions.
states
Numeric vector with the number of states.
GA
S4 object created by ga() from the GA package.
state_mat
Numeric matrix with rows as states and columns as predictors.
action_vec
Numeric vector indicating what action to take for each state.
predictive
Numeric vector of length one with test data accuracy if test data was supplied; otherwise, a character vector with a message that the user should provide test data for better estimate of performance.
varImp
Numeric vector same length as number of columns of state matrix with relative importance scores for each predictor.
varImp2
Numeric matrix same size as state matrix with relative importance scores for each transition.
timing
Numeric vector length one with the total elapsed seconds it took evolve_model to execute.
diagnostics
Character vector length one, to be printed with base::cat().
A dataset containing 168,386 total rounds of play in 30 different variations on the iterated prisoner's dilemma games. The data comes from J.J. Nay and Y. Vorobeychik, "Predicting Human Cooperation," PLOS ONE 11(5), e0155656 (2016).
NV_games
A data frame with 168,386 rows and 51 variables:
Which turn of the given game
The player's move in this turn
Boolean variable: 1 indicates stochastic payoffs, 0 deterministic payoffs
Probability the game ends after each round
Normalized difference in payoff between both players cooperating and both defecting
Normalized difference in payoff between both players cooperating and the payoff for being a sucker (cooperating when the opponent defects)
Probability that the player's intended move is switched to the opposite move
Which dataset did this game come from: AM = Andreoni & Miller; BR = Bereby-Meyer & Roth; DB = Dal Bo; DF = Dal Bo & Frechette; DO = Duffy & Ochs; FO = Friedman & Oprea; FR = Fudenberg, Rand, & Dreber; and KS = Kunreuther, Silvasi, Bradlow & Small
The player's move in the previous turn
The player's move two turns ago
The player's move three turns ago
The player's move four turns ago
The player's move five turns ago
The player's move six turns ago
The player's move seven turns ago
The player's move eight turns ago
The player's move nine turns ago
The opponent's move in the previous turn
The opponent's move two turns ago
The opponent's move three turns ago
The opponent's move four turns ago
The opponent's move five turns ago
The opponent's move six turns ago
The opponent's move seven turns ago
The opponent's move eight turns ago
The opponent's move nine turns ago
The player's payoff in the previous turn
The player's payoff two turns ago
The player's payoff three turns ago
The player's payoff four turns ago
The player's payoff five turns ago
The player's payoff six turns ago
The player's payoff seven turns ago
The player's payoff eight turns ago
The player's payoff nine turns ago
The opponent's payoff in the previous turn
The opponent's payoff two turns ago
The opponent's payoff three turns ago
The opponent's payoff four turns ago
The opponent's payoff five turns ago
The opponent's payoff six turns ago
The opponent's payoff seven turns ago
The opponent's payoff eight turns ago
The opponent's payoff nine turns ago
Reward: payoff when both players cooperate
Temptation: payoff when player defects and opponent cooperates
Sucker: Payoff when player cooperates and opponent defects
Punishment: payoff when both players defect
Boolean: 1 indicates infinite game with probability delta of ending at each round; 0 indicates pre-determined number of rounds
Boolean: 1 indicates the game is played in continuous time; 0 indicates discrete rounds
Which group (version of the game) is being played?
doi:10.1371/journal.pone.0155656
performance
measures difference between predictions and data
performance(results, outcome, measure)
results |
Numeric vector with predictions |
outcome |
Numeric vector same length as results with real data to compare to. |
measure |
Optional length one character vector that is either:
"accuracy", "sens", "spec", or "ppv". This specifies what measure of
predictive performance to use for training and evaluating the model. The
default measure is |
This is the function of the datafsm package used to measure the fsm model performance. It uses the caret package.
Returns a numeric vector length one.
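For readers who want to see what the four measures compute, the sketch below derives them from confusion-matrix counts in base R. It does not call caret; the helper name perf_sketch is hypothetical, and treating the value 1 as the "positive" class is an assumption about the convention.

```r
# Hypothetical helper: confusion-matrix measures for a two-class outcome.
perf_sketch <- function(results, outcome, measure = "accuracy", positive = 1) {
  tp <- sum(results == positive & outcome == positive)  # true positives
  fp <- sum(results == positive & outcome != positive)  # false positives
  fn <- sum(results != positive & outcome == positive)  # false negatives
  tn <- sum(results != positive & outcome != positive)  # true negatives
  switch(measure,
         accuracy = (tp + tn) / length(outcome),
         sens     = tp / (tp + fn),
         spec     = tn / (tn + fp),
         ppv      = tp / (tp + fp),
         stop("unknown measure: ", measure))
}

results <- c(1, 1, 2, 2)   # made-up predictions
outcome <- c(1, 2, 2, 2)   # made-up observed actions
scores <- vapply(c("accuracy", "sens", "spec", "ppv"),
                 function(m) perf_sketch(results, outcome, m),
                 numeric(1))
print(scores)
```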
Extracts number of states
states(x)
x |
S4 ga_fsm object |
var_imp
calculates the importance of the covariates of the model.
var_imp(state_mat, action_vec, data, outcome, period, measure)
state_mat |
Numeric matrix with rows as states and columns as predictors. |
action_vec |
Numeric vector indicating what action to take for each state. |
data |
Data frame that has "period" and "outcome" columns, with the remaining columns as predictors (from one to three). All of the columns (3-5 in total) should be named. |
outcome |
Numeric vector with the same length as the number of rows in data. |
period |
Numeric vector with the same length as the number of rows in data. |
measure |
Optional length one character vector that is either:
"accuracy", "sens", "spec", or "ppv". This specifies what measure of
predictive performance to use for training and evaluating the model. The
default measure is |
Takes the state matrix and action vector from an already evolved model and the fitness function and data used to evolve the model (or this could be test data), flips the values of each of the elements in the state matrix and measures the change in fitness (prediction of data) relative to the original model. Then these changes are summed across columns to provide the importance of each of the columns of the state matrix.
Numeric vector the same length as the number of columns of the provided state matrix (the number of predictors in the model) with relative importance scores for each predictor.
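The flip-and-measure procedure can be sketched for a two-state FSM in base R. Everything here is illustrative: fsm_predict and flip_importance are hypothetical helpers, accuracy stands in for the fitness measure, flipping is taken to mean swapping states 1 and 2, and the scores are raw sums rather than the relative scores the package reports.

```r
# Hypothetical predictor: restart in state 1 when period is 1, otherwise
# follow the state matrix using the active input column.
fsm_predict <- function(action_vec, state_mat, input_col, period) {
  state <- 1
  out <- numeric(length(period))
  for (i in seq_along(period)) {
    state <- if (period[i] == 1) 1 else state_mat[state, input_col[i]]
    out[i] <- action_vec[state]
  }
  out
}

# Flip each element of a two-state matrix (1 <-> 2), record the drop in
# accuracy, and sum the drops by column.
flip_importance <- function(action_vec, state_mat, input_col, period, outcome) {
  base_acc <- mean(fsm_predict(action_vec, state_mat, input_col, period) == outcome)
  change <- matrix(0, nrow(state_mat), ncol(state_mat))
  for (r in seq_len(nrow(state_mat))) {
    for (cl in seq_len(ncol(state_mat))) {
      flipped <- state_mat
      flipped[r, cl] <- 3 - flipped[r, cl]
      acc <- mean(fsm_predict(action_vec, flipped, input_col, period) == outcome)
      change[r, cl] <- base_acc - acc
    }
  }
  list(per_element = change, per_column = colSums(change))
}

set.seed(1)
tft_state  <- matrix(c(1, 1, 1, 1, 2, 2, 2, 2), 2, 4)
tft_action <- c(1, 2)
period     <- rep(1:5, 4)
input_col  <- sample(1:4, 20, replace = TRUE)
# Outcome generated by the same FSM, so the unflipped model fits perfectly.
outcome    <- fsm_predict(tft_action, tft_state, input_col, period)

imp <- flip_importance(tft_action, tft_state, input_col, period, outcome)
print(imp$per_column)
```

Because the data are generated by the unflipped model, no flip can improve the fit, so every recorded change is zero or positive.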
Extracts slot of variable importances
varImp(x)
x |
S4 ga_fsm object |