A shiny-based web app that uses ExPanDaR functionality for interactive data exploration. It is designed for long-form panel data but also works on simple cross-sectional data.
Usage
ExPanD(
  df = NULL,
  cs_id = NULL,
  ts_id = NULL,
  df_def = NULL,
  var_def = NULL,
  config_list = NULL,
  title = "ExPanD - Explore your data!",
  abstract = NULL,
  df_name = deparse(substitute(df)),
  long_def = TRUE,
  factor_cutoff = 10L,
  components = c(sample_selection = TRUE, subset_factor = TRUE, grouping = TRUE,
    bar_chart = TRUE, missing_values = TRUE, udvars = TRUE, descriptive_table = TRUE,
    histogram = TRUE, ext_obs = TRUE, by_group_bar_graph = TRUE,
    by_group_violin_graph = TRUE, trend_graph = TRUE, quantile_trend_graph = TRUE,
    by_group_trend_graph = TRUE, corrplot = TRUE, scatter_plot = TRUE, regression = TRUE),
  html_blocks = NULL,
  export_nb_option = FALSE,
  save_settings_option = TRUE,
  store_encrypted = FALSE,
  key_phrase = "What a wonderful key",
  debug = FALSE,
  ...
)
Arguments
- df
A data frame or a list of data frames containing the data that you want to explore. If NULL, ExPanD will start up with a file upload dialog.
- cs_id
A character vector containing the names of the variables that identify the cross-section in your data. If only cs_id and not ts_id is provided, the data is treated as cross-sectional, and only appropriate displays are included. df_def overrides this if provided.
- ts_id
A character scalar naming the variable that identifies the time series in your data. The corresponding variable needs to be coercible to an ordered vector. If you provide a time series indicator that already is an ordered vector, ExPanD will verify that it has the same levels for each data frame and throw an error otherwise. If cs_id and ts_id are not provided, either directly or via df_def, the data is treated as cross-sectional, observations are identified by row names, and only appropriate displays are included. df_def overrides this if provided.
- df_def
An optional data frame (or a list of data frames) containing variable names, definitions and types. If NULL (the default), ExPanD uses cs_id and ts_id to identify the data structure and determines the variable types (factor, numeric, logical) based on the classes of the data. See the details section for further information.
- var_def
If you specify a data frame containing variable names and variable definitions here, ExPanD will use these on the provided sample(s) to create the analysis sample. See the details section for the structure of the var_def data frame. If NULL (default), the sample(s) provided by df will be used as analysis sample(s) directly.
- config_list
A list containing the startup configuration for ExPanD to display. Take a look at data(ExPanD_config_russell_3000) for the format. The easiest way to generate a config list is to customize the display within the app and then save the configuration locally.
- title
The title to display in the shiny web app.
- abstract
An introductory text to display in the shiny web app. Needs to be formatted as clean HTML.
- df_name
A character string or a vector of character strings characterizing the data frame(s) provided in df (will be used in the selection menu of the app).
- long_def
If you set this to TRUE (default) and provide a var_def, ExPanD will add the definitions of the used variables of the underlying data frame to the definitions provided for the analysis sample to make these more informative to the user. If set to FALSE, only the variable definitions provided in var_def will be shown to the user.
- factor_cutoff
ExPanD treats factors differently from numerical variables. Factors are available for sub-sampling data and for certain plots. Each variable that is classified as a factor in the data will be treated as a factor. In addition, ExPanD classifies all logical variables and all numerical variables with at most factor_cutoff unique values as factors.
- components
A named logical vector indicating the components that you want ExPanD to generate and their order. See the function head of ExPanD for the list of available components. By default, all components are reported. You can also exclude selected components from the standard order by setting them to FALSE. In addition, you can include an arbitrary number of html_block components. Each block will render clean HTML code as contained in the html_blocks parameter below. This allows you to customize your ExPanD report.
- html_blocks
A character vector containing the clean HTML code for each html_block that is included in components.
- export_nb_option
Do you want to give the user the option to download your data and an R notebook containing code for the analyses that ExPanD displays? Defaults to FALSE.
- save_settings_option
Do you want to give the user the option to save and/or load the settings of the ExPanD app to their local environment? Defaults to TRUE.
- store_encrypted
Do you want the user-side saved config files to be encrypted? This is a security measure that prevents users from injecting arbitrary code via the config list. Probably a good idea when you are hosting sensitive data on a publicly available server.
- key_phrase
The key phrase to use for encryption. Change this from the default if you want to encrypt the config files.
- debug
Do you want ExPanD to echo some debug timing information to the console/log file and to store some diagnostics to the global environment? Probably not.
- ...
Additional parameters that are passed on to runApp.
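The factor_cutoff rule described above can be approximated outside the app to preview which variables would end up as factors. The following is a minimal sketch in base R, not ExPanD's internal code: treated_as_factor() is a hypothetical helper, and the choice to ignore missing values when counting unique values is an assumption.

```r
# Hypothetical helper (not part of ExPanDaR): mimics the documented rule that
# factors, logicals, and numerics with at most `factor_cutoff` unique values
# are treated as factors.
treated_as_factor <- function(x, factor_cutoff = 10L) {
  if (is.factor(x) || is.logical(x)) {
    return(TRUE)
  }
  # Assumption: missing values are not counted as a separate unique value.
  is.numeric(x) && length(unique(x[!is.na(x)])) <= factor_cutoff
}

# Preview the classification for a built-in cross-sectional data set.
flags <- vapply(mtcars, treated_as_factor, logical(1))
factor_like <- names(flags)[flags]
print(factor_like)
```

Raising or lowering factor_cutoff in the helper shows how sensitive the classification is before you start the app.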
Details
If you start ExPanD without any options, it will start with an upload dialog so that the user (e.g., you) can upload a data file for analysis. Supported formats are those provided by the rio package.
When you start ExPanD with a dataframe as the only parameter, it will assume the data to be cross-sectional and will use its row names as the cross-sectional identifier.
When you have panel data in long format, set the ts_id and cs_id parameters to identify the variables that determine the time series and cross-sectional dimensions.
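As a minimal sketch of what long-format panel data looks like, the toy data frame below has one row per firm and year. The data and the column names firm, year and sales are made up for illustration. The uniqueness check is a general sanity check for panel data; this page does not say which checks ExPanD itself performs.

```r
# Toy long-format panel: one row per cross-sectional unit and period.
panel <- data.frame(
  firm = rep(c("A", "B", "C"), each = 3),
  year = rep(2019:2021, times = 3),
  sales = c(10, 12, 15, 7, 8, 8, 20, 18, 25),
  stringsAsFactors = FALSE
)

# Sanity checks: the identifiers should pin down each observation, and the
# time series identifier should be coercible to an ordered vector.
key_is_unique <- anyDuplicated(panel[, c("firm", "year")]) == 0
ts_is_orderable <- is.ordered(as.ordered(panel$year))
stopifnot(key_is_unique, ts_is_orderable)

# Only start the app in an interactive session with ExPanDaR installed.
if (interactive() && requireNamespace("ExPanDaR", quietly = TRUE)) {
  ExPanDaR::ExPanD(panel, cs_id = "firm", ts_id = "year")
}
```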
If you provide variable definitions in df_def and/or var_def, ExPanD displays these as tooltips in the descriptive table of the ExPanD app. In this case, you need to identify the panel dimensions in the variable definitions (see below).
When you provide more than one data frame in df, make sure that all have the same variables and variable types defined. If not, ExPanD will throw an error. When you provide only one df_def for multiple data frames, df_def will be recycled.
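To catch mismatches before starting the app, you can compare the data frames yourself. The sketch below uses a hypothetical helper, same_structure(), that compares column names and classes in base R. It is an approximation of the requirement stated above, not the check that ExPanD runs internally.

```r
# Hypothetical helper (not part of ExPanDaR): TRUE if all data frames share
# the same column names, in the same order, with the same classes.
same_structure <- function(dfs) {
  ref_names <- names(dfs[[1]])
  ref_classes <- lapply(dfs[[1]], class)
  all(vapply(dfs, function(d) {
    identical(names(d), ref_names) && identical(lapply(d, class), ref_classes)
  }, logical(1)))
}

# Split a built-in data set into two samples with identical structure.
set.seed(42)
idx <- sample(nrow(mtcars), nrow(mtcars) / 2)
samples <- list(mtcars[idx, ], mtcars[-idx, ])
samples_ok <- same_structure(samples)

# A list in which one variable changed its type fails the check.
mismatched <- list(mtcars, transform(mtcars, cyl = as.character(cyl)))
mismatched_ok <- same_structure(mismatched)
```

If the check passes, the list can be passed as df together with a df_name vector of the same length.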
When you provide var_def, ExPanD starts up in "advanced mode". The advanced mode uses (a) base sample(s) (the one(s) you provide via df) and the variable definitions in var_def to generate an analysis sample based on the active base sample. In the advanced mode, the app user can delete variables from the analysis sample within the app.
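A small var_def data frame might look like the sketch below. The base sample, the variable names and the definitions are hypothetical. Repeating the identifiers with their own names as definitions is an assumption about how to carry them over. The evaluation step uses plain eval() to illustrate that each definition is R code evaluated against the base sample. It is not how ExPanD does it, which according to this page is via mutate on data grouped by cross-sectional units.

```r
# Hypothetical base sample; column names are made up for illustration.
base_sample <- data.frame(
  firm = rep(c("A", "B"), each = 3),
  year = rep(2019:2021, times = 2),
  sales = c(10, 12, 15, 7, 8, 8),
  assets = c(50, 55, 60, 30, 30, 32),
  stringsAsFactors = FALSE
)

# Variable definitions: the var_def column holds R code as character strings.
var_def <- data.frame(
  var_name = c("firm", "year", "ln_sales", "asset_turnover"),
  var_def = c("firm", "year", "log(sales)", "sales / assets"),
  type = c("cs_id", "ts_id", "numeric", "numeric"),
  can_be_na = c(FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Illustration only: evaluate each definition against the base sample.
analysis_sample <- as.data.frame(
  lapply(
    setNames(var_def$var_def, var_def$var_name),
    function(code) eval(parse(text = code), envir = base_sample)
  ),
  stringsAsFactors = FALSE
)
```

The resulting analysis_sample has one column per row of var_def, which is the relationship between base sample and analysis sample that the advanced mode relies on.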
A df_def or var_def data frame can contain the following variables:
- "var_name"
Required: The names of the variables that are provided by the base sample or are to be calculated for the analysis sample
- "var_def"
Required: For a var_def data frame, the code that is passed to the data frame (grouped by cross-sectional units) in calls to mutate as the right-hand side to calculate the respective variable. For a df_def data frame, a string describing the nature of the variable.
- "type"
Required: One of the strings "cs_id", "ts_id", "factor", "logical" or "numeric", indicating the type of the variable. Please note that at least one variable has to be assigned as a cross-sectional identifier ("cs_id") and exactly one variable that is coercible into an ordered factor has to be assigned as the time-series identifier ("ts_id").
- "can_be_na"
Optional: If included, all variables with this value set to FALSE are required to be non-missing in the data set. This reduces the number of observations. If the column is omitted, it defaults to TRUE for all variables other than cs_id and ts_id.
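The structure listed above can be sketched as a small df_def data frame. The variable names and descriptions are hypothetical. The checks at the end only restate the requirements given in the list and are not ExPanD's own validation.

```r
# Hypothetical df_def for a firm-year panel: here var_def holds a description
# of each variable rather than code.
df_def <- data.frame(
  var_name = c("firm", "year", "sales", "listed"),
  var_def = c(
    "Firm identifier",
    "Fiscal year",
    "Net sales",
    "Whether the firm is publicly listed"
  ),
  type = c("cs_id", "ts_id", "numeric", "logical"),
  can_be_na = c(FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Restate the documented requirements as checks.
valid_types <- c("cs_id", "ts_id", "factor", "logical", "numeric")
types_ok <- all(df_def$type %in% valid_types)
n_cs_id <- sum(df_def$type == "cs_id")
n_ts_id <- sum(df_def$type == "ts_id")
stopifnot(types_ok, n_cs_id >= 1, n_ts_id == 1)
```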
Examples
if (FALSE) {
  # Start with the file upload dialog
  ExPanD()

  # Use this if you want to read very large files via the file dialog
  options(shiny.maxRequestSize = 1024^3)
  ExPanD()

  # Explore cross-sectional data
  ExPanD(mtcars)

  # Include the option to download notebook code and data
  ExPanD(mtcars, export_nb_option = TRUE)

  # Use ExPanD on long-form panel data
  data(russell_3000)
  ExPanD(russell_3000, c("coid", "coname"), "period")
  ExPanD(russell_3000, df_def = russell_3000_data_def)

  # Show only selected components, in the given order
  ExPanD(russell_3000, df_def = russell_3000_data_def,
         components = c(ext_obs = TRUE, descriptive_table = TRUE, regression = TRUE))

  # Exclude selected components from the standard order
  ExPanD(russell_3000, df_def = russell_3000_data_def,
         components = c(missing_values = FALSE, by_group_violin_graph = FALSE))

  # Add custom HTML blocks between components
  ExPanD(russell_3000, df_def = russell_3000_data_def,
         components = c(html_block = TRUE, descriptive_table = TRUE,
                        html_block = TRUE, regression = TRUE),
         html_blocks = c(
           paste('<div class="col-sm-2"><h3>HTML Block 1</h3></div>',
                 '<div class="col-sm-10">',
                 "<p></p>This is a condensed variant of ExPanD with two additional HTML Blocks.",
                 "</div>"),
           paste('<div class="col-sm-2"><h3>HTML Block 2</h3></div>',
                 '<div class="col-sm-10">',
                 "It contains only the descriptive table and the regression component.",
                 "</div>")))

  # Start with a stored configuration
  data(ExPanD_config_russell_3000)
  ExPanD(df = russell_3000, df_def = russell_3000_data_def,
         config_list = ExPanD_config_russell_3000)

  # Provide several samples that the user can select from within the app
  exploratory_sample <- sample(nrow(russell_3000), round(0.5 * nrow(russell_3000)))
  test_sample <- setdiff(seq_len(nrow(russell_3000)), exploratory_sample)
  ExPanD(df = list(russell_3000[exploratory_sample, ], russell_3000[test_sample, ]),
         df_def = russell_3000_data_def,
         df_name = c("Exploratory sample", "Test sample"))

  # Advanced mode: provide var_def to generate the analysis sample
  ExPanD(worldbank, df_def = worldbank_data_def, var_def = worldbank_var_def,
         config_list = ExPanD_config_worldbank)
}