src package¶
Submodules¶
src.Dataset module¶
- class src.Dataset.Dataset(dataset_filename: str)¶
Bases:
object
- filtered_rows(colname: str, criteria: str) DataFrame ¶
Return a filtered dataframe where colname == criteria.
- Parameters:
colname (str) – Name of the column to filter on.
criteria (str) – Value that colname must equal for a row to be kept.
- Returns:
the filtered dataframe.
- Return type:
pd.DataFrame
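A minimal sketch of the equality filter described above, written against plain pandas rather than the Dataset class. The helper name filtered_rows_sketch and the column names are hypothetical, and this is not necessarily how the class implements it.

```python
import pandas as pd


def filtered_rows_sketch(df: pd.DataFrame, colname: str, criteria: str) -> pd.DataFrame:
    """Keep only the rows where df[colname] equals criteria (illustrative helper)."""
    # A boolean mask selects the matching rows; the original index is kept.
    return df[df[colname] == criteria]


# Hypothetical data; these column names are not taken from the real dataset.
df = pd.DataFrame({"Platform": ["PC", "Console", "PC"], "Hours": [10, 4, 7]})
pc_rows = filtered_rows_sketch(df, "Platform", "PC")
print(pc_rows)
```

The result is a new dataframe; the input dataframe is left unchanged.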
- get_category_counts(colname: str, ascending: bool | None = None) Series ¶
Returns a count of categorical values in the dataset.
- Parameters:
colname (str) – the column name.
ascending (bool, optional) – Direction to sort results. If set to None, the results are not sorted. Defaults to None.
- Returns:
the counted categories.
- Return type:
pd.Series
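The three-way behaviour of ascending (True, False or None) can be sketched with pandas as follows. The helper category_counts_sketch and the sample column are assumptions for illustration; the real method may differ in detail.

```python
from typing import Optional

import pandas as pd


def category_counts_sketch(
    df: pd.DataFrame, colname: str, ascending: Optional[bool] = None
) -> pd.Series:
    """Count each category in a column, sorting only when ascending is given."""
    # sort=False leaves the counts unsorted, matching the ascending=None case.
    counts = df[colname].value_counts(sort=False)
    if ascending is not None:
        counts = counts.sort_values(ascending=ascending)
    return counts


# Hypothetical data for illustration only.
df = pd.DataFrame({"Platform": ["PC", "Console", "PC", "Mobile", "PC", "Console"]})
counts_unsorted = category_counts_sketch(df, "Platform")
counts_asc = category_counts_sketch(df, "Platform", ascending=True)
print(counts_asc)
```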
- get_column_count() int ¶
Returns the number of columns in the dataframe.
- Returns:
number of columns.
- Return type:
int
- get_column_mean(colname: str) int ¶
Returns the mean value of all entries in one column.
- Parameters:
colname (str) – Index of the column in the dataframe (indexes can be obtained by calling get_columns()).
- Returns:
mean of colname.
- Return type:
int
- get_columns() Index ¶
Returns all column headers/indexes.
- Returns:
List of all column headers.
- Return type:
pd.core.indexes.base.Index
- get_combined_anxiety_score(dataframe: DataFrame) Series ¶
Get the combined anxiety score as a column. This score is based on the GAN, SPIN and SWL metrics. Each of the three columns is first normalised, then the row-wise mean of the normalised values is returned.
- Parameters:
dataframe (pd.DataFrame) – the dataframe.
- Raises:
ValueError – if dataframe is not a pd.DataFrame.
- Returns:
the anxiety score column.
- Return type:
pd.Series
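The normalise-then-average step can be sketched as below. Min-max scaling is an assumption here, since the description only says "normalised", and the column names and the helper combined_score_sketch are placeholders rather than the class's actual code.

```python
import pandas as pd


def combined_score_sketch(df: pd.DataFrame, cols: list) -> pd.Series:
    """Normalise each column to [0, 1], then average across columns per row."""
    if not isinstance(df, pd.DataFrame):
        raise ValueError("dataframe must be a pd.DataFrame")
    subset = df[cols]
    # Assumption: min-max normalisation; the documented method may use another scheme.
    normalised = (subset - subset.min()) / (subset.max() - subset.min())
    return normalised.mean(axis=1)


# Hypothetical scores; the column names mirror the metrics named in the description.
df = pd.DataFrame({"GAN": [0, 10, 20], "SPIN": [0, 30, 60], "SWL": [5, 20, 35]})
score = combined_score_sketch(df, ["GAN", "SPIN", "SWL"])
print(score)
```

Normalising first keeps a metric with a wider numeric range from dominating the mean.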
- get_dataframe() DataFrame ¶
A getter function for the dataframe.
- Returns:
the dataset.
- Return type:
pd.DataFrame
- get_is_competitive_col(dataframe: DataFrame) array ¶
Returns a column defining whether a person is competitive or not.
- Parameters:
dataframe (pd.DataFrame) – the dataframe.
- Raises:
ValueError – if dataframe is not a pd.DataFrame.
- Returns:
the resulting column.
- Return type:
np.array
- get_is_narcissist_col(dataframe: DataFrame) Series ¶
Get a boolean narcissist column. A Narcissism score of 1.0 is considered Not a Narcissist, while all values above that are considered Narcissist.
- Parameters:
dataframe (pd.DataFrame) – the dataframe
- Raises:
ValueError – if dataframe is not a pd.DataFrame.
- Returns:
the boolean narcissist column.
- Return type:
pd.Series
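A sketch of the threshold rule described above, assuming the score lives in a column named "Narcissism" (the column name and the helper is_narcissist_sketch are assumptions):

```python
import pandas as pd


def is_narcissist_sketch(df: pd.DataFrame, colname: str = "Narcissism") -> pd.Series:
    """Return True for rows whose score is above 1.0, False otherwise."""
    if not isinstance(df, pd.DataFrame):
        raise ValueError("dataframe must be a pd.DataFrame")
    # A score of exactly 1.0 maps to False; anything above maps to True.
    return df[colname] > 1.0


# Hypothetical scores for illustration.
df = pd.DataFrame({"Narcissism": [1.0, 2.0, 5.0]})
flags = is_narcissist_sketch(df)
print(flags.tolist())  # [False, True, True]
```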
- get_row_count() int ¶
Returns the number of rows in the dataframe.
- Returns:
number of rows.
- Return type:
int
- get_sorted_column(colname: str, ascending: bool = True) Series ¶
Returns a single column, sorted either ascending or descending.
- Parameters:
colname (str) – the column name (see get_columns()).
ascending (bool, optional) – Sorting order. Defaults to True.
- Returns:
The sorted column.
- Return type:
pd.Series
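For reference, sorting a single column in pandas can be sketched as below. The helper sorted_column_sketch and the sample data are hypothetical; whether the real method resets the index is not stated in this page.

```python
import pandas as pd


def sorted_column_sketch(df: pd.DataFrame, colname: str, ascending: bool = True) -> pd.Series:
    """Return one column as a Series, sorted in the requested direction."""
    return df[colname].sort_values(ascending=ascending)


# Hypothetical data for illustration.
df = pd.DataFrame({"Hours": [10, 4, 7]})
descending = sorted_column_sketch(df, "Hours", ascending=False)
print(descending.tolist())  # [10, 7, 4]
```

In this sketch the sorted Series keeps the original row labels, so values can still be traced back to their rows.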
- get_unique_column_values(colname: str)¶
Returns the unique values present in a column.
- Parameters:
colname (str) – the column name.
- Returns:
an array of strings containing the unique values present in the column
- Return type:
string array
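A short pandas sketch of collecting the distinct values of a column; the sample column is hypothetical and the real method may post-process the result differently.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"Platform": ["PC", "Console", "PC", "Mobile"]})

# Series.unique() returns a NumPy array holding each distinct value once.
unique_values = df["Platform"].unique()
print(sorted(unique_values.tolist()))  # ['Console', 'Mobile', 'PC']
```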
- preprocess_dataset(raw_dataframe: DataFrame) DataFrame ¶
Preprocesses the dataframe immediately after loading it.
- Parameters:
raw_dataframe (pd.DataFrame) – raw dataframe as read from pd.read_csv(). This dataframe is discarded afterwards.
- Raises:
ValueError – if raw_dataframe is not a pd.DataFrame.
- Returns:
resulting preprocessed dataframe.
- Return type:
pd.DataFrame
- preprocess_whyplay(dataframe: DataFrame) Series ¶
Preprocesses the whyplay column and returns an Is_competitive column.
- Parameters:
dataframe (pd.DataFrame) – the dataframe.
- Raises:
ValueError – if dataframe is not a pd.DataFrame.
- Returns:
the Is_competitive column.
- Return type:
pd.Series
- remove_nonaccepting_rows(dataframe: DataFrame) DataFrame ¶
Removes rows where participants did not consent to data processing.
- Parameters:
dataframe (pd.DataFrame) – the dataframe.
- Raises:
ValueError – if dataframe is not a pd.DataFrame.
- Returns:
the dataframe with non-consenting rows removed.
- Return type:
pd.DataFrame
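Several methods above share the same pattern: reject anything that is not a pd.DataFrame with a ValueError, then transform the frame. A sketch of that pattern for consent filtering follows; the consent column name "accept", the value "Accept" and the helper name are all assumptions, not taken from the dataset.

```python
import pandas as pd


def remove_nonaccepting_sketch(
    df: pd.DataFrame, consent_col: str = "accept", accepted_value: str = "Accept"
) -> pd.DataFrame:
    """Drop rows whose consent column does not hold the accepted value."""
    if not isinstance(df, pd.DataFrame):
        raise ValueError("dataframe must be a pd.DataFrame")
    # Keep consenting rows only and renumber them from zero.
    return df[df[consent_col] == accepted_value].reset_index(drop=True)


# Hypothetical survey rows for illustration.
df = pd.DataFrame({"accept": ["Accept", "Decline", "Accept"], "Hours": [10, 4, 7]})
consenting = remove_nonaccepting_sketch(df)
print(consenting)
```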
- treat_outliers(df: DataFrame, colname: str) DataFrame ¶
Treat outliers of numerical columns.
- Parameters:
df (pd.DataFrame) – the dataframe.
colname (str) – the column name to treat.
- Returns:
the filtered dataframe.
- Return type:
pd.DataFrame
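The page does not say which outlier rule is applied. One common choice, shown here purely as an assumption, is to drop rows outside 1.5 times the interquartile range (IQR); the helper treat_outliers_sketch and its k parameter are hypothetical.

```python
import pandas as pd


def treat_outliers_sketch(df: pd.DataFrame, colname: str, k: float = 1.5) -> pd.DataFrame:
    """Drop rows whose value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[colname].quantile(0.25)
    q3 = df[colname].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # between() is inclusive on both ends by default.
    return df[df[colname].between(lower, upper)]


# Hypothetical data with one extreme value.
df = pd.DataFrame({"Hours": [1, 2, 3, 4, 5, 100]})
cleaned = treat_outliers_sketch(df, "Hours")
print(cleaned["Hours"].tolist())  # [1, 2, 3, 4, 5]
```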
src.Plotter module¶
- class src.Plotter.Plotter(dataset: Dataset)¶
Bases:
object
- customize_plot(fig, ax, styling_params) None ¶
- Parameters:
fig (plt.figure.Figure) –
ax (plt.axes.Axes) –
styling_params (dict) –
- Returns:
None
- distribution_plot(target, styling_params={}) None ¶
Plots the distribution of a single column.
- Parameters:
target (str, must be present as a column in the dataset) –
styling_params (dict) –
- Returns:
None
- plot_categorical_bar_chart(category1, category2, styling_params={}) None ¶
Plots a categorical bar chart.
- Parameters:
category1 (str, must be present as a column in the dataset) –
category2 (str, must be present as a column in the dataset) –
styling_params (dict) –
- Returns:
None
- plot_categorical_boxplot(target, category, styling_params={}) None ¶
Plots a categorical boxplot.
- Parameters:
target (str, must be present as a column in the dataset) –
category (str, must be present as a column in the dataset) –
styling_params (dict) –
- Returns:
None
- plot_categorical_histplot(target, category, styling_params={}, bins=30) None ¶
Plots a categorical histplot.
- Parameters:
target (str, must be present as a column in the dataset) –
category (str, must be present as a column in the dataset) –
styling_params (dict) –
bins (int, optional) – Number of histogram bins. Defaults to 30.
- Returns:
None
- plot_scatterplot(target1, target2, styling_params={}) None ¶
Plots a scatterplot.
- Parameters:
target1 (str, must be present as a column in the dataset) –
target2 (str, must be present as a column in the dataset) –
styling_params (dict) –
- Returns:
None
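The plotting methods all take column names that must exist in the dataset plus a styling_params dict, and the test module checks that bad inputs are caught. A sketch of such validation is given below. The helper name, the use of ValueError and the None default are assumptions; the page does not state which exception the Plotter raises.

```python
from typing import Iterable, Optional


def validate_plot_args_sketch(
    target, columns: Iterable[str], styling_params: Optional[dict] = None
) -> dict:
    """Check plot arguments and return a usable styling dict (illustrative only)."""
    if not isinstance(target, str):
        raise ValueError("target must be a string")
    if target not in columns:
        raise ValueError(f"column {target!r} is not present in the dataset")
    if styling_params is None:
        # A fresh dict per call avoids sharing one mutable default between calls.
        styling_params = {}
    if not isinstance(styling_params, dict):
        raise ValueError("styling_params must be a dict")
    return styling_params


# Hypothetical column names and styling keys.
columns = ["Hours", "Platform"]
params = validate_plot_args_sketch("Hours", columns, {"title": "Hours played"})
print(params)
```

Using None as the default in this sketch, rather than the `{}` shown in the signatures above, is a deliberate choice for the example: a mutable default is created once and reused across calls.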
src.test_dataset module¶
This test file tests the Dataset class in Dataset.py.
- src.test_dataset.test_bool_or_none_params(the_dataset: Dataset, param)¶
Tests that functions taking a bool or None parameter work as intended.
- src.test_dataset.test_catch_colname_not_in_df(the_dataset: Dataset)¶
Tests that functions that take colname correctly catch colnames not in dataset.
- src.test_dataset.test_catch_colname_not_string(the_dataset: Dataset)¶
Tests that functions that take colname correctly catch colnames that are not strings.
- src.test_dataset.test_catch_non_bool(the_dataset: Dataset, param)¶
Tests that functions that take bool or None correctly catch incorrect input data types.
- src.test_dataset.test_catch_non_dataframe(the_dataset: Dataset, param)¶
Tests that functions that take pd.DataFrame correctly catch incorrect input data types.
- src.test_dataset.test_combined_anxiety_score(the_dataset: Dataset)¶
Tests Dataset.get_combined_anxiety_score().
- src.test_dataset.test_filtered_rows(the_dataset: Dataset)¶
Tests that filtered_rows works correctly.
- src.test_dataset.test_get_column_count(the_dataset: Dataset)¶
Tests that get_column_count works correctly.
- src.test_dataset.test_get_column_mean(the_dataset: Dataset)¶
Tests that get_column_mean works correctly.
- src.test_dataset.test_get_is_narcissist_col(the_dataset: Dataset)¶
Tests Dataset.get_is_narcissist_col().
- src.test_dataset.test_get_row_count(the_dataset: Dataset)¶
Tests that get_row_count works correctly.
- src.test_dataset.test_get_unique_column_values(the_dataset: Dataset)¶
Tests Dataset.get_unique_column_values().
- src.test_dataset.test_incorrectly_load_Dataset_class()¶
Tests the Dataset init function.
- src.test_dataset.test_load_Dataset_class()¶
Tests if the dataset is successfully loaded.
src.test_plotter module¶
- src.test_plotter.test_catch_colname_not_in_df(the_plotter: Plotter)¶
Tests that functions that take colname correctly catch colnames not in dataset.
- src.test_plotter.test_catch_plotter_init_not_Dataset()¶
Tests that the Plotter’s init actually takes a src.Dataset.Dataset.
- src.test_plotter.test_catch_styling_params_not_dict(the_plotter: Plotter, param)¶
Tests that functions that take styling_params correctly catch non dictionaries.
- src.test_plotter.test_catch_target_not_string(the_plotter: Plotter)¶
Tests that functions that take target correctly catch non strings.
- src.test_plotter.test_distribution_plot(the_plotter: Plotter)¶
Tests the distribution_plot() function.
- src.test_plotter.test_load_plotter()¶
Tests that the Plotter class can be loaded.
- src.test_plotter.test_plot_categorical_bar_chart(the_plotter: Plotter)¶
Tests the plot_categorical_bar_chart() function.
- src.test_plotter.test_plot_categorical_boxplot(the_plotter: Plotter)¶
Tests the plot_categorical_boxplot() function.
- src.test_plotter.test_plot_categorical_histplot(the_plotter: Plotter)¶
Tests the plot_categorical_histplot() function.