src package

Submodules

src.Dataset module

class src.Dataset.Dataset(dataset_filename: str)

Bases: object

get_category_counts(colname: str, ascending: bool | None = None) Series

Returns the number of occurrences of each category in the given column.

Parameters:
  • colname (str) – the column name.

  • ascending (bool, optional) – Direction to sort results. If set to None, the results are not sorted. Defaults to None.

Returns:

the counted categories.

Return type:

pd.Series
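The counting-and-sorting behaviour described above can be sketched as a standalone function over a plain DataFrame. This is not the Dataset method itself: it assumes the counts come from `Series.value_counts()`, and the `whyplay` values in the usage lines are made up for illustration.

```python
from typing import Optional

import pandas as pd


def category_counts(df: pd.DataFrame, colname: str, ascending: Optional[bool] = None) -> pd.Series:
    """Count each category in `colname`; sort by count only when `ascending` is a bool."""
    if colname not in df.columns:
        raise ValueError(f"{colname!r} is not a column of the dataframe")
    if ascending is None:
        # sort=False skips sorting by count
        return df[colname].value_counts(sort=False)
    return df[colname].value_counts(ascending=ascending)


# Illustrative data only
df = pd.DataFrame({"whyplay": ["fun", "winning", "fun", "improving", "fun", "winning"]})
print(category_counts(df, "whyplay", ascending=True))
```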

get_combined_anxiety_score(dataframe: DataFrame) Series

Gets the combined anxiety score as a column. The score is based on the GAD, SPIN and SWL metrics: each of the three columns is first normalised, then their row-wise mean is returned.

Parameters:

dataframe (pd.DataFrame) – the dataframe.

Raises:

ValueError – if dataframe is not a pd.DataFrame.

Returns:

the anxiety score column.

Return type:

pd.Series
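A minimal sketch of the combined score, assuming min-max normalisation (the text above does not say which normalisation is used) and assuming the three total scores live in columns named `GAD_T`, `SPIN_T` and `SWL_T`. Both are assumptions, not confirmed by this documentation.

```python
import pandas as pd

# Assumed column names for the three total scores; the real dataset may differ.
SCORE_COLUMNS = ["GAD_T", "SPIN_T", "SWL_T"]


def combined_anxiety_score(dataframe: pd.DataFrame) -> pd.Series:
    """Min-max normalise each score column to [0, 1], then average them per row."""
    if not isinstance(dataframe, pd.DataFrame):
        raise ValueError("dataframe must be a pd.DataFrame")
    scores = dataframe[SCORE_COLUMNS]
    # Column-wise min-max scaling (assumed normalisation method)
    normalised = (scores - scores.min()) / (scores.max() - scores.min())
    # Row-wise mean of the three normalised scores
    return normalised.mean(axis=1)


df = pd.DataFrame({"GAD_T": [0, 10, 21], "SPIN_T": [0, 34, 68], "SWL_T": [5, 20, 35]})
print(combined_anxiety_score(df))
```

A constant column makes the min-max denominator zero and yields NaN; the real method may handle that case differently.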

get_dataframe() DataFrame

A getter function for the dataframe.

Returns:

the dataset.

Return type:

pd.DataFrame

get_is_competitive_col(dataframe: DataFrame) array

Returns a column indicating whether each participant is competitive.

Parameters:

dataframe (pd.DataFrame) – the dataframe.

Raises:

ValueError – if dataframe is not a pd.DataFrame.

Returns:

the resulting column.

Return type:

np.ndarray

get_is_narcissist_col(dataframe: DataFrame) Series

Gets a boolean narcissist column. A Narcissism score of 1.0 is considered not narcissistic (False), while all scores above 1.0 are considered narcissistic (True).

Parameters:

dataframe (pd.DataFrame) – the dataframe.

Raises:

ValueError – if dataframe is not a pd.DataFrame.

Returns:

the boolean narcissist column.

Return type:

pd.Series
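The threshold rule above reduces to a single comparison. This sketch assumes the scores are stored in a column called `Narcissism`; the ValueError check mirrors the documented Raises entry.

```python
import pandas as pd


def is_narcissist_col(dataframe: pd.DataFrame) -> pd.Series:
    """True where the Narcissism score is above 1.0, False where it equals 1.0."""
    if not isinstance(dataframe, pd.DataFrame):
        raise ValueError("dataframe must be a pd.DataFrame")
    # "Narcissism" is an assumed column name
    return dataframe["Narcissism"] > 1.0


df = pd.DataFrame({"Narcissism": [1.0, 1.5, 3.0, 1.0]})
print(is_narcissist_col(df).tolist())
```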

get_sorted_column(colname: str, ascending: bool = True) Series

Returns a single column, sorted either ascending or descending.

Parameters:
  • colname (str) – the column name (see get_dataset_columns()).

  • ascending (bool, optional) – Sort in ascending order if True, descending if False. Defaults to True.

Returns:

The sorted column.

Return type:

pd.Series
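A sketch of the sorting behaviour using `Series.sort_values()`. The `Age` column is illustrative, and rejecting a non-bool `ascending` with ValueError is an assumption suggested by the test names in src.test_dataset.

```python
import pandas as pd


def sorted_column(df: pd.DataFrame, colname: str, ascending: bool = True) -> pd.Series:
    """Return `colname` sorted ascending (default) or descending."""
    if colname not in df.columns:
        raise ValueError(f"{colname!r} is not a column of the dataframe")
    # Assumed validation: only a real bool is accepted
    if not isinstance(ascending, bool):
        raise ValueError("ascending must be a bool")
    return df[colname].sort_values(ascending=ascending)


df = pd.DataFrame({"Age": [25, 18, 40]})
print(sorted_column(df, "Age", ascending=False))
```

`sort_values()` keeps the original index labels; call `reset_index(drop=True)` on the result if only the sorted values matter.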

get_unique_column_values(colname: str)

Returns the unique values present in a column.

Parameters:

colname (str) – the column name.

Returns:

an array of strings containing the unique values present in the column.

Return type:

string array
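`Series.unique()` returns values in order of first appearance, which is one plausible way to produce the documented array. The `Platform` column below is illustrative only.

```python
import pandas as pd


def unique_column_values(df: pd.DataFrame, colname: str):
    """Return the distinct values of `colname`, in order of first appearance."""
    if colname not in df.columns:
        raise ValueError(f"{colname!r} is not a column of the dataframe")
    return df[colname].unique()


df = pd.DataFrame({"Platform": ["PC", "Console", "PC", "Mobile", "Console"]})
print(list(unique_column_values(df, "Platform")))
```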

preprocess_dataset(raw_dataframe: DataFrame) DataFrame

Preprocesses the dataframe immediately after it is loaded.

Parameters:

raw_dataframe (pd.DataFrame) – raw dataframe as read from pd.read_csv(). This dataframe is discarded afterwards.

Raises:

ValueError – if raw_dataframe is not a pd.DataFrame.

Returns:

resulting preprocessed dataframe.

Return type:

pd.DataFrame

preprocess_whyplay(dataframe: DataFrame) Series

Preprocesses the whyplay column and returns an Is_competitive column.

Parameters:

dataframe (pd.DataFrame) – the dataframe.

Raises:

ValueError – if dataframe is not a pd.DataFrame.

Returns:

the Is_competitive column.

Return type:

pd.Series

remove_nonaccepting_rows(dataframe: DataFrame) DataFrame

Removes rows where participants did not consent to data processing.

Parameters:

dataframe (pd.DataFrame) – the dataframe.

Raises:

ValueError – if dataframe is not a pd.DataFrame.

Returns:

the dataframe, with non-consenting rows removed.

Return type:

pd.DataFrame
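A sketch of the consent filter, assuming a hypothetical `accept` column in which consenting participants are marked with the string `"Accept"`. `isin()` is used so that missing values count as non-consent instead of propagating NA into the mask.

```python
import pandas as pd

CONSENT_COLUMN = "accept"  # assumed column name
CONSENT_VALUE = "Accept"   # assumed marker for consent


def drop_nonconsenting_rows(dataframe: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows whose consent column equals CONSENT_VALUE."""
    if not isinstance(dataframe, pd.DataFrame):
        raise ValueError("dataframe must be a pd.DataFrame")
    # isin() yields False for missing values, so they are dropped
    mask = dataframe[CONSENT_COLUMN].isin([CONSENT_VALUE])
    return dataframe.loc[mask].copy()


df = pd.DataFrame({"accept": ["Accept", None, "Accept"], "Age": [20, 30, 40]})
print(drop_nonconsenting_rows(df))
```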

treat_outliers(df: DataFrame, colname: str) DataFrame

Treats outliers in the given numerical column and returns the filtered dataframe.

Parameters:
  • df (pd.DataFrame) – the dataframe.

  • colname (str) – the column name to treat.

Returns:

the filtered dataframe.

Return type:

pd.DataFrame
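The documentation does not say how outliers are treated. One common choice, shown here purely as an assumed example, is to drop rows that fall outside the Tukey fences (1.5 × IQR beyond the first and third quartiles). The `Hours` column is illustrative.

```python
import pandas as pd


def filter_iqr_outliers(df: pd.DataFrame, colname: str, k: float = 1.5) -> pd.DataFrame:
    """Drop rows whose `colname` value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[colname].quantile(0.25)
    q3 = df[colname].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # between() is inclusive on both ends by default
    return df[df[colname].between(lower, upper)]


df = pd.DataFrame({"Hours": [10, 12, 11, 13, 12, 200]})
print(filter_iqr_outliers(df, "Hours"))
```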

src.Plotter module

class src.Plotter.Plotter(dataset: Dataset)

Bases: object

customize_plot(fig, ax, styling_params) None
Parameters:
  • fig (matplotlib.figure.Figure) –

  • ax (matplotlib.axes.Axes) –

  • styling_params (dict) –

Returns:

None
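customize_plot() is documented only by its parameters. The sketch below shows one way a styling dictionary could be applied to a matplotlib Figure and Axes; the supported keys (`title`, `xlabel`, `ylabel`, `figsize`) are hypothetical, as is raising ValueError for a non-dict argument.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.figure import Figure


def apply_styling(fig: Figure, ax: Axes, styling_params: dict) -> None:
    """Apply hypothetical styling keys to a figure and its axes, in place."""
    if not isinstance(styling_params, dict):
        raise ValueError("styling_params must be a dict")
    if "title" in styling_params:
        ax.set_title(styling_params["title"])
    if "xlabel" in styling_params:
        ax.set_xlabel(styling_params["xlabel"])
    if "ylabel" in styling_params:
        ax.set_ylabel(styling_params["ylabel"])
    if "figsize" in styling_params:
        # figsize is assumed to be a (width, height) pair in inches
        fig.set_size_inches(*styling_params["figsize"])


fig, ax = plt.subplots()
apply_styling(fig, ax, {"title": "Age distribution", "xlabel": "Age", "figsize": (8, 4)})
plt.close(fig)
```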

distribution_plot(target, styling_params={}) None

Plots a distribution plot of the target column.

Parameters:
  • target (str, must be present as a column in the dataset) –

  • styling_params (dict) –

Returns:

None

plot_categorical_bar_chart(category1, category2, styling_params={}) None

Plots a categorical bar chart.

Parameters:
  • category1 (str, must be present as a column in the dataset) –

  • category2 (str, must be present as a column in the dataset) –

  • styling_params (dict) –

Returns:

None

plot_categorical_boxplot(target, category, styling_params={}) None

Plots a categorical boxplot.

Parameters:
  • target (str, must be present as a column in the dataset) –

  • category (str, must be present as a column in the dataset) –

  • styling_params (dict) –

Returns:

None

plot_categorical_histplot(target, category, styling_params={}, bins=30) None

Plots a categorical histplot.

Parameters:
  • target (str, must be present as a column in the dataset) –

  • category (str, must be present as a column in the dataset) –

  • styling_params (dict) –

  • bins (int, optional) – the number of histogram bins. Defaults to 30.

Returns:

None

plot_scatterplot(target1, target2, styling_params={}) None

Plots a scatterplot.

Parameters:
  • target1 (str, must be present as a column in the dataset) –

  • target2 (str, must be present as a column in the dataset) –

  • styling_params (dict) –

Returns:

None

src.test_dataset module

This test file tests the Dataset class in Dataset.py.

src.test_dataset.test_bool_or_none_params(the_dataset: Dataset, param)

Tests that functions taking a bool or None parameter work as intended.

src.test_dataset.test_catch_colname_not_in_df(the_dataset: Dataset)

Tests that functions taking colname correctly catch column names not present in the dataset.

src.test_dataset.test_catch_colname_not_string(the_dataset: Dataset)

Tests that functions taking colname correctly catch colnames that are not strings.

src.test_dataset.test_catch_non_bool(the_dataset: Dataset, param)

Tests that functions that take bool or None correctly catch incorrect input data types.

src.test_dataset.test_catch_non_dataframe(the_dataset: Dataset, param)

Tests that functions that take pd.DataFrame correctly catch incorrect input data types.

src.test_dataset.test_combined_anxiety_score(the_dataset: Dataset)

Tests Dataset.get_combined_anxiety_score().

src.test_dataset.test_get_dataframe(the_dataset: Dataset)

Tests Dataset.get_dataframe().

src.test_dataset.test_get_is_narcissist_col(the_dataset: Dataset)

Tests Dataset.get_is_narcissist_col().

src.test_dataset.test_get_sorted_columns(the_dataset: Dataset)

Tests Dataset.get_sorted_column().

src.test_dataset.test_get_unique_column_values(the_dataset: Dataset)

Tests Dataset.get_unique_column_values().

src.test_dataset.test_incorrectly_load_Dataset_class()

Tests that the Dataset init function rejects invalid input.

src.test_dataset.test_load_Dataset_class()

Tests if the dataset is successfully loaded.

src.test_dataset.test_preprocessed_dataframe(the_dataset: Dataset)

Tests that the dataframe is preprocessed correctly.

src.test_dataset.the_dataset() Dataset

Returns the initialised Dataset instance as a fixture.

Returns:

the initialised Dataset.

Return type:

Dataset
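The fixture-plus-parameter signatures above follow the usual pytest pattern. Below is a self-contained sketch of that pattern, using a hypothetical `TinyDataset` stand-in instead of src.Dataset.Dataset so that no CSV file is needed, and assuming invalid arguments raise ValueError.

```python
import pandas as pd
import pytest


class TinyDataset:
    """Hypothetical stand-in for src.Dataset.Dataset, built from an in-memory frame."""

    def __init__(self, df: pd.DataFrame) -> None:
        self.df = df

    def get_sorted_column(self, colname: str, ascending: bool = True) -> pd.Series:
        if not isinstance(ascending, bool):
            raise ValueError("ascending must be a bool")
        return self.df[colname].sort_values(ascending=ascending)


@pytest.fixture
def the_dataset() -> TinyDataset:
    # pytest builds a fresh instance for every test that names this argument
    return TinyDataset(pd.DataFrame({"Age": [25, 18, 40]}))


@pytest.mark.parametrize("param", ["yes", 1, None, 0.5])
def test_catch_non_bool(the_dataset: TinyDataset, param) -> None:
    # Each parametrized value must be rejected with ValueError
    with pytest.raises(ValueError):
        the_dataset.get_sorted_column("Age", ascending=param)
```

Running `pytest` on a file containing this code collects four test cases, one per value in the `parametrize` list.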

src.test_plotter module

src.test_plotter.test_catch_colname_not_in_df(the_plotter: Plotter)

Tests that functions taking colname correctly catch column names not present in the dataset.

src.test_plotter.test_catch_plotter_init_not_Dataset()

Tests that the Plotter’s init rejects arguments that are not a src.Dataset.Dataset.

src.test_plotter.test_catch_styling_params_not_dict(the_plotter: Plotter, param)

Tests that functions taking styling_params correctly catch non-dictionary values.

src.test_plotter.test_catch_target_not_string(the_plotter: Plotter)

Tests that functions taking target correctly catch non-string values.

src.test_plotter.test_customize_plot(the_plotter: Plotter)

Tests the customize_plot() function.

src.test_plotter.test_distribution_plot(the_plotter: Plotter)

Tests the distribution_plot() function.

src.test_plotter.test_load_plotter()

Tests that the Plotter class can be loaded.

src.test_plotter.test_plot_categorical_bar_chart(the_plotter: Plotter)

Tests the plot_categorical_bar_chart() function.

src.test_plotter.test_plot_categorical_boxplot(the_plotter: Plotter)

Tests the plot_categorical_boxplot() function.

src.test_plotter.test_plot_categorical_histplot(the_plotter: Plotter)

Tests the plot_categorical_histplot() function.

src.test_plotter.test_plot_scatterplot(the_plotter: Plotter)

Tests the plot_scatterplot() function.

src.test_plotter.the_plotter() Plotter

Returns the initialised Plotter instance as a fixture.

Returns:

the initialised Plotter.

Return type:

Plotter

Module contents