diff --git a/.gitignore b/.gitignore index 8abee9d6d4cb733781f0d0cd2e1de4af266f56e7..9ad8d9892c69bbd9d6c184efe8f0778e81dacd0a 100644 --- a/.gitignore +++ b/.gitignore @@ -6,6 +6,7 @@ pyproject.toml # for testing dump.ipynb +Bullshit.py DUMP_ds.py diff --git a/Notebook.ipynb b/Notebook.ipynb index 82715d85b044e6c318736d1b694b48a3783e958b..05308c16ccd1a69645c41d5a5e7ce67118e1e13c 100644 --- a/Notebook.ipynb +++ b/Notebook.ipynb @@ -8,23 +8,27 @@ "\n", "# Overview \n", "\n", - "In this project we decided to analyze anxiety in Gamers. We picked the dataset from kaggle because it intersected our personal interests. The data can be found [here](https://www.kaggle.com/datasets/divyansh22/online-gaming-anxiety-data)\n", + "In this project we decided to analyze anxiety in gamers. We picked the dataset from Kaggle because it intersected our personal interests. The data and survey can be found [here](https://www.kaggle.com/datasets/divyansh22/online-gaming-anxiety-data)\n", "\n", - "The data was acquired by a survey published and shared online. This way everyone could participate. For us that also means analyzing and di\n", + "The data was acquired by a survey published and shared online. This way everyone could participate. For us that also means taking into account that the distribution and answers can be skewed. 
\n", "\n", - "## Motivation - Why " + "## Motivation - " ] }, { "cell_type": "code", - "execution_count": 17, + "execution_count": 1, "metadata": {}, "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - "<src.Dataset.Dataset object at 0x000001906063BDC0>\n" + "ename": "ModuleNotFoundError", + "evalue": "No module named 'src.Dataset'", + "output_type": "error", + "traceback": [ + "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[1;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[1;32mIn[1], line 1\u001b[0m\n\u001b[1;32m----> 1\u001b[0m \u001b[39mfrom\u001b[39;00m \u001b[39msrc\u001b[39;00m\u001b[39m.\u001b[39;00m\u001b[39mDataset\u001b[39;00m \u001b[39mimport\u001b[39;00m Dataset \n\u001b[0;32m 3\u001b[0m dataset \u001b[39m=\u001b[39m Dataset(\u001b[39m\"\u001b[39m\u001b[39mdata\u001b[39m\u001b[39m\\\u001b[39m\u001b[39mGamingStudy_data.csv\u001b[39m\u001b[39m\"\u001b[39m)\n\u001b[0;32m 4\u001b[0m \u001b[39mprint\u001b[39m(dataset)\n", + "\u001b[1;31mModuleNotFoundError\u001b[0m: No module named 'src.Dataset'" ] } ], @@ -32,28 +36,23 @@ "from src.Dataset import Dataset \n", "\n", "dataset = Dataset(\"data\\GamingStudy_data.csv\")\n", - "print(dataset)" + "print(dataset)\n" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [] - }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Exploration\n", "\n", - "\n", - "\n", "Because the data was accumulated in a semi-professional way for a pre-study we had to clean it up and make some changes. \n", "\n", "Some columns could be answered with an open text field. Naturally the answeres in those columns are very diversified and hard to analyze. \n", - "+ Example\n", - "+ Example\n", - "+ Example\n", + "\n", + "#### Affected Columns\n", + "+ Whyplay\n", + "+ Earnings \n", + "+ League\n", "\n", "In the following we will explain if and how we used these columns. 
\n", "\n", @@ -63,7 +62,7 @@ }, { "cell_type": "code", - "execution_count": 7, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -139,8 +138,7 @@ "A more detailed interpretation can be found [here](http://labs.psychology.illinois.edu/~ediener/Documents/Understanding%20SWLS%20Scores.pdf).\n", "\n", "Residents of developed nations (e.g. DE) usually score 20-24.\n", - "### Take it yourself\n", - "\n", + "#### Questions \n", "____ In most ways my life is close to my ideal.<br>\n", "____ The conditions of my life are excellent.<br>\n", "____ I am satisfied with my life.<br>\n", @@ -157,23 +155,46 @@ "# Analysis\n", "\n", "## Preprocessing \n", - "Explained new columns and why we did that (\"Is_narcissist, \"Anxiety_score\")\n", + "* Explained new columns and why we did that *\n", "\n", "Some columns gave the options to write individual responses. Naturally those are not useful in data analysis. In some cases we cleaned the columns and changes the unusual cases to \"Other\"/\"NA\"\n", "### Cleaned Columns\n", "+ \"Whyplay\" \n", - "+ \n", + "+ Accept \n", "## Normalizing the Data \n", "\n", - "+ \"Is_narcissist,\n", - "+ \"Anxiety_score\"" + "### \"Is_narcissist,\n", + "### \"Anxiety_score\"\n", + "### \"Is_competetive" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "0 0.202288\n", + "1 0.517320\n", + "2 0.497993\n", + "3 0.272969\n", + "4 0.533567\n", + " ... 
\n", + "13459 0.212092\n", + "13460 0.601914\n", + "13461 0.125210\n", + "13462 0.591783\n", + "13463 0.243231\n", + "Length: 13050, dtype: float64" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "# Executing and showing new columns \n", "dataset.get_combined_anxiety_score(dataset.get_dataframe())" @@ -197,7 +218,7 @@ }, { "cell_type": "code", - "execution_count": 23, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -222,7 +243,7 @@ }, { "cell_type": "code", - "execution_count": 24, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -243,7 +264,7 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -264,7 +285,7 @@ }, { "cell_type": "code", - "execution_count": 26, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -280,7 +301,7 @@ }, { "cell_type": "code", - "execution_count": 27, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -297,7 +318,7 @@ }, { "cell_type": "code", - "execution_count": 28, + "execution_count": null, "metadata": {}, "outputs": [], "source": [ @@ -333,7 +354,7 @@ }, { "cell_type": "code", - "execution_count": 29, + "execution_count": null, "metadata": {}, "outputs": [], "source": [