Notebooks from applied-cs/data-science@e451096c

Commit 38e33102 authored Jun 19, 2024 by Christof Kaufmann
--- a/08-korrelation-und-dimensionsreduktion/01-kovarianz-sol.ipynb
+++ b/08-korrelation-und-dimensionsreduktion/01-kovarianz-sol.ipynb
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Covariance\n",
+    "\n",
+    "## Computing\n",
+    "\n",
+    "Write a function `cov` that takes two arrays `x` and `y` and returns\n",
+    "their covariance."
+   ],
+   "id": "0002-0913d59b142f97e01b1f1d202696d848ab5b6e755f3d92df06ee693365f"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "style": "python"
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import pandas as pd"
+   ],
+   "id": "0003-a69527b239a033f8134e8f47e334b1f9486b9076b43570d3fc203164ed0"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
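+    "### Solution\n",
+    "\n",
+    "We first center the x and y values by subtracting their respective\n",
+    "means. Then we multiply them element-wise and take the mean of the\n",
+    "products. This divides by $n$, i.e. it is the population covariance."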
+   ],
+   "id": "0005-016214def42f652d79a75899e99541de5f9661c91938dffca7490fcc347"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "style": "python"
+   },
+   "outputs": [],
+   "source": [
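+    "def cov(x, y):\n",
+    "    # Center both arrays, then average the element-wise products\n",
+    "    # (population covariance, i.e. division by n).\n",
+    "    x0 = x - np.mean(x)\n",
+    "    y0 = y - np.mean(y)\n",
+    "    return np.mean(x0 * y0)"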
+   ],
+   "id": "0006-fadbee38c18d669a38b12077868f7669a98636383c5eb499c50a3a57b9d"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Tests\n",
+    "\n",
+    "Test the function with the examples from the slides:"
+   ],
+   "id": "0008-1839d17e4e89b5f286b8ba411595b6eadeed6664573c0cae88cf6869b85"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "54.875"
+     ]
+    }
+   ],
+   "source": [
+    "df1 = pd.DataFrame({'x': [10, 7, 5, -13, -8, -3, -1, 11], 'y': [8, 10, 3, -9, -1, -7, -1, 13]})\n",
+    "df2 = df1 * [1, -1]\n",
+    "df3 = pd.DataFrame({'x': [14, 10, 4, 2, -1, -1, -9, -11], 'y': [3, 0, 10, -9, 13, -6, 7, -2]})\n",
+    "cov(df1.x, df1.y)"
+   ],
+   "id": "0009-5a9afeeb069c9adcfd985dfa866647c55d0b7afa511f1ab88ce0e3feda9"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "-54.875"
+     ]
+    }
+   ],
+   "source": [
+    "cov(df2.x, df2.y)"
+   ],
+   "id": "0010-bcaedf82ed0c414209d4e126ee2fd2013a7f1cd96fb948355bf543c2ce7"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "0.0"
+     ]
+    }
+   ],
+   "source": [
+    "cov(df3.x, df3.y)"
+   ],
+   "id": "0011-5530793defbb6ce07fafad4901e5575defa5e0894f0d8b8d206ff56113e"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Generating\n",
+    "\n",
+    "So far, so good. Now we turn the task around. Try to create three\n",
+    "datasets, each consisting of two samples:\n",
+    "\n",
+    "**Dataset 1** should have a covariance of 1.\n",
+    "\n",
+    "### Solution\n",
+    "\n",
+    "A simple example of two samples with covariance 1 is (0, 0), (2, 2).\n",
+    "The mean is (1, 1), so the centered samples are (-1, -1) and (1, 1),\n",
+    "and we compute\n",
+    "$\\frac 1 2 ((-1) \\cdot (-1) + 1 \\cdot 1) = 1$."
+   ],
+   "id": "0016-76dfc0cb6810a254ca36daaf0b86fc5e15a5c807be4f17adf9ff053cfb6"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = np.array([0, 2])\n",
+    "y = np.array([0, 2])"
+   ],
+   "id": "0017-98190ce71c2abdc19eb7d8ac4354319f0d6acd8d42d0877945777483cda"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Tests"
+   ],
+   "id": "0018-04ac3f4882b3309363456c952768f31a2c708e671a5462093ff68e76a76"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "1.0"
+     ]
+    }
+   ],
+   "source": [
+    "cov(x, y)"
+   ],
+   "id": "0019-3ebb3905554befd6e9c90a8161f00000757c58b10e4f248e60f7e1abf9e"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Dataset 2** should also have a covariance of 1, but with different\n",
+    "samples than before.\n",
+    "\n",
+    "### Solution\n",
+    "\n",
+    "The samples do not have to lie on the diagonal (spanning a square);\n",
+    "they can also be placed differently (spanning a rectangle). We choose\n",
+    "(0, 0), (1, 4). The mean is $(\\frac 1 2, 2)$, so the centered samples\n",
+    "are $(-\\frac 1 2, -2)$ and $(\\frac 1 2, 2)$, and we compute\n",
+    "$\\frac 1 2 ((-\\frac 1 2) \\cdot (-2) + \\frac 1 2 \\cdot 2) = 1$."
+   ],
+   "id": "0022-90b5a593898934ee35ed76b5e931784de71996d0e0bf7d5a659d58ea41a"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = np.array([0, 1])\n",
+    "y = np.array([0, 4])"
+   ],
+   "id": "0023-9ec0433f3ab4acfa7459dc2d56714459dbcbd9078ce19baf91ffb8aa17d"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Tests"
+   ],
+   "id": "0024-503fa0c97971fa858f825b273ceb998780354d5bb183c47acfd4f3507cf"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "1.0"
+     ]
+    }
+   ],
+   "source": [
+    "cov(x, y)"
+   ],
+   "id": "0025-3ebb3905554befd6e9c90a8161f00000757c58b10e4f248e60f7e1abf9e"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Dataset 3** should have a covariance of 4.\n",
+    "\n",
+    "### Solution\n",
+    "\n",
+    "If we double the values of the first example, the result quadruples.\n",
+    "With (0, 0), (4, 4), the mean is (2, 2) and the covariance is\n",
+    "$\\frac 1 2 ((-2) \\cdot (-2) + 2 \\cdot 2) = 4$."
+   ],
+   "id": "0028-338b5930bf4bafa415d05676a139dfbe8cbce3d950085861e1a4061aa8b"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = np.array([0, 4])\n",
+    "y = np.array([0, 4])"
+   ],
+   "id": "0029-402124921fd7c721bfaa138f25c8d143eb13fe2aa271ad7473b7126b0ab"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Tests"
+   ],
+   "id": "0030-c36e6e0be7ad3f1dec0afd72fde3a265396ca90bf93c258dbd4e3346cbd"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "4.0"
+     ]
+    }
+   ],
+   "source": [
+    "cov(x, y)"
+   ],
+   "id": "0031-3ebb3905554befd6e9c90a8161f00000757c58b10e4f248e60f7e1abf9e"
+  }
+ ],
+ "nbformat": 4,
+ "nbformat_minor": 5,
+ "metadata": {}
+}
+%% Cell type:markdown id:0002-0913d59b142f97e01b1f1d202696d848ab5b6e755f3d92df06ee693365f tags:
+# Covariance
+## Computing
+Write a function `cov` that takes two arrays `x` and `y` and returns
+their covariance.
+%% Cell type:code id:0003-a69527b239a033f8134e8f47e334b1f9486b9076b43570d3fc203164ed0 tags:
+``` 
+import numpy as np
+import pandas as pd
+```
+%% Cell type:markdown id:0005-016214def42f652d79a75899e99541de5f9661c91938dffca7490fcc347 tags:
+### Solution
+We first center the x and y values by subtracting their respective
+means. Then we multiply them element-wise and take the mean of the
+products. This divides by $n$, i.e. it is the population covariance.
+%% Cell type:code id:0006-fadbee38c18d669a38b12077868f7669a98636383c5eb499c50a3a57b9d tags:
+``` 
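+def cov(x, y):
+    # Center both arrays, then average the element-wise products
+    # (population covariance, i.e. division by n).
+    x0 = x - np.mean(x)
+    y0 = y - np.mean(y)
+    return np.mean(x0 * y0)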
+```
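As a cross-check, the `cov` function above can be compared with NumPy's built-in `np.cov`. The following is a minimal sketch, assuming the first sample dataset from the tests below. Note that `np.cov` divides by n - 1 by default; only `bias=True` reproduces the division by n used in `cov`.

```python
import numpy as np

def cov(x, y):
    # Same definition as in the solution: mean of the centered products.
    x0 = x - np.mean(x)
    y0 = y - np.mean(y)
    return np.mean(x0 * y0)

x = np.array([10, 7, 5, -13, -8, -3, -1, 11])
y = np.array([8, 10, 3, -9, -1, -7, -1, 13])

# np.cov returns the 2x2 covariance matrix; entry [0, 1] is cov(x, y).
population = np.cov(x, y, bias=True)[0, 1]  # divides by n
sample = np.cov(x, y)[0, 1]                 # divides by n - 1 (default)

print(cov(x, y), population, sample)
```

The two conventions differ by the factor n / (n - 1), which matters for small datasets such as the two-sample examples later in this notebook.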
+%% Cell type:markdown id:0008-1839d17e4e89b5f286b8ba411595b6eadeed6664573c0cae88cf6869b85 tags:
+### Tests
+Test the function with the examples from the slides:
+%% Cell type:code id:0009-5a9afeeb069c9adcfd985dfa866647c55d0b7afa511f1ab88ce0e3feda9 tags:
+``` 
+df1 = pd.DataFrame({'x': [10, 7, 5, -13, -8, -3, -1, 11], 'y': [8, 10, 3, -9, -1, -7, -1, 13]})
+df2 = df1 * [1, -1]
+df3 = pd.DataFrame({'x': [14, 10, 4, 2, -1, -1, -9, -11], 'y': [3, 0, 10, -9, 13, -6, 7, -2]})
+cov(df1.x, df1.y)
+```
+%% Output
+    54.875
+%% Cell type:code id:0010-bcaedf82ed0c414209d4e126ee2fd2013a7f1cd96fb948355bf543c2ce7 tags:
+``` 
+cov(df2.x, df2.y)
+```
+%% Output
+    -54.875
+%% Cell type:code id:0011-5530793defbb6ce07fafad4901e5575defa5e0894f0d8b8d206ff56113e tags:
+``` 
+cov(df3.x, df3.y)
+```
+%% Output
+    0.0
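The second test case is derived from the first via `df1 * [1, -1]`. The sketch below illustrates what that expression does: the list is aligned with the columns, so `x` is multiplied by 1 and `y` by -1, which is why the covariance flips its sign. The `cov` function is repeated only to keep the example self-contained.

```python
import numpy as np
import pandas as pd

def cov(x, y):
    x0 = x - np.mean(x)
    y0 = y - np.mean(y)
    return np.mean(x0 * y0)

df1 = pd.DataFrame({'x': [10, 7, 5, -13, -8, -3, -1, 11],
                    'y': [8, 10, 3, -9, -1, -7, -1, 13]})

# One factor per column: x stays as it is, y is mirrored at the x axis.
df2 = df1 * [1, -1]

print(df2.head(3))
print(cov(df1.x, df1.y), cov(df2.x, df2.y))
```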
+%% Cell type:markdown id:0016-76dfc0cb6810a254ca36daaf0b86fc5e15a5c807be4f17adf9ff053cfb6 tags:
+## Generating
+So far, so good. Now we turn the task around. Try to create three
+datasets, each consisting of two samples:
+**Dataset 1** should have a covariance of 1.
+### Solution
+A simple example of two samples with covariance 1 is (0, 0), (2, 2).
+The mean is (1, 1), so the centered samples are (-1, -1) and (1, 1),
+and we compute
+$\frac 1 2 ((-1) \cdot (-1) + 1 \cdot 1) = 1$.
+%% Cell type:code id:0017-98190ce71c2abdc19eb7d8ac4354319f0d6acd8d42d0877945777483cda tags:
+``` 
+x = np.array([0, 2])
+y = np.array([0, 2])
+```
+%% Cell type:markdown id:0018-04ac3f4882b3309363456c952768f31a2c708e671a5462093ff68e76a76 tags:
+### Tests
+%% Cell type:code id:0019-3ebb3905554befd6e9c90a8161f00000757c58b10e4f248e60f7e1abf9e tags:
+``` 
+cov(x, y)
+```
+%% Output
+    1.0
+%% Cell type:markdown id:0022-90b5a593898934ee35ed76b5e931784de71996d0e0bf7d5a659d58ea41a tags:
+**Dataset 2** should also have a covariance of 1, but with different
+samples than before.
+### Solution
+The samples do not have to lie on the diagonal (spanning a square);
+they can also be placed differently (spanning a rectangle). We choose
+(0, 0), (1, 4). The mean is $(\frac 1 2, 2)$, so the centered samples
+are $(-\frac 1 2, -2)$ and $(\frac 1 2, 2)$, and we compute
+$\frac 1 2 ((-\frac 1 2) \cdot (-2) + \frac 1 2 \cdot 2) = 1$.
+%% Cell type:code id:0023-9ec0433f3ab4acfa7459dc2d56714459dbcbd9078ce19baf91ffb8aa17d tags:
+``` 
+x = np.array([0, 1])
+y = np.array([0, 4])
+```
+%% Cell type:markdown id:0024-503fa0c97971fa858f825b273ceb998780354d5bb183c47acfd4f3507cf tags:
+### Tests
+%% Cell type:code id:0025-3ebb3905554befd6e9c90a8161f00000757c58b10e4f248e60f7e1abf9e tags:
+``` 
+cov(x, y)
+```
+%% Output
+    1.0
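For exactly two samples there is a shortcut that explains why both datasets work: the centered values are ±Δx/2 and ±Δy/2, so the covariance equals Δx·Δy/4. The sketch below checks this rule against `cov`. The helper `two_point_cov` and the third pair of points are hypothetical and introduced only for this illustration.

```python
import numpy as np

def cov(x, y):
    x0 = x - np.mean(x)
    y0 = y - np.mean(y)
    return np.mean(x0 * y0)

def two_point_cov(p, q):
    """Population covariance of the two samples p and q (hypothetical helper)."""
    dx = q[0] - p[0]
    dy = q[1] - p[1]
    return dx * dy / 4

pairs = [((0, 0), (2, 2)), ((0, 0), (1, 4)), ((1, 3), (5, 4))]
results = []
for p, q in pairs:
    x = np.array([p[0], q[0]])
    y = np.array([p[1], q[1]])
    results.append((cov(x, y), two_point_cov(p, q)))
    print(p, q, results[-1])
```

Any two points whose axis-aligned bounding rectangle has area 4, with the second point up and to the right of the first, therefore yield covariance 1.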
+%% Cell type:markdown id:0028-338b5930bf4bafa415d05676a139dfbe8cbce3d950085861e1a4061aa8b tags:
+**Dataset 3** should have a covariance of 4.
+### Solution
+If we double the values of the first example, the result quadruples.
+With (0, 0), (4, 4), the mean is (2, 2) and the covariance is
+$\frac 1 2 ((-2) \cdot (-2) + 2 \cdot 2) = 4$.
+%% Cell type:code id:0029-402124921fd7c721bfaa138f25c8d143eb13fe2aa271ad7473b7126b0ab tags:
+``` 
+x = np.array([0, 4])
+y = np.array([0, 4])
+```
+%% Cell type:markdown id:0030-c36e6e0be7ad3f1dec0afd72fde3a265396ca90bf93c258dbd4e3346cbd tags:
+### Tests
+%% Cell type:code id:0031-3ebb3905554befd6e9c90a8161f00000757c58b10e4f248e60f7e1abf9e tags:
+``` 
+cov(x, y)
+```
+%% Output
+    4.0
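The quadrupling follows from the bilinearity of the covariance: cov(a·x, b·y) = a·b·cov(x, y). A small sketch of this property, with the factors `a` and `b` chosen arbitrarily for illustration:

```python
import numpy as np

def cov(x, y):
    x0 = x - np.mean(x)
    y0 = y - np.mean(y)
    return np.mean(x0 * y0)

# Dataset 1 from above, covariance 1
x = np.array([0, 2])
y = np.array([0, 2])

base = cov(x, y)
doubled = cov(2 * x, 2 * y)  # both axes scaled by 2
mixed = cov(3 * x, 5 * y)    # different factors per axis

print(base, doubled, mixed)
```

Scaling only one axis by 4 would reach covariance 4 as well, since only the product of the two factors matters.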
--- a/08-korrelation-und-dimensionsreduktion/01-kovarianz.ipynb
+++ b/08-korrelation-und-dimensionsreduktion/01-kovarianz.ipynb
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Covariance\n",
+    "\n",
+    "## Computing\n",
+    "\n",
+    "Write a function `cov` that takes two arrays `x` and `y` and returns\n",
+    "their covariance."
+   ],
+   "id": "0002-0913d59b142f97e01b1f1d202696d848ab5b6e755f3d92df06ee693365f"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "style": "python"
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import pandas as pd"
+   ],
+   "id": "0003-a69527b239a033f8134e8f47e334b1f9486b9076b43570d3fc203164ed0"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Tests\n",
+    "\n",
+    "Test the function with the examples from the slides:"
+   ],
+   "id": "0005-1839d17e4e89b5f286b8ba411595b6eadeed6664573c0cae88cf6869b85"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "54.875"
+     ]
+    }
+   ],
+   "source": [
+    "df1 = pd.DataFrame({'x': [10, 7, 5, -13, -8, -3, -1, 11], 'y': [8, 10, 3, -9, -1, -7, -1, 13]})\n",
+    "df2 = df1 * [1, -1]\n",
+    "df3 = pd.DataFrame({'x': [14, 10, 4, 2, -1, -1, -9, -11], 'y': [3, 0, 10, -9, 13, -6, 7, -2]})\n",
+    "cov(df1.x, df1.y)"
+   ],
+   "id": "0006-5a9afeeb069c9adcfd985dfa866647c55d0b7afa511f1ab88ce0e3feda9"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "-54.875"
+     ]
+    }
+   ],
+   "source": [
+    "cov(df2.x, df2.y)"
+   ],
+   "id": "0007-bcaedf82ed0c414209d4e126ee2fd2013a7f1cd96fb948355bf543c2ce7"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "0.0"
+     ]
+    }
+   ],
+   "source": [
+    "cov(df3.x, df3.y)"
+   ],
+   "id": "0008-5530793defbb6ce07fafad4901e5575defa5e0894f0d8b8d206ff56113e"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Generating\n",
+    "\n",
+    "So far, so good. Now we turn the task around. Try to create three\n",
+    "datasets, each consisting of two samples:\n",
+    "\n",
+    "**Dataset 1** should have a covariance of 1."
+   ],
+   "id": "0011-5e6f7ee8c09fb3163c75c37ab167a849116ef4505076b2442a9a3e7eb1b"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = np.array([])\n",
+    "y = np.array([])"
+   ],
+   "id": "0012-1b4ea8eec5f19241e7602ee09cf19927687efc9d38c7d5360c417d2d3ba"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Tests"
+   ],
+   "id": "0013-04ac3f4882b3309363456c952768f31a2c708e671a5462093ff68e76a76"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "1.0"
+     ]
+    }
+   ],
+   "source": [
+    "cov(x, y)"
+   ],
+   "id": "0014-3ebb3905554befd6e9c90a8161f00000757c58b10e4f248e60f7e1abf9e"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Dataset 2** should also have a covariance of 1, but with different\n",
+    "samples than before."
+   ],
+   "id": "0015-3dbc650c0d3e7b7d38c1256a7a2dc71a13e88a5206c4e416af544e2b775"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = np.array([])\n",
+    "y = np.array([])"
+   ],
+   "id": "0016-1b4ea8eec5f19241e7602ee09cf19927687efc9d38c7d5360c417d2d3ba"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Tests"
+   ],
+   "id": "0017-503fa0c97971fa858f825b273ceb998780354d5bb183c47acfd4f3507cf"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "1.0"
+     ]
+    }
+   ],
+   "source": [
+    "cov(x, y)"
+   ],
+   "id": "0018-3ebb3905554befd6e9c90a8161f00000757c58b10e4f248e60f7e1abf9e"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Dataset 3** should have a covariance of 4."
+   ],
+   "id": "0019-257f544a893b080f910b0568abdb73110977708806fc07ff2bca207ef7e"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "x = np.array([])\n",
+    "y = np.array([])"
+   ],
+   "id": "0020-1b4ea8eec5f19241e7602ee09cf19927687efc9d38c7d5360c417d2d3ba"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Tests"
+   ],
+   "id": "0021-c36e6e0be7ad3f1dec0afd72fde3a265396ca90bf93c258dbd4e3346cbd"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "4.0"
+     ]
+    }
+   ],
+   "source": [
+    "cov(x, y)"
+   ],
+   "id": "0022-3ebb3905554befd6e9c90a8161f00000757c58b10e4f248e60f7e1abf9e"
+  }
+ ],
+ "nbformat": 4,
+ "nbformat_minor": 5,
+ "metadata": {}
+}
+%% Cell type:markdown id:0002-0913d59b142f97e01b1f1d202696d848ab5b6e755f3d92df06ee693365f tags:
+# Covariance
+## Computing
+Write a function `cov` that takes two arrays `x` and `y` and returns
+their covariance.
+%% Cell type:code id:0003-a69527b239a033f8134e8f47e334b1f9486b9076b43570d3fc203164ed0 tags:
+``` 
+import numpy as np
+import pandas as pd
+```
+%% Cell type:markdown id:0005-1839d17e4e89b5f286b8ba411595b6eadeed6664573c0cae88cf6869b85 tags:
+### Tests
+Test the function with the examples from the slides:
+%% Cell type:code id:0006-5a9afeeb069c9adcfd985dfa866647c55d0b7afa511f1ab88ce0e3feda9 tags:
+``` 
+df1 = pd.DataFrame({'x': [10, 7, 5, -13, -8, -3, -1, 11], 'y': [8, 10, 3, -9, -1, -7, -1, 13]})
+df2 = df1 * [1, -1]
+df3 = pd.DataFrame({'x': [14, 10, 4, 2, -1, -1, -9, -11], 'y': [3, 0, 10, -9, 13, -6, 7, -2]})
+cov(df1.x, df1.y)
+```
+%% Output
+    54.875
+%% Cell type:code id:0007-bcaedf82ed0c414209d4e126ee2fd2013a7f1cd96fb948355bf543c2ce7 tags:
+``` 
+cov(df2.x, df2.y)
+```
+%% Output
+    -54.875
+%% Cell type:code id:0008-5530793defbb6ce07fafad4901e5575defa5e0894f0d8b8d206ff56113e tags:
+``` 
+cov(df3.x, df3.y)
+```
+%% Output
+    0.0
+%% Cell type:markdown id:0011-5e6f7ee8c09fb3163c75c37ab167a849116ef4505076b2442a9a3e7eb1b tags:
+## Generating
+So far, so good. Now we turn the task around. Try to create three
+datasets, each consisting of two samples:
+**Dataset 1** should have a covariance of 1.
+%% Cell type:code id:0012-1b4ea8eec5f19241e7602ee09cf19927687efc9d38c7d5360c417d2d3ba tags:
+``` 
+x = np.array([])
+y = np.array([])
+```
+%% Cell type:markdown id:0013-04ac3f4882b3309363456c952768f31a2c708e671a5462093ff68e76a76 tags:
+### Tests
+%% Cell type:code id:0014-3ebb3905554befd6e9c90a8161f00000757c58b10e4f248e60f7e1abf9e tags:
+``` 
+cov(x, y)
+```
+%% Output
+    1.0
+%% Cell type:markdown id:0015-3dbc650c0d3e7b7d38c1256a7a2dc71a13e88a5206c4e416af544e2b775 tags:
+**Dataset 2** should also have a covariance of 1, but with different
+samples than before.
+%% Cell type:code id:0016-1b4ea8eec5f19241e7602ee09cf19927687efc9d38c7d5360c417d2d3ba tags:
+``` 
+x = np.array([])
+y = np.array([])
+```
+%% Cell type:markdown id:0017-503fa0c97971fa858f825b273ceb998780354d5bb183c47acfd4f3507cf tags:
+### Tests
+%% Cell type:code id:0018-3ebb3905554befd6e9c90a8161f00000757c58b10e4f248e60f7e1abf9e tags:
+``` 
+cov(x, y)
+```
+%% Output
+    1.0
+%% Cell type:markdown id:0019-257f544a893b080f910b0568abdb73110977708806fc07ff2bca207ef7e tags:
+**Dataset 3** should have a covariance of 4.
+%% Cell type:code id:0020-1b4ea8eec5f19241e7602ee09cf19927687efc9d38c7d5360c417d2d3ba tags:
+``` 
+x = np.array([])
+y = np.array([])
+```
+%% Cell type:markdown id:0021-c36e6e0be7ad3f1dec0afd72fde3a265396ca90bf93c258dbd4e3346cbd tags:
+### Tests
+%% Cell type:code id:0022-3ebb3905554befd6e9c90a8161f00000757c58b10e4f248e60f7e1abf9e tags:
+``` 
+cov(x, y)
+```
+%% Output
+    4.0
--- a/08-korrelation-und-dimensionsreduktion/02-datasaurus-sol.ipynb
+++ b/08-korrelation-und-dimensionsreduktion/02-datasaurus-sol.ipynb
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Datasaurus\n",
+    "\n",
+    "Load the data from `datasaurus.csv` and, for any of the datasets it\n",
+    "contains, look at summary statistics such as\n",
+    "\n",
+    "-   means of x and y\n",
+    "-   standard deviations of x and y\n",
+    "-   correlation coefficient between x and y\n",
+    "\n",
+    "What do you conclude from them? What else could you do to build an\n",
+    "understanding of the data?\n",
+    "\n",
+    "## Solution\n",
+    "\n",
+    "We first load the data."
+   ],
+   "id": "0005-26afc5b4704584feaccf0e55e8a571368fb090c84b1a48d539857d405c9"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "style": "python"
+   },
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "df = pd.read_csv('datasaurus.csv')"
+   ],
+   "id": "0006-ec8063a047adadc262ed2fa6a02a175adb276b5352581f5234aae966f7b"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Then we print a rough overview of the data:"
+   ],
+   "id": "0007-7706bb8a9163533b90003de7af396b817b9bfd92e413dedc6f55a04d24f"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "         dataset          x          y\n",
+      "0           dino  55.384600  97.179500\n",
+      "1           dino  51.538500  96.025600\n",
+      "2           dino  46.153800  94.487200\n",
+      "3           dino  42.820500  91.410300\n",
+      "4           dino  40.769200  88.333300\n",
+      "...          ...        ...        ...\n",
+      "1841  wide_lines  33.674442  26.090490\n",
+      "1842  wide_lines  75.627255  37.128752\n",
+      "1843  wide_lines  40.610125  89.136240\n",
+      "1844  wide_lines  39.114366  96.481751\n",
+      "1845  wide_lines  34.583829  89.588902\n",
+      "\n",
+      "[1846 rows x 3 columns]"
+     ]
+    }
+   ],
+   "source": [
+    "df"
+   ],
+   "id": "0008-0482c68413fbf8290e3b1e49b0a85901cfcd62ab0738760568a2a6e8a57"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Aha, so the datasets are distinguished by the `dataset` column. That\n",
+    "is convenient, because it lets us group by dataset and print the mean\n",
+    "for each group:"
+   ],
+   "id": "0009-766eb695b9c56d31ef0e81db1ca96663c9fd98e3e46dd1b6cace938f9ed"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "                    x          y\n",
+      "dataset\n",
+      "away        54.266100  47.834721\n",
+      "bullseye    54.268730  47.830823\n",
+      "circle      54.267320  47.837717\n",
+      "dino        54.263273  47.832253\n",
+      "dots        54.260303  47.839829\n",
+      "h_lines     54.261442  47.830252\n",
+      "high_lines  54.268805  47.835450\n",
+      "slant_down  54.267849  47.835896\n",
+      "slant_up    54.265882  47.831496\n",
+      "star        54.267341  47.839545\n",
+      "v_lines     54.269927  47.836988\n",
+      "wide_lines  54.266916  47.831602\n",
+      "x_shape     54.260150  47.839717"
+     ]
+    }
+   ],
+   "source": [
+    "df.groupby('dataset').mean()"
+   ],
+   "id": "0010-b342ee9c963b9fa49a7bea82eefca4c86be18f50947908236381dfc1ad0"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The means of x and of y are nearly identical across all datasets.\n",
+    "So are the standard deviations:"
+   ],
+   "id": "0011-5c2541a1ac168b4459987d9491310dcd4abe498e8983f80c18f67a82eac"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "                    x          y\n",
+      "dataset\n",
+      "away        16.769825  26.939743\n",
+      "bullseye    16.769239  26.935727\n",
+      "circle      16.760013  26.930036\n",
+      "dino        16.765142  26.935403\n",
+      "dots        16.767735  26.930192\n",
+      "h_lines     16.765898  26.939876\n",
+      "high_lines  16.766704  26.939998\n",
+      "slant_down  16.766759  26.936105\n",
+      "slant_up    16.768853  26.938608\n",
+      "star        16.768959  26.930275\n",
+      "v_lines     16.769959  26.937684\n",
+      "wide_lines  16.770000  26.937902\n",
+      "x_shape     16.769958  26.930002"
+     ]
+    }
+   ],
+   "source": [
+    "df.groupby('dataset').std()"
+   ],
+   "id": "0012-a437abd5ed49b4b3e4e43675df457ff411477c73c3c4aec6d3308b89356"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "What about the correlations between x and y? The code is more\n",
+    "involved because a MultiIndex is created here (each dataset yields its\n",
+    "own correlation matrix) and we only want one value from each matrix:"
+   ],
+   "id": "0013-d181c960ab516f871d98eadc6493da90b9b4068bccda237e749200efc04"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "output_type": "stream",
+     "name": "stdout",
+     "text": [
+      "dataset\n",
+      "away         -0.064128\n",
+      "bullseye     -0.068586\n",
+      "circle       -0.068343\n",
+      "dino         -0.064472\n",
+      "dots         -0.060341\n",
+      "h_lines      -0.061715\n",
+      "high_lines   -0.068504\n",
+      "slant_down   -0.068980\n",
+      "slant_up     -0.068609\n",
+      "star         -0.062961\n",
+      "v_lines      -0.069446\n",
+      "wide_lines   -0.066575\n",
+      "x_shape      -0.065583\n",
+      "Name: x, dtype: float64"
+     ]
+    }
+   ],
+   "source": [
+    "df.groupby('dataset').corr()['x'][:, 'y']"
+   ],
+   "id": "0014-8236dab4db77881177f2961943ff03473ddd7fe7c652ac66baea2a09a01"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "So the correlation values are almost identical too, and close to 0.\n",
+    "Hmm… we could now look at the median, for example. It would differ\n",
+    "between the datasets, but that would not really tell us what is going\n",
+    "on here either. Scatter plots to the rescue!"
+   ],
+   "id": "0015-72d799f9b4a7863ce5a44a0cf48b7da4bce08752c77b2f27f07afecc8be"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "style": "python"
+   },
+   "outputs": [],
+   "source": [
+    "df.groupby('dataset').plot.scatter('x', 'y')"
+   ],
+   "id": "0016-cb4f9080bda8d31054a17b051085cdb7f6f96dc5c873616190c59420579"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For a single plot, we can of course also filter by dataset:"
+   ],
+   "id": "0017-9bb4b1a268443f12295eac8e102fc5006bcba977410502f17341a4cec57"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "style": "python"
+   },
+   "outputs": [],
+   "source": [
+    "df.loc[df.dataset == 'dino', ['x', 'y']].plot.scatter('x', 'y')"
+   ],
+   "id": "0018-33dd16c34b351bd44e01483365f9c25c16717a4927e3e5f8a46d66ab17a"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Conclusion: Visualizing data is important, but usually not as easy as\n",
+    "it is here. Dimensionality reduction techniques can help with that.\n",
+    "Summary statistics are useful, but they are not sufficient for gaining\n",
+    "an adequate understanding of the data."
+   ],
+   "id": "0019-9306a8b6383c69b603008b91eefd53d0cbff5ba07be0cdf3b471261b165"
+  }
+ ],
+ "nbformat": 4,
+ "nbformat_minor": 5,
+ "metadata": {}
+}
+%% Cell type:markdown id:0005-26afc5b4704584feaccf0e55e8a571368fb090c84b1a48d539857d405c9 tags:
+# Datasaurus
+Load the data from `datasaurus.csv` and, for any of the datasets it
+contains, look at summary statistics such as
+-   means of x and y
+-   standard deviations of x and y
+-   correlation coefficient between x and y
+What do you conclude from them? What else could you do to build an
+understanding of the data?
+## Solution
+We first load the data.
+%% Cell type:code id:0006-ec8063a047adadc262ed2fa6a02a175adb276b5352581f5234aae966f7b tags:
+``` 
+import pandas as pd
+df = pd.read_csv('datasaurus.csv')
+```
+%% Cell type:markdown id:0007-7706bb8a9163533b90003de7af396b817b9bfd92e413dedc6f55a04d24f tags:
+Then we print a rough overview of the data:
+%% Cell type:code id:0008-0482c68413fbf8290e3b1e49b0a85901cfcd62ab0738760568a2a6e8a57 tags:
+``` 
+df
+```
+%% Output
+             dataset          x          y
+    0           dino  55.384600  97.179500
+    1           dino  51.538500  96.025600
+    2           dino  46.153800  94.487200
+    3           dino  42.820500  91.410300
+    4           dino  40.769200  88.333300
+    ...          ...        ...        ...
+    1841  wide_lines  33.674442  26.090490
+    1842  wide_lines  75.627255  37.128752
+    1843  wide_lines  40.610125  89.136240
+    1844  wide_lines  39.114366  96.481751
+    1845  wide_lines  34.583829  89.588902
+    [1846 rows x 3 columns]
+%% Cell type:markdown id:0009-766eb695b9c56d31ef0e81db1ca96663c9fd98e3e46dd1b6cace938f9ed tags:
+Aha, so the datasets are distinguished by the `dataset` column. That
+is convenient, because it lets us group by dataset and print the mean
+for each group:
+%% Cell type:code id:0010-b342ee9c963b9fa49a7bea82eefca4c86be18f50947908236381dfc1ad0 tags:
+``` 
+df.groupby('dataset').mean()
+```
+%% Output
+                        x          y
+    dataset
+    away        54.266100  47.834721
+    bullseye    54.268730  47.830823
+    circle      54.267320  47.837717
+    dino        54.263273  47.832253
+    dots        54.260303  47.839829
+    h_lines     54.261442  47.830252
+    high_lines  54.268805  47.835450
+    slant_down  54.267849  47.835896
+    slant_up    54.265882  47.831496
+    star        54.267341  47.839545
+    v_lines     54.269927  47.836988
+    wide_lines  54.266916  47.831602
+    x_shape     54.260150  47.839717
+%% Cell type:markdown id:0011-5c2541a1ac168b4459987d9491310dcd4abe498e8983f80c18f67a82eac tags:
+The means of x and of y are nearly identical across all datasets.
+So are the standard deviations:
+%% Cell type:code id:0012-a437abd5ed49b4b3e4e43675df457ff411477c73c3c4aec6d3308b89356 tags:
+``` 
+df.groupby('dataset').std()
+```
+%% Output
+                        x          y
+    dataset
+    away        16.769825  26.939743
+    bullseye    16.769239  26.935727
+    circle      16.760013  26.930036
+    dino        16.765142  26.935403
+    dots        16.767735  26.930192
+    h_lines     16.765898  26.939876
+    high_lines  16.766704  26.939998
+    slant_down  16.766759  26.936105
+    slant_up    16.768853  26.938608
+    star        16.768959  26.930275
+    v_lines     16.769959  26.937684
+    wide_lines  16.770000  26.937902
+    x_shape     16.769958  26.930002
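Mean and standard deviation can also be computed in a single pass with `agg`. The sketch below uses small synthetic data as a stand-in for `datasaurus.csv` (the group names and values are made up for illustration). Note that the pandas `std` uses the sample formula (`ddof=1`), whereas `np.std` defaults to `ddof=0`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for datasaurus.csv (assumption: same column layout)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'dataset': np.repeat(['a', 'b'], 50),
    'x': rng.normal(54, 17, size=100),
    'y': rng.normal(48, 27, size=100),
})

# One row per dataset, columns form a MultiIndex (variable, statistic)
stats = df.groupby('dataset').agg(['mean', 'std'])
print(stats)

# Compare the two conventions for the standard deviation of group 'a'
xa = df.loc[df.dataset == 'a', 'x'].to_numpy()
print(stats.loc['a', ('x', 'std')], np.std(xa, ddof=1), np.std(xa))
```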
+%% Cell type:markdown id:0013-d181c960ab516f871d98eadc6493da90b9b4068bccda237e749200efc04 tags:
+What about the correlations between x and y? The code is more
+involved because a MultiIndex is created here (each dataset yields its
+own correlation matrix) and we only want one value from each matrix:
+%% Cell type:code id:0014-8236dab4db77881177f2961943ff03473ddd7fe7c652ac66baea2a09a01 tags:
+``` 
+df.groupby('dataset').corr()['x'][:, 'y']
+```
+%% Output
+    dataset
+    away         -0.064128
+    bullseye     -0.068586
+    circle       -0.068343
+    dino         -0.064472
+    dots         -0.060341
+    h_lines      -0.061715
+    high_lines   -0.068504
+    slant_down   -0.068980
+    slant_up     -0.068609
+    star         -0.062961
+    v_lines      -0.069446
+    wide_lines   -0.066575
+    x_shape      -0.065583
+    Name: x, dtype: float64
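The single coefficient per dataset can be extracted in other ways that some readers may find easier to follow. The sketch below shows two possible variants on synthetic stand-in data (made up for illustration, not the Datasaurus values): selecting the `y` rows of the stacked correlation matrices with `xs`, and computing only the one coefficient per group with `apply`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for datasaurus.csv (assumption: same column layout)
rng = np.random.default_rng(1)
df = pd.DataFrame({
    'dataset': np.repeat(['a', 'b'], 50),
    'x': rng.normal(size=100),
    'y': rng.normal(size=100),
})

# Variant 1: stacked 2x2 correlation matrices, index = (dataset, variable).
# Take column 'x' and keep only the rows for variable 'y'.
corr = df.groupby('dataset').corr()
r_xs = corr['x'].xs('y', level=1)

# Variant 2: compute just the x-y coefficient within each group.
r_apply = df.groupby('dataset')[['x', 'y']].apply(lambda g: g['x'].corr(g['y']))

print(r_xs)
print(r_apply)
```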
+%% Cell type:markdown id:0015-72d799f9b4a7863ce5a44a0cf48b7da4bce08752c77b2f27f07afecc8be tags:
+So the correlation values are almost identical too, and close to 0.
+Hmm… we could now look at the median, for example. It would differ
+between the datasets, but that would not really tell us what is going
+on here either. Scatter plots to the rescue!
+%% Cell type:code id:0016-cb4f9080bda8d31054a17b051085cdb7f6f96dc5c873616190c59420579 tags:
+``` 
+df.groupby('dataset').plot.scatter('x', 'y')
+```
+%% Cell type:markdown id:0017-9bb4b1a268443f12295eac8e102fc5006bcba977410502f17341a4cec57 tags:
+For a single plot we can of course also filter by
+dataset:
+%% Cell type:code id:0018-33dd16c34b351bd44e01483365f9c25c16717a4927e3e5f8a46d66ab17a tags:
+``` 
+df.loc[df.dataset == 'dino', ['x', 'y']].plot.scatter('x', 'y')
+```
+%% Cell type:markdown id:0019-9306a8b6383c69b603008b91eefd53d0cbff5ba07be0cdf3b471261b165 tags:
+Conclusion: visualizing data is important, but usually
+not as easy as it is here. Dimensionality reduction techniques can
+help. Summary statistics are useful, but they are not enough to
+gain a sufficient understanding of the data.
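The point that matching summary statistics do not imply matching data can be reproduced on a tiny scale. This is a hedged sketch with two made-up samples (`a` and `b` are assumptions for illustration, unrelated to the Datasaurus data): they share mean and standard deviation, yet their medians differ.

```python
from statistics import mean, median, pstdev

# Two hypothetical samples that mirror each other around 0
a = [-1, -1, 2]
b = [1, 1, -2]

# (mean, population standard deviation, median) for each sample
stats_a = (mean(a), pstdev(a), median(a))
stats_b = (mean(b), pstdev(b), median(b))

print(stats_a)
print(stats_b)
```

Even the median only hints at the difference; as in the notebook, plotting the points is what actually reveals the shape.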
--- a/08-korrelation-und-dimensionsreduktion/02-datasaurus.ipynb
+++ b/08-korrelation-und-dimensionsreduktion/02-datasaurus.ipynb
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Datasaurus\n",
+    "\n",
+    "Load the data from `datasaurus.csv` and look at summary statistics for\n",
+    "any of the datasets, such as\n",
+    "\n",
+    "-   means of x and y\n",
+    "-   standard deviations of x and y\n",
+    "-   correlation coefficient between x and y\n",
+    "\n",
+    "What do you conclude from this? What else could you do to build an\n",
+    "understanding of the data?\n",
+    "\n",
+    "Your code goes here:"
+   ],
+   "id": "0004-acee268a50d14b526dc75bb1f3532efe74d442564a65304d69624cfecef"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "style": "python"
+   },
+   "outputs": [],
+   "source": [],
+   "id": "0005-44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
+  }
+ ],
+ "nbformat": 4,
+ "nbformat_minor": 5,
+ "metadata": {}
+}
+%% Cell type:markdown id:0004-acee268a50d14b526dc75bb1f3532efe74d442564a65304d69624cfecef tags:
+# Datasaurus
+Load the data from `datasaurus.csv` and look at summary statistics for
+any of the datasets, such as
+-   means of x and y
+-   standard deviations of x and y
+-   correlation coefficient between x and y
+What do you conclude from this? What else could you do to build an
+understanding of the data?
+Your code goes here:
+%% Cell type:code id:0005-44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 tags:
+``` 
+```
--- a/08-korrelation-und-dimensionsreduktion/03-anscombe.ipynb
+++ b/08-korrelation-und-dimensionsreduktion/03-anscombe.ipynb
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Anscombe’s Datasets\n",
+    "\n",
+    "Here is the code that produces the statistics and plots from the\n",
+    "slides. Feel free to play around with it."
+   ],
+   "id": "0001-47ec560fa976f47180d8889c9b67b1031223795531ff28c56b723a977ee"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "style": "python"
+   },
+   "outputs": [],
+   "source": [
+    "import matplotlib.pyplot as plt\n",
+    "from numpy.polynomial import Polynomial\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "import seaborn as sns\n",
+    "\n",
+    "df = sns.load_dataset(\"anscombe\")"
+   ],
+   "id": "0002-bbd16bd579840f98abcf3f0f6b704f69373b858aa9b36da508d149f1adf"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "style": "python"
+   },
+   "outputs": [],
+   "source": [
+    "for ds_name, ds in df.groupby('dataset')[['x', 'y']]:\n",
+    "    mean = ds.mean()\n",
+    "    var = ds.var()\n",
+    "    c = ds.corr()\n",
+    "    poly = Polynomial.fit(ds.x, ds.y, deg=1).convert()\n",
+    "    with np.printoptions(precision=2):\n",
+    "        print(f'DS {ds_name:3}: mean (x: {mean.x:.2f}, y: {mean.y:.2f}), var (x: {var.x:.2f}, y: {var.y:.2f}), corr: {c.x.y:.3f}, linear model regression: {poly}')\n",
+    "\n",
+    "sns.lmplot(x=\"x\", y=\"y\", col=\"dataset\", hue=\"dataset\", data=df,\n",
+    "           col_wrap=2, ci=None, palette=\"muted\", height=5,\n",
+    "           line_kws={'color': 'lightgrey'}, scatter_kws={\"s\": 100})\n",
+    "\n",
+    "fig = plt.gcf()\n",
+    "fig.set_size_inches(6.5, 4)\n",
+    "fig.patch.set_alpha(0)\n",
+    "fig.tight_layout()\n",
+    "fig.savefig('anscombe.pdf', pad_inches=0)"
+   ],
+   "id": "0003-b4658e20442bb236c82c49dc2764e6ce9c93c82fe6eb93a5e79ab822189"
+  }
+ ],
+ "nbformat": 4,
+ "nbformat_minor": 5,
+ "metadata": {}
+}
+%% Cell type:markdown id:0001-47ec560fa976f47180d8889c9b67b1031223795531ff28c56b723a977ee tags:
+# Anscombe’s Datasets
+Here is the code that produces the statistics and plots from the
+slides. Feel free to play around with it.
+%% Cell type:code id:0002-bbd16bd579840f98abcf3f0f6b704f69373b858aa9b36da508d149f1adf tags:
+``` 
+import matplotlib.pyplot as plt
+from numpy.polynomial import Polynomial
+import numpy as np
+import pandas as pd
+import seaborn as sns
+df = sns.load_dataset("anscombe")
+```
+%% Cell type:code id:0003-b4658e20442bb236c82c49dc2764e6ce9c93c82fe6eb93a5e79ab822189 tags:
+``` 
+for ds_name, ds in df.groupby('dataset')[['x', 'y']]:
+    mean = ds.mean()
+    var = ds.var()
+    c = ds.corr()
+    poly = Polynomial.fit(ds.x, ds.y, deg=1).convert()
+    with np.printoptions(precision=2):
+        print(f'DS {ds_name:3}: mean (x: {mean.x:.2f}, y: {mean.y:.2f}), var (x: {var.x:.2f}, y: {var.y:.2f}), corr: {c.x.y:.3f}, linear model regression: {poly}')
+sns.lmplot(x="x", y="y", col="dataset", hue="dataset", data=df,
+           col_wrap=2, ci=None, palette="muted", height=5,
+           line_kws={'color': 'lightgrey'}, scatter_kws={"s": 100})
+fig = plt.gcf()
+fig.set_size_inches(6.5, 4)
+fig.patch.set_alpha(0)
+fig.tight_layout()
+fig.savefig('anscombe.pdf', pad_inches=0)
+```
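For readers who want to see what the degree-1 fit in the loop above computes, here is a minimal sketch using the commonly published values of Anscombe's first dataset (typed in by hand here, so treat them as an assumption rather than as the output of `sns.load_dataset`). It compares the closed-form least-squares line with `Polynomial.fit(...).convert()`.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Anscombe's first dataset, as commonly published (assumption: entered by hand)
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])

# Closed-form simple linear regression: slope = cov(x, y) / var(x)
x0 = x - x.mean()
y0 = y - y.mean()
slope = (x0 * y0).sum() / (x0 ** 2).sum()
intercept = y.mean() - slope * x.mean()

# Polynomial.fit works on a rescaled domain; convert() maps the
# coefficients back to plain x, ordered from lowest to highest degree
fit_intercept, fit_slope = Polynomial.fit(x, y, deg=1).convert().coef

print(f'closed form:    y = {intercept:.3f} + {slope:.3f} x')
print(f'Polynomial.fit: y = {fit_intercept:.3f} + {fit_slope:.3f} x')
```

Without `.convert()`, the printed coefficients refer to the internally scaled variable and would not match the closed-form values.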
--- a/08-korrelation-und-dimensionsreduktion/04-feature-map-sol.ipynb
+++ b/08-korrelation-und-dimensionsreduktion/04-feature-map-sol.ipynb
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Feature-Map\n",
+    "\n",
+    "In this exercise we want to plot the features on a map according to\n",
+    "their correlation matrix.\n",
+    "\n",
+    "1.  Load the car data from `autos.csv` as a DataFrame.\n",
+    "2.  *Bonus: Remove the outliers. How does that affect\n",
+    "    the result?*\n",
+    "3.  Compute the correlation matrix.\n",
+    "4.  Convert the correlation matrix $P$ into a distance matrix\n",
+    "    $D = \\sqrt{1 - P}$. *Bonus: Also try\n",
+    "    $D = \\sqrt{1 - |P|}$.*\n",
+    "5.  Use MDS to find coordinates for the features. You need\n",
+    "    `dissimilarity='precomputed'` so that you can pass $D$ to\n",
+    "    `fit`.\n",
+    "6.  Plot the result with Plotly Express’ scatter plot, because\n",
+    "    there you can pass the feature names to the `text` argument."
+   ],
+   "id": "0002-8585dc9b0ed930d72556df47900a3e3dae65ea2bd7f244d5822c3bb4206"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "style": "python"
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "import plotly.express as px\n",
+    "from sklearn.manifold import MDS"
+   ],
+   "id": "0003-c1bb0a9ce1897e013bbc5224cd3031da808967b4ce5f467e752db79b3b6"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Solution\n",
+    "\n",
+    "Here is the solution code:"
+   ],
+   "id": "0005-2b2e02f7c099c0b3c2e7ee38e724334b181f374b2f6da5066b33d7489c5"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "style": "python"
+   },
+   "outputs": [],
+   "source": [
+    "df = pd.read_csv('autos.csv').drop(columns=['Marke', 'Modell'])\n",
+    "# df = df[df.Grundpreis < 150_000]  # with outliers, Grundpreis ends up far from the engine features\n",
+    "\n",
+    "corr = df.corr()\n",
+    "\n",
+    "dist_corr = np.sqrt(1 - corr)\n",
+    "# dist_corr = np.sqrt(1 - np.abs(corr))  # with abs, the door count moves closer to all other features\n",
+    "\n",
+    "mds = MDS(dissimilarity='precomputed', normalized_stress='auto')\n",
+    "corr_map = mds.fit_transform(dist_corr)\n",
+    "corr_map = pd.DataFrame(corr_map)\n",
+    "corr_map['feature'] = corr.columns"
+   ],
+   "id": "0006-472ff22b9cdec2be85fd14f451bb4cdea7db8ee3cbf28c128c994cf0453"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Without outliers, the base price (`Grundpreis`) lies closer to the\n",
+    "engine features, i.e. the price behaves similarly to them. With\n",
+    "outliers, the price is far away. One interpretation is that for very\n",
+    "expensive cars the price is no longer proportional to the engine."
+   ],
+   "id": "0007-96c5b071fc624449a3bff00f5acf449ff13a3986090c619ea3c40dbebf7"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "px.scatter(corr_map, x=0, y=1, text=corr.columns)"
+   ],
+   "id": "0008-20ca1c79ef727fdf527b2a98d7d6fe563ef6fd9c2b005bb1fdfb364bbbb"
+  }
+ ],
+ "nbformat": 4,
+ "nbformat_minor": 5,
+ "metadata": {}
+}
+%% Cell type:markdown id:0002-8585dc9b0ed930d72556df47900a3e3dae65ea2bd7f244d5822c3bb4206 tags:
+# Feature-Map
+In this exercise we want to plot the features on a map according to
+their correlation matrix.
+1.  Load the car data from `autos.csv` as a DataFrame.
+2.  *Bonus: Remove the outliers. How does that affect
+    the result?*
+3.  Compute the correlation matrix.
+4.  Convert the correlation matrix $P$ into a distance matrix
+    $D = \sqrt{1 - P}$. *Bonus: Also try
+    $D = \sqrt{1 - |P|}$.*
+5.  Use MDS to find coordinates for the features. You need
+    `dissimilarity='precomputed'` so that you can pass $D$ to
+    `fit`.
+6.  Plot the result with Plotly Express’ scatter plot, because
+    there you can pass the feature names to the `text` argument.
+%% Cell type:code id:0003-c1bb0a9ce1897e013bbc5224cd3031da808967b4ce5f467e752db79b3b6 tags:
+``` 
+import numpy as np
+import pandas as pd
+import plotly.express as px
+from sklearn.manifold import MDS
+```
+%% Cell type:markdown id:0005-2b2e02f7c099c0b3c2e7ee38e724334b181f374b2f6da5066b33d7489c5 tags:
+## Solution
+Here is the solution code:
+%% Cell type:code id:0006-472ff22b9cdec2be85fd14f451bb4cdea7db8ee3cbf28c128c994cf0453 tags:
+``` 
+df = pd.read_csv('autos.csv').drop(columns=['Marke', 'Modell'])
+# df = df[df.Grundpreis < 150_000]  # with outliers, Grundpreis ends up far from the engine features
+corr = df.corr()
+dist_corr = np.sqrt(1 - corr)
+# dist_corr = np.sqrt(1 - np.abs(corr))  # with abs, the door count moves closer to all other features
+mds = MDS(dissimilarity='precomputed', normalized_stress='auto')
+corr_map = mds.fit_transform(dist_corr)
+corr_map = pd.DataFrame(corr_map)
+corr_map['feature'] = corr.columns
+```
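Why $\sqrt{1 - P}$ is a reasonable distance: for two features that are centered and scaled to unit length, the squared Euclidean distance equals $2(1 - r)$, where $r$ is their correlation coefficient. The sketch below checks this numerically on made-up random data (`a`, `b` and the helper `unit` are hypothetical names, not part of the notebook).

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=50)                 # hypothetical feature 1
b = 0.6 * a + rng.normal(size=50)       # hypothetical feature 2, correlated with a


def unit(v):
    """Center a vector and scale it to unit Euclidean length."""
    v0 = v - v.mean()
    return v0 / np.linalg.norm(v0)


r = np.corrcoef(a, b)[0, 1]
d_corr = np.sqrt(1 - r)                                    # distance used in the notebook
d_eucl = np.linalg.norm(unit(a) - unit(b)) / np.sqrt(2)    # scaled Euclidean distance

print(d_corr, d_eucl)
```

The two values agree up to rounding, so perfectly correlated features get distance 0, uncorrelated ones get 1, and perfectly anti-correlated ones get $\sqrt{2}$.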
+%% Cell type:markdown id:0007-96c5b071fc624449a3bff00f5acf449ff13a3986090c619ea3c40dbebf7 tags:
+Without outliers, the base price (`Grundpreis`) lies closer to the
+engine features, i.e. the price behaves similarly to them. With
+outliers, the price is far away. One interpretation is that for very
+expensive cars the price is no longer proportional to the engine.
+%% Cell type:code id:0008-20ca1c79ef727fdf527b2a98d7d6fe563ef6fd9c2b005bb1fdfb364bbbb tags:
+``` 
+px.scatter(corr_map, x=0, y=1, text=corr.columns)
+```
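To give an intuition for how coordinates can be recovered from a precomputed distance matrix, here is a minimal sketch of classical (Torgerson) MDS in plain NumPy. Note the swap: scikit-learn's `MDS` minimizes stress iteratively with SMACOF, so this is a different algorithm, and `classical_mds` is a hypothetical helper written only for illustration.

```python
import numpy as np


def classical_mds(D, n_components=2):
    """Embed points from a distance matrix D via classical (Torgerson) MDS."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0, None))
    return eigvecs[:, order] * scale


# Four made-up points forming a 3 x 4 rectangle
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
D = np.linalg.norm(points[:, None] - points[None, :], axis=-1)

coords = classical_mds(D)
D_rec = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)

print(np.allclose(D, D_rec))
```

The recovered coordinates can be rotated or mirrored relative to the original points; only the pairwise distances are preserved, which is also why the orientation of the feature map carries no meaning.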
--- a/08-korrelation-und-dimensionsreduktion/04-feature-map.ipynb
+++ b/08-korrelation-und-dimensionsreduktion/04-feature-map.ipynb
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Feature-Map\n",
+    "\n",
+    "In this exercise we want to plot the features on a map according to\n",
+    "their correlation matrix.\n",
+    "\n",
+    "1.  Load the car data from `autos.csv` as a DataFrame.\n",
+    "2.  *Bonus: Remove the outliers. How does that affect\n",
+    "    the result?*\n",
+    "3.  Compute the correlation matrix.\n",
+    "4.  Convert the correlation matrix $P$ into a distance matrix\n",
+    "    $D = \\sqrt{1 - P}$. *Bonus: Also try\n",
+    "    $D = \\sqrt{1 - |P|}$.*\n",
+    "5.  Use MDS to find coordinates for the features. You need\n",
+    "    `dissimilarity='precomputed'` so that you can pass $D$ to\n",
+    "    `fit`.\n",
+    "6.  Plot the result with Plotly Express’ scatter plot, because\n",
+    "    there you can pass the feature names to the `text` argument."
+   ],
+   "id": "0002-8585dc9b0ed930d72556df47900a3e3dae65ea2bd7f244d5822c3bb4206"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "style": "python"
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "import plotly.express as px\n",
+    "from sklearn.manifold import MDS"
+   ],
+   "id": "0003-c1bb0a9ce1897e013bbc5224cd3031da808967b4ce5f467e752db79b3b6"
+  }
+ ],
+ "nbformat": 4,
+ "nbformat_minor": 5,
+ "metadata": {}
+}
+%% Cell type:markdown id:0002-8585dc9b0ed930d72556df47900a3e3dae65ea2bd7f244d5822c3bb4206 tags:
+# Feature-Map
+In this exercise we want to plot the features on a map according to
+their correlation matrix.
+1.  Load the car data from `autos.csv` as a DataFrame.
+2.  *Bonus: Remove the outliers. How does that affect
+    the result?*
+3.  Compute the correlation matrix.
+4.  Convert the correlation matrix $P$ into a distance matrix
+    $D = \sqrt{1 - P}$. *Bonus: Also try
+    $D = \sqrt{1 - |P|}$.*
+5.  Use MDS to find coordinates for the features. You need
+    `dissimilarity='precomputed'` so that you can pass $D$ to
+    `fit`.
+6.  Plot the result with Plotly Express’ scatter plot, because
+    there you can pass the feature names to the `text` argument.
+%% Cell type:code id:0003-c1bb0a9ce1897e013bbc5224cd3031da808967b4ce5f467e752db79b3b6 tags:
+``` 
+import numpy as np
+import pandas as pd
+import plotly.express as px
+from sklearn.manifold import MDS
+```
--- a/08-korrelation-und-dimensionsreduktion/05-reduce-mnist-sol.ipynb
+++ b/08-korrelation-und-dimensionsreduktion/05-reduce-mnist-sol.ipynb
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Visualizing MNIST\n",
+    "\n",
+    "In this exercise we want to plot a high-dimensional dataset in 2D\n",
+    "(or 3D). The data is already loaded for you."
+   ],
+   "id": "0001-9c88d904212173177e3cd805c405ca6bffb6c39ce47a88ff66351aa9536"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "style": "python"
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import plotly.express as px\n",
+    "from tensorflow.keras.datasets.mnist import load_data\n",
+    "\n",
+    "(x_train, y_train), (x_test, y_test) = load_data()\n",
+    "\n",
+    "# reshape test set to 10 000 x 784\n",
+    "X = x_test.reshape(-1, 28 * 28)"
+   ],
+   "id": "0002-f9e5f87e267af56fbd5014863286132d368c2876d241287b849ceff60cc"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If you like, here is a plot of different images of the same\n",
+    "class, so you can take a look at the data."
+   ],
+   "id": "0003-fef1281e3edb72c906210479c5811c47fe2289914626b2c40534881280c"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "style": "python"
+   },
+   "outputs": [],
+   "source": [
+    "# plot 50 examples for each digit\n",
+    "imgs = np.empty((50, 10, 28, 28))\n",
+    "for j in range(10):\n",
+    "    imgs[:, j] = x_test[y_test == j][:50]\n",
+    "\n",
+    "fig = px.imshow(imgs, animation_frame=0, facet_col=1, facet_col_wrap=5, binary_string=True)\n",
+    "fig.update_xaxes(showticklabels=False)\n",
+    "fig.update_yaxes(showticklabels=False)\n",
+    "fig.show()"
+   ],
+   "id": "0004-c879d58b500c83a0364d8680cc6b05c9cc97d36d3ea3743e112c56644db"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Transform the data to 2D (or 3D) and plot the\n",
+    "transformed data as a scatter plot, using `y_test` to\n",
+    "color the points.\n",
+    "\n",
+    "## Solution\n",
+    "\n",
+    "Here is the solution code:"
+   ],
+   "id": "0007-d262459c63ed3d98add57f2c48715daab6ff8a674c4e923ee87f87f61ad"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "style": "python"
+   },
+   "outputs": [],
+   "source": [
+    "from umap import UMAP\n",
+    "from sklearn.decomposition import PCA\n",
+    "\n",
+    "pca = PCA(n_components=2)\n",
+    "X_pca = pca.fit_transform(X)\n",
+    "\n",
+    "umap = UMAP(n_neighbors=20, metric='manhattan', min_dist=0.1)\n",
+    "X_umap = umap.fit_transform(X)\n",
+    "\n",
+    "scatter1 = px.scatter(X_pca, x=0, y=1, color=y_test)\n",
+    "scatter1.show()\n",
+    "scatter2 = px.scatter(X_umap, x=0, y=1, color=y_test, hover_data={'class': y_test, 'index': np.arange(len(X_umap))})\n",
+    "scatter2.show()"
+   ],
+   "id": "0008-a04806d4df0812cf8fba0aa2e4e76d43ce0a2c8b2e2a9d7c032abc4e78f"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Without outliers, the base price (`Grundpreis`) lies closer to the\n",
+    "engine features, i.e. the price behaves similarly to them. With\n",
+    "outliers, the price is far away. One interpretation is that for very\n",
+    "expensive cars the price is no longer proportional to the engine."
+   ],
+   "id": "0009-96c5b071fc624449a3bff00f5acf449ff13a3986090c619ea3c40dbebf7"
+  }
+ ],
+ "nbformat": 4,
+ "nbformat_minor": 5,
+ "metadata": {}
+}
+%% Cell type:markdown id:0001-9c88d904212173177e3cd805c405ca6bffb6c39ce47a88ff66351aa9536 tags:
+# Visualizing MNIST
+In this exercise we want to plot a high-dimensional dataset in 2D
+(or 3D). The data is already loaded for you.
+%% Cell type:code id:0002-f9e5f87e267af56fbd5014863286132d368c2876d241287b849ceff60cc tags:
+``` 
+import numpy as np
+import plotly.express as px
+from tensorflow.keras.datasets.mnist import load_data
+(x_train, y_train), (x_test, y_test) = load_data()
+# reshape test set to 10 000 x 784
+X = x_test.reshape(-1, 28 * 28)
+```
+%% Cell type:markdown id:0003-fef1281e3edb72c906210479c5811c47fe2289914626b2c40534881280c tags:
+If you like, here is a plot of different images of the same
+class, so you can take a look at the data.
+%% Cell type:code id:0004-c879d58b500c83a0364d8680cc6b05c9cc97d36d3ea3743e112c56644db tags:
+``` 
+# plot 50 examples for each digit
+imgs = np.empty((50, 10, 28, 28))
+for j in range(10):
+    imgs[:, j] = x_test[y_test == j][:50]
+fig = px.imshow(imgs, animation_frame=0, facet_col=1, facet_col_wrap=5, binary_string=True)
+fig.update_xaxes(showticklabels=False)
+fig.update_yaxes(showticklabels=False)
+fig.show()
+```
+%% Cell type:markdown id:0007-d262459c63ed3d98add57f2c48715daab6ff8a674c4e923ee87f87f61ad tags:
+Transform the data to 2D (or 3D) and plot the
+transformed data as a scatter plot, using `y_test` to
+color the points.
+## Solution
+Here is the solution code:
+%% Cell type:code id:0008-a04806d4df0812cf8fba0aa2e4e76d43ce0a2c8b2e2a9d7c032abc4e78f tags:
+``` 
+from umap import UMAP
+from sklearn.decomposition import PCA
+pca = PCA(n_components=2)
+X_pca = pca.fit_transform(X)
+umap = UMAP(n_neighbors=20, metric='manhattan', min_dist=0.1)
+X_umap = umap.fit_transform(X)
+scatter1 = px.scatter(X_pca, x=0, y=1, color=y_test)
+scatter1.show()
+scatter2 = px.scatter(X_umap, x=0, y=1, color=y_test, hover_data={'class': y_test, 'index': np.arange(len(X_umap))})
+scatter2.show()
+```
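As a rough idea of what the PCA step does, the following sketch projects data onto its first two principal components using a plain SVD. It runs on a small random matrix (`X_demo` is a hypothetical stand-in for the 10 000 x 784 MNIST matrix), and scikit-learn's `PCA` may pick a different solver or flip the sign of a component.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the MNIST matrix: 200 samples, 20 features, rank 5
X_demo = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 20))

# PCA centers the data (it does not rescale it) before the decomposition
X_centered = X_demo - X_demo.mean(axis=0)

# Rows of Vt are the principal directions, ordered by decreasing singular value
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Project onto the first two principal directions
X_2d = X_centered @ Vt[:2].T

# Share of the total variance captured by each of the two components
explained = S[:2] ** 2 / np.sum(S ** 2)

print(X_2d.shape, explained)
```

PCA is a linear projection, which is one reason the digit classes tend to overlap more in the PCA plot than in the nonlinear UMAP embedding.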
+%% Cell type:markdown id:0009-96c5b071fc624449a3bff00f5acf449ff13a3986090c619ea3c40dbebf7 tags:
+Without outliers, the base price (`Grundpreis`) lies closer to the
+engine features, i.e. the price behaves similarly to them. With
+outliers, the price is far away. One interpretation is that for very
+expensive cars the price is no longer proportional to the engine.
--- a/08-korrelation-und-dimensionsreduktion/05-reduce-mnist.ipynb
+++ b/08-korrelation-und-dimensionsreduktion/05-reduce-mnist.ipynb
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Visualizing MNIST\n",
+    "\n",
+    "In this exercise we want to plot a high-dimensional dataset in 2D\n",
+    "(or 3D). The data is already loaded for you."
+   ],
+   "id": "0001-9c88d904212173177e3cd805c405ca6bffb6c39ce47a88ff66351aa9536"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "style": "python"
+   },
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import plotly.express as px\n",
+    "from tensorflow.keras.datasets.mnist import load_data\n",
+    "\n",
+    "(x_train, y_train), (x_test, y_test) = load_data()\n",
+    "\n",
+    "# reshape test set to 10 000 x 784\n",
+    "X = x_test.reshape(-1, 28 * 28)"
+   ],
+   "id": "0002-f9e5f87e267af56fbd5014863286132d368c2876d241287b849ceff60cc"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If you like, here is a plot of different images of the same\n",
+    "class, so you can take a look at the data."
+   ],
+   "id": "0003-fef1281e3edb72c906210479c5811c47fe2289914626b2c40534881280c"
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "style": "python"
+   },
+   "outputs": [],
+   "source": [
+    "# plot 50 examples for each digit\n",
+    "imgs = np.empty((50, 10, 28, 28))\n",
+    "for j in range(10):\n",
+    "    imgs[:, j] = x_test[y_test == j][:50]\n",
+    "\n",
+    "fig = px.imshow(imgs, animation_frame=0, facet_col=1, facet_col_wrap=5, binary_string=True)\n",
+    "fig.update_xaxes(showticklabels=False)\n",
+    "fig.update_yaxes(showticklabels=False)\n",
+    "fig.show()"
+   ],
+   "id": "0004-c879d58b500c83a0364d8680cc6b05c9cc97d36d3ea3743e112c56644db"
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Transform the data to 2D (or 3D) and plot the\n",
+    "transformed data as a scatter plot, using `y_test` to\n",
+    "color the points."
+   ],
+   "id": "0005-9f3a45468236a64a112ad49260df46a53eb308c1c3e503a34d2ae0c353f"
+  }
+ ],
+ "nbformat": 4,
+ "nbformat_minor": 5,
+ "metadata": {}
+}
+%% Cell type:markdown id:0001-9c88d904212173177e3cd805c405ca6bffb6c39ce47a88ff66351aa9536 tags:
+# Visualizing MNIST
+In this exercise we want to plot a high-dimensional dataset in 2D
+(or 3D). The data is already loaded for you.
+%% Cell type:code id:0002-f9e5f87e267af56fbd5014863286132d368c2876d241287b849ceff60cc tags:
+``` 
+import numpy as np
+import plotly.express as px
+from tensorflow.keras.datasets.mnist import load_data
+(x_train, y_train), (x_test, y_test) = load_data()
+# reshape test set to 10 000 x 784
+X = x_test.reshape(-1, 28 * 28)
+```
+%% Cell type:markdown id:0003-fef1281e3edb72c906210479c5811c47fe2289914626b2c40534881280c tags:
+If you like, here is a plot of different images of the same
+class, so you can take a look at the data.
+%% Cell type:code id:0004-c879d58b500c83a0364d8680cc6b05c9cc97d36d3ea3743e112c56644db tags:
+``` 
+# plot 50 examples for each digit
+imgs = np.empty((50, 10, 28, 28))
+for j in range(10):
+    imgs[:, j] = x_test[y_test == j][:50]
+fig = px.imshow(imgs, animation_frame=0, facet_col=1, facet_col_wrap=5, binary_string=True)
+fig.update_xaxes(showticklabels=False)
+fig.update_yaxes(showticklabels=False)
+fig.show()
+```
+%% Cell type:markdown id:0005-9f3a45468236a64a112ad49260df46a53eb308c1c3e503a34d2ae0c353f tags:
+Transform the data to 2D (or 3D) and plot the
+transformed data as a scatter plot, using `y_test` to
+color the points.
--- a/08-korrelation-und-dimensionsreduktion/autos.csv
+++ b/08-korrelation-und-dimensionsreduktion/autos.csv
--- a/08-korrelation-und-dimensionsreduktion/datasaurus.csv
+++ b/08-korrelation-und-dimensionsreduktion/datasaurus.csv