{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Intro\n", "This is going to be a fast whirlwind tour of python. With 1 hour I won't be able to teach you everything about python, so instead I'll just give you the broad strokes, some examples, and some links to more references." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# The basics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Mathematical operators" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "14.6\n" ] } ], "source": [ "a = 4\n", "b = a * 3.4\n", "b += 1 # same as b = b + 1\n", "print(b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**: exponentiation is done by `**` NOT `^`" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "8" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "2**3" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "2^3 # this is actually (2 BITWISE_XOR 3), not 2*2*2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Logic and Control Flow" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`if` blocks and other control-flow statements use indentation (whitespace) to mark the block of code they affect" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "b is greater than 3\n" ] } ], "source": [ "if b > 3:\n", "    print(\"b is greater than 3\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Functions also use indentation, and their definitions need to start with `def`:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10" ] }, "execution_count": 5, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "def multiply_by_5(x):\n", " return x*5\n", "\n", "multiply_by_5(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Importing external code\n", "This approach works for both code you write and packages you install with `conda`/`pip`.\n", "\n", "First I'll just make a sample file." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "with open(\"sample_file.py\", mode=\"w\") as f:\n", " f.write(\"\"\"\n", "def divide_by_two(x):\n", " result = x / 2\n", " return result\n", " \n", "def divide_by_three(x):\n", " result = x / 3\n", " return result\n", "\n", " \"\"\")\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.0" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import sample_file\n", "\n", "sample_file.divide_by_two(4)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4.0" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sample_file import divide_by_three # you can import \n", "\n", "divide_by_three(12)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**: it's hard to import an arbitrary script stored in a random folder on your computer. The simplest solution is to make sure it's in the same folder as where you started python from, or the folder where your jupyter notebook is located." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To import packages from the standard library (like `datetime` below) or ones you installed with pip/conda, you do the same thing, but you don't need to worry about where they're saved:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "datetime.datetime(2019, 1, 22, 13, 14, 7, 989667)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import datetime\n", "datetime.datetime.today()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Collections of objects\n", "So I've already shown that python can handle `int`s, `float`s, and `str`s (strings). The next level up is *collections* of these objects.\n", "\n", "In general, python is fine having a collection of multiple types all at once:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "my_list type:  <class 'list'>\n", "my_other_list type:  <class 'list'>\n" ] } ], "source": [ "my_list = [1, 2, 3, 4]\n", "my_other_list = [1, \"two\", 3.14]\n", "\n", "print(\"my_list type: \", type(my_list))\n", "print(\"my_other_list type: \", type(my_other_list))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can change the elements within a list, as well as the size of the list:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 2, 3, 4]\n" ] } ], "source": [ "my_changing_list = [1, 2, 3, 4]\n", "print(my_changing_list)\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 2, 3, 'hello world']\n" ] } ], "source": [ "my_changing_list[3] = \"hello world\" # note that it's 0-indexed!\n", "print(my_changing_list)\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 2, 3, 'hello 
world', 'test']\n" ] } ], "source": [ "my_changing_list.append(\"test\")\n", "print(my_changing_list)\n", "\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 2, 3, 'hello world', 'test', 'appending', 'a', 'second', 'list', 'example']\n" ] } ], "source": [ "my_changing_list += [\"appending\", \"a\", \"second\", \"list\", \"example\"]\n", "print(my_changing_list)\n", "\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, 2, 3, 'hello world', 'test', 'appending', 'a', 'second', 'list']\n", "last_element: example\n" ] } ], "source": [ "last_element = my_changing_list.pop()\n", "print(my_changing_list)\n", "print(\"last_element: \", last_element)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**: You technically can run the above cells in any order, but you'll keep getting different results, because each cell modifies the same list. This is an example of how jupyter notebooks can make it easier to screw up your code/results. In general your notebook should only run from top to bottom, and you should put anything super important into a separate `.py` file." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Creating `list`s: constructor vs literal\n", "\n", "You can also create a `list` using its \"constructor\", rather than the \"literal\" version above. \n", "\n", "The constructor (`list()`) version is most useful when you're converting something else into a list. The \"literal\" version (`[...]`) is best if you're typing the values in by hand." 
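, "\n", "\n", "For instance, here's a quick sketch of the constructor converting a few other kinds of objects into lists (the variable names are just for illustration):\n", "\n", "
```python
# the constructor accepts anything you can loop over
letters = list('abc')        # from a string: one element per character
from_range = list(range(5))  # from a range: 0 up to (not including) 5
from_tuple = list((10, 20))  # from a tuple

print(letters)
print(from_range)
print(from_tuple)
```
"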
] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[]\n" ] } ], "source": [ "list_from_constructor = list()\n", "print(list_from_constructor)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['a']\n" ] } ], "source": [ "list_from_constructor.append(\"a\")\n", "print(list_from_constructor)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `for` loops and iterating over collections\n", "You can use a `for` loop with either the list index or the list values:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "sample_list = [10, 20, 30, 40]" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10\n", "20\n", "30\n", "40\n" ] } ], "source": [ "for i in range(len(sample_list)): # iterates from 0 to 3\n", " print(sample_list[i])" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "10\n", "20\n", "30\n", "40\n" ] } ], "source": [ "for value_from_list in sample_list:\n", " print(value_from_list)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `tuple`\n", "- `tuple` is like a list, except you can't change it once you create it. 
It has a literal version `(1, 2, ...)` and a constructor `tuple()`" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "'tuple' object does not support item assignment", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mmy_tuple\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mmy_tuple\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"new value\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: 'tuple' object does not support item assignment" ] } ], "source": [ "my_tuple = (1, 2, 3)\n", "my_tuple[2] = \"new value\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tuple also has a constructor and literal version" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "tuple_1 = (1, 2, 3)\n", "tuple_2 = tuple([4, 5, 6]) # creates a list, and then converts that into a tuple" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dict\n", "Unlike lists which require integer indices (`my_list[0]`), dictionaries use user-defined indices.\n", "\n", "(This makes this a \"key -> value\" map, if that helps you understand)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5.2\n" ] } ], "source": [ "my_dict = {\"NGC_1234\": 5.2,\n", " \"NGC_5678\": 3.4,\n", " \"SN_1987a\": \"the 
value matched to SN_1987a\",\n", " }\n", "\n", "print(my_dict[\"NGC_1234\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Also `dict` has a literal version:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "redshift_dict = {\"NGC_1234\": 5.2}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and a constructor version" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5.2\n" ] } ], "source": [ "redshift_dict = dict()\n", "redshift_dict[\"NGC_1234\"] = 5.2\n", "\n", "print(redshift_dict[\"NGC_1234\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**: `dict`s can use a `tuple` as an index/label, but not `list`, so make sure to convert them. Example (bad, then good):" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "ename": "TypeError", "evalue": "unhashable type: 'list'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mtemporary_list\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m\"3\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"quick\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"things\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mmy_dict\u001b[0m\u001b[0;34m[\u001b[0m \u001b[0mtemporary_list\u001b[0m \u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m1.0\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: unhashable type: 'list'" ] } ], "source": [ "temporary_list = [\"3\", \"quick\", \"things\"]\n", "my_dict[ temporary_list ] = 1.0" ] }, { 
"cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.0\n" ] } ], "source": [ "my_dict[ tuple(temporary_list) ] = 1.0\n", "\n", "print(my_dict[tuple(temporary_list)])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using tuples as the key is useful for when you want the key to be a combination of multiple variables (e.g. RA and DEC pairs, not just RA or DEC individually.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`dict`s can have any data type as their \"value\", just like `list`s and `tuples`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Misc. collections\n", "I won't talk about things like `set`s, `Counters`, etc. Just know there's an entire `collections` library.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# `numpy` and `array`s\n", "`lists` and the built-in functions are great, but we often want to do *numerical* science. For this, basically everything will built upon the package `numpy` (numerical python)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "import numpy as np # standard abbreviation" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "7.38905609893065" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.exp(2) # e^2" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3.141592653589793" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.pi" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`numpy` also provides `array`s, which let you create list-like objects, which behave like _vectors_:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3])" ] }, "execution_count": 31, 
"metadata": {}, "output_type": "execute_result" } ], "source": [ "example_list = [1, 2, 3]\n", "example_array = np.array(example_list)\n", "example_array" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([10, 20, 30])" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "10 * example_array" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0. , 0.30103 , 0.47712125])" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.log10(example_array)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "32" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.dot(np.array([1,2,3]), np.array([4,5,6]))" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([ 1, -2, -5])" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "my_matrix = np.array([[1,2,3],\n", " [4,5,6],\n", " [7,8,9]])\n", "\n", "np.dot(my_matrix, np.array([-2, 0, 1]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In general, `array`s will be the standard way we group together large numbers of values (even if we don't need it to behave like a matrix). Even more complicated data types (like `fits` images) will still be based on `numpy.array`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Plotting with `matplotlib`\n", "\n", "The core plotting package for python is `matplotlib`, and more specifically `matplotlib.pyplot`\n", "\n", "Here's their [example gallery](https://matplotlib.org/gallery/index.html) which is a great place for finding how to do more complex types of plots." 
] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "from matplotlib import pyplot as plt" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "xs = np.random.random(100) # draw 100 random values in (0, 1), store as an array\n", "ys = np.random.random(xs.size)\n", "\n", "plt.scatter(xs, ys)\n", "plt.xlabel(\"Add an optional axis label\")\n", "\n", "plt.ylim(bottom=-.2) # you can change the limits of the axes\n", "\n", "plt.xscale(\"log\") # and switch between linear/log spacing" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW4AAAEACAYAAACTXJylAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvOIA7rQAADoBJREFUeJzt3X9s3PV9x/HX247BpMvWKLmiqc7tgogCzW84oUyJECMbS+sqRfuBQHalBrJjf6zK0KTJIDHEH/P8B5pYpCmTt2YULUArBtJGSgcRi0KkEGpDiqBORYoc4rEuxGOsYWQk5L0/4lieufN9zvl+7/w2z4cUJXf3ue+9z46e+urr731t7i4AQBxtrR4AANAYwg0AwRBuAAiGcANAMIQbAIIh3AAQDOEGgGAINwAEQ7gBIBjCDQDBLMhjo0uXLvVSqZTHpgFgXhoeHj7t7oWUtbmEu1QqaWhoKI9NA8C8ZGYnUtdyqAQAgiHcABAM4QaAYHI5xl3NuXPnNDY2prNnzzbrJUPr7OxUV1eXOjo6Wj0KgDmmbrjNbKWk70256xpJf+bujzbyQmNjY1q0aJFKpZLMrMExP1/cXePj4xobG9Py5ctbPQ6AOaZuuN39p5LWS5KZtUv6N0nPNvpCZ8+eJdqJzExLlizR+++/3+pRAMxBjR7j3iLpZ+6efNrKVEQ7HV8rALU0Gu47JT2ZxyCRlUolnT59esY1/f39TZoGwHyX/MNJM7tC0jZJ99d4vCKpIknFYrHu9kp9+1JfOsnoQHem28taf3+/HnjggVaPAaABtTrV6t40ssf9VUmvuft/VHvQ3Qfdvezu5UIh6VObTTU6OqrrrrtOO3bs0OrVq9XT06P9+/dr06ZNWrFihV599VVduHBBK1asmDy2fOHCBV177bWf2ZseHx/Xbbfdpg0bNujee++Vu08+dvvtt+vGG2/UqlWrNDg4KEnq6+vTxx9/rPXr16unp6fmOgBI0Ui471LwwyTHjx/Xzp079cYbb+jYsWN64okndOjQIT3yyCPq7+9XW1ubent7tXfvXknS/v37tW7dOi1duvT/befhhx/W5s2b9frrr2vbtm169913Jx/bs2ePhoeHNTQ0pF27dml8fFwDAwO66qqrdPTo0cltV1sHACmSwm1mCyX9lqRn8h0nX8uXL9eaNWvU1tamVatWacuWLTIzrVmzRqOjo5Kku+++W48//riki3Hdvn37Z7Zz8OBB9fb2SpK6u7u1ePHiycd27dqldevWaePGjTp58qTefvvtqrOkrgOA6ZKOcbv7/0hakvMsubvyyisn/93W1jZ5u62tTefP
n5ckLVu2TFdffbVeeuklHTlyZHIPebpqZ30cOHBA+/fv1+HDh7Vw4ULdcsstVT9wlLoOAKrhI+9V7NixQ729vbrjjjvU3t7+mcdvvvnmyaA///zz+uCDDyRJH374oRYvXqyFCxfq2LFjeuWVVyaf09HRoXPnztVdBwD1EO4qtm3bpjNnzlQ9TCJJDz30kA4ePKgbbrhBL7zwwuRZNFu3btX58+e1du1aPfjgg9q4cePkcyqVitauXauenp4Z1wFAPTb1jIislMtln3497pGREV1//fWZv1YehoaGdN999+nll19u6RyRvmbAfNTM0wHNbNjdyylrm3aRqSgGBga0e/fumse2AaDVOFQyTV9fn06cOKHNmze3ehQAqIpwA0AwTQ13HsfT5yu+VgBqaVq4Ozs7NT4+TpASXLoed2dnZ6tHATAHNe2Hk11dXRobG+Ma04ku/QYcAJiuaeHu6Ojgt7kAQAb44SQABEO4ASAYwg0AwRBuAAiGcANAMIQbAIIh3AAQDOEGgGAINwAEQ7gBIBjCDQDBJIXbzL5oZk+b2TEzGzGzX897MABAdakXmforST90998zsyskLcxxJgDADOqG28x+WdLNkr4lSe7+iaRP8h0LAFBLyh73NZLel/T3ZrZO0rCkne7+0dRFZlaRVJGkYrGY9ZwAAmnGb0dv5m9gn2tSjnEvkHSDpN3uvkHSR5L6pi9y90F3L7t7uVAoZDwmAOCSlHCPSRpz9yMTt5/WxZADAFqgbrjd/eeSTprZyom7tkj6Sa5TAQBqSj2r5NuS9k6cUfKOpO35jQQAmElSuN39qKRyzrMAABLwyUkACIZwA0AwhBsAgiHcABAM4QaAYAg3AARDuAEgGMINAMEQbgAIhnADQDCEGwCCIdwAEAzhBoBgCDcABEO4ASAYwg0AwRBuAAiGcANAMIQbAIIh3AAQTNIvCzazUUm/kPSppPPuzi8OBoAWSQr3hN9w99O5TQIASMKhEgAIJjXcLukFMxs2s0qeAwEAZpZ6qGSTu79nZl+S9KKZHXP3g1MXTAS9IknFYjHjMYHslfr2Vb1/dKC7yZM01+f1fc8nSXvc7v7exN+nJD0r6aYqawbdvezu5UKhkO2UAIBJdcNtZl8ws0WX/i3pNklv5j0YAKC6lEMlV0t61swurX/C3X+Y61QAgJrqhtvd35G0rgmzAAAScDogAARDuAEgGMINAMEQbgAIhnADQDCEGwCCIdwAEAzhBoBgCDcABEO4ASAYwg0AwRBuAAiGcANAMIQbAIIh3AAQDOEGgGAINwAEQ7gBIBjCDQDBEG4ACIZwA0AwyeE2s3Yze93MnstzIADAzBrZ494paSSvQQAAaZLCbWZdkrol/V2+4wAA6knd435U0p9KupDjLACABAvqLTCzr0s65e7DZnbLDOsqkiqSVCwWMxsQ+Dwp9e2r+djoQHcTJ8lHrfeX5XubL68xk5Q97k2StpnZqKSnJN1qZv8wfZG7D7p72d3LhUIh4zEBAJfUDbe73+/uXe5eknSnpJfcvTf3yQAAVXEeNwAEU/cY91TufkDSgVwmAQAkYY8bAIIh3AAQDOEGgGAINwAEQ7gBIBjCDQDBEG4ACIZwA0AwhBsAgiHcABAM4QaAYAg3AARDuAEgGMINAMEQbgAIhnADQDCEGwCCIdwAEAzhBoBgCDcABFM33GbWaWavmtmPzewtM3u4GYMBAKpL+S3v/yvpVnc/Y2Ydkg6Z2fPu/krOswEAqqgbbnd3SWcmbnZM/PE8hwIA1JZ0jNvM2s3sqKRTkl509yP5jgUAqCXlUInc/VNJ683si5KeNbPV7v7m1DVmVpFUkaRisZj5oADyVerbV/Ox0YHuefva9V5/LmrorBJ3/y9JByRtrfLYoLuX3b1cKBQyGg8AMF3KWSWFiT1tmdlVkn5T0rG8BwMAVJdyqORXJX3XzNp1MfTfd/fn8h0LAFBLylklb0ja0IRZAAAJ+OQkAARDuAEgGMINAMEQ
bgAIhnADQDCEGwCCIdwAEAzhBoBgCDcABEO4ASAYwg0AwRBuAAiGcANAMIQbAIIh3AAQDOEGgGAINwAEQ7gBIBjCDQDBEG4ACIZwA0AwdcNtZsvM7F/NbMTM3jKznc0YDABQ3YKENecl/Ym7v2ZmiyQNm9mL7v6TnGcDAFRRd4/b3f/d3V+b+PcvJI1I+nLegwEAqmvoGLeZlSRtkHQkj2EAAPWlHCqRJJnZL0n6R0l/7O7/XeXxiqSKJBWLxcwGRONKffuq3j860J3J+mhqvb8st9WMr1Wj7yPLmebD/5Es/x+0WtIet5l16GK097r7M9XWuPugu5fdvVwoFLKcEQAwRcpZJSbpO5JG3P0v8x8JADCTlD3uTZK+KelWMzs68edrOc8FAKih7jFudz8kyZowCwAgAZ+cBIBgCDcABEO4ASAYwg0AwRBuAAiGcANAMIQbAIIh3AAQDOEGgGAINwAEQ7gBIBjCDQDBEG4ACIZwA0AwhBsAgiHcABAM4QaAYAg3AARDuAEgGMINAMEQbgAIpm64zWyPmZ0yszebMRAAYGYpe9yPSdqa8xwAgER1w+3uByX9ZxNmAQAkMHevv8isJOk5d189w5qKpIokFYvFG0+cOJHRiBeV+vZVvX90oLuh9bN5Tq31rTTT+4tkNt+/+eDz+r7nu8tphZkNu3s5ZW1mP5x090F3L7t7uVAoZLVZAMA0nFUCAMEQbgAIJuV0wCclHZa00szGzOye/McCANSyoN4Cd7+rGYMAANJwqAQAgiHcABAM4QaAYAg3AARDuAEgGMINAMEQbgAIhnADQDCEGwCCIdwAEAzhBoBgCDcABEO4ASAYwg0AwRBuAAiGcANAMIQbAIIh3AAQDOEGgGAINwAEkxRuM9tqZj81s+Nm1pf3UACA2uqG28zaJf21pK9K+oqku8zsK3kPBgCoLmWP+yZJx939HXf/RNJTkr6R71gAgFpSwv1lSSen3B6buA8A0ALm7jMvMPt9Sb/t7jsmbn9T0k3u/u1p6yqSKhM3V0r6uaQPZzHTUkmnZ/E8NO5XNLvv0Vw3V99Xq+bK+3Wz3n5W27uc7cz2uZfTr19z90LKwgUJa8YkLZtyu0vSe9MXufugpMFLt81s0N0r09fVY2ZD7l5u9Hlo3Gy/R3PdXH1frZor79fNevtZbe9ytjPX+5VyqORHklaY2XIzu0LSnZL+KeF5/3xZk6EZ5uv3aK6+r1bNlffrZr39rLZ3OduZq/+HJCUcKpEkM/uapEcltUva4+5/nttA7HEDCKpZ/Uo5VCJ3/4GkH+Q8yyWD9ZcAwJzUlH4l7XEDAOYOPvIOAMEQbgAIhnADQDChwm1m15jZd8zs6VbPAgD1mNkXzOy7Zva3ZtaT1XabFm4z22Nmp8zszWn3J195cOJ6KffkOykA1NZgy35H0tPu/geStmU1QzP3uB+TtHXqHbWuPGhma8zsuWl/vtTEWQGglseU2DJd/KT5pWs9fZrVAEnncWfB3Q+aWWna3ZNXHpQkM3tK0jfc/S8kfb1ZswFAqkZapouXDOmSdFQZ7ii3+hh3Q1ceNLMlZvY3kjaY2f15DwcAiWq17BlJv2tmu5Xhx+ibtsddg1W5r+Yngtx9XNIf5jcOAMxK1Za5+0eStmf9Yq3e40668iAAzHFNbVmrwz3bKw8CwFzS1JY183TAJyUdlrTSzMbM7B53Py/pjyT9i6QRSd9397eaNRMANGoutIyLTAFAMK0+VAIAaBDhBoBgCDcABEO4ASAYwg0AwRBuAAiGcANAMIQbAIIh3AAQzP8Bh3DwuUGgchMAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# same data as above, just taking the histogram of the x axis\n", "plt.hist(xs, label=\"my data\", bins = np.logspace(-1, 0))\n", "\n", "plt.xscale(\"log\")\n", "\n", "plt.legend() # this'll automatically create a legend using the `label`s used when creating the plot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "multiple plots can be combined:\n" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# hist2d creates a \"heat map\"\n", "plt.hist2d(np.random.normal(size=1000), # x variable\n", " np.random.normal(size=1000), # y variable\n", " cmap = \"Greys\", # sets the colormap\n", " )\n", "\n", "plt.colorbar(label=\"number of points per bin\") # so you can interpret the colors of your colormap\n", "\n", "x_line = np.linspace(0, 1, num=50) # 50 equally spaced points between 0 and 1\n", "y_line = 3*np.sin(x_line*4) - 2\n", "plt.plot(x_line, y_line,\n", " color=\"red\",\n", " label=\"example line plot\",\n", " )\n", "\n", "# this only works if you already have latex installed\n", "plt.ylabel(r\"can use latex math: $\\mathcal{O}(n^{3/2}) \\binom{n}{2}$\")\n", "\n", "plt.legend(loc=\"upper right\") # `loc` sets the location of the legend to the given area\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# `astropy`\n", "`astropy` is the biggest astronomy package in python. It tends to be pretty general purpose, since it's aimed at all astronomers. If you need something more specific (like a package to reduce data from a specific telescope), that will usually be in a separate package. (Here are some [affiliated packages](http://www.astropy.org/affiliated/index.html) which is a good place to start looking if `astropy` doesn't have what you need.)\n", "\n", "It has more than I can talk about here, but I'll go over some popular parts, and you can look at the [tutorials](http://www.astropy.org/astropy-tutorials/) as you need." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `astropy.units`" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [ "from astropy import units as u\n", "from astropy import constants as const" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/latex": [ "$1.9884754 \\times 10^{30} \\; \\mathrm{kg}$" ], "text/plain": [ "< name='Solar mass' value=1.9884754153381438e+30 uncertainty=9.236140093538353e+25 unit='kg' reference='IAU 2015 Resolution B 3 + CODATA 2014'>" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Physical constants:\n", "const.M_sun" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/latex": [ "$1.9884754 \\times 10^{33} \\; \\mathrm{g}$" ], "text/plain": [ "" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Unit conversion\n", "const.M_sun.to(u.g)" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.9884754153381438e+33" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# you can still get a plain float if you need\n", "const.M_sun.to(u.g).value" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `astropy` and `fits` files\n", "Astropy is a great way to read and manipulate `.fits` files (see `astropy.io.fits`). 
They even have a whole tutorial on this for images, so we're just going to pop over there for a moment:\n", "\n", "http://www.astropy.org/astropy-tutorials/rst-tutorials/FITS-images.html" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# `scipy` for more complex numerical tools than `numpy`" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "import scipy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**: `scipy` doesn't import all of itself automatically; loading everything up front would be slow. Instead you'll have to explicitly import submodules, like we do below. There are two main options for the syntax:\n", "\n", "1) `import scipy.optimize`, which you then use by calling, e.g., `scipy.optimize.root()`\n", "\n", "2) `from scipy import optimize`, which you can call by simply using `optimize.root()` (you don't need to say `scipy.` first)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example: function \"root\" solver" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "import scipy.optimize" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "def f(x):\n", " return x + np.exp(x) - 4" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "xs = np.linspace(0, 2)\n", "plt.plot(xs, f(xs))\n", "plt.axhline(0, linestyle=\"dashed\", color=\"black\")" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " fjac: array([[-1.]])\n", " fun: array([0.])\n", " message: 'The solution converged.'\n", " nfev: 10\n", " qtf: array([-2.66453526e-13])\n", " r: array([-3.92627094])\n", " status: 1\n", " success: True\n", " x: array([1.07372894])" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "scipy.optimize.root(f, 0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Example: integrator" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "import scipy.integrate" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [], "source": [ "def deriv(t, y):\n", " # this is dy/dt evaluated at t where y=y(t)\n", " return y" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [], "source": [ "ts = np.linspace(0, 5, num=100)\n", "results = scipy.integrate.solve_ivp(deriv,\n", " [min(ts), max(ts)],\n", " [1], #start at t0=min(ts), and y(t0)=1\n", " t_eval = ts,\n", " )" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(results.t, \n", " results.y.flatten(),\n", " label=\"approx\",\n", " )\n", "\n", "plt.plot(results.t,\n", " np.exp(results.t),\n", " label=\"exact\",\n", " linestyle=\"dashed\",\n", " )\n", "\n", "\n", "plt.legend(loc=\"best\")\n", "# plt.xlim(-5, 5)\n", "# plt.ylim(-10, 10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Scipy also contains a bunch of useful things like:\n", "- special functions (`airy`, Bessel)\n", "- Fourier transforms\n", "- statistics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# `pandas` and `DataFrames`\n", "\n", "`pandas` is great for reading in, and exploring a number of different types of datasets. It's still based on `numpy.array`s, but adds a lot more usability\n", "\n", "They also have a nice set of getting started guides:\n", " - [10 Minutes to Pandas](https://pandas.pydata.org/pandas-docs/stable/10min.html)\n", " - [Comparison with R / R libraries](https://pandas.pydata.org/pandas-docs/stable/comparison_with_r.html)\n", " - [Comparison to SQL](https://pandas.pydata.org/pandas-docs/stable/comparison_with_sql.html)" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " method number orbital_period mass distance year\n", "0 Radial Velocity 1 269.300 7.10 77.40 2006\n", "1 Radial Velocity 1 874.774 2.21 56.95 2008\n", "2 Radial Velocity 1 763.000 2.60 19.84 2011\n", "3 Radial Velocity 1 326.030 19.40 110.62 2007\n", "4 Radial Velocity 1 516.220 10.50 119.47 2009\n", "5 Radial Velocity 1 185.840 4.80 76.39 2008\n", "6 Radial Velocity 1 1773.400 4.64 18.15 2002\n", "7 Radial Velocity 1 798.500 NaN 21.41 1996\n", "8 Radial Velocity 1 993.300 10.30 73.10 2008\n", "9 Radial Velocity 2 452.800 1.99 74.79 2010" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_planets = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/planets.csv')\n", "print(type(df_planets))\n", "\n", "df_planets.head(10) # shows the top 10 rows of the dataframe" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " number orbital_period mass distance year\n", "count 1035.000000 992.000000 513.000000 808.000000 1035.000000\n", "mean 1.785507 2002.917596 2.638161 264.069282 2009.070531\n", "std 1.240976 26014.728304 3.818617 733.116493 3.972567\n", "min 1.000000 0.090706 0.003600 1.350000 1989.000000\n", "25% 1.000000 5.442540 0.229000 32.560000 2007.000000\n", "50% 1.000000 39.979500 1.260000 55.250000 2010.000000\n", "75% 2.000000 526.005000 3.040000 178.500000 2012.000000\n", "max 7.000000 730000.000000 25.000000 8500.000000 2014.000000" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_planets.describe()" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " number orbital_period mass \\\n", "method \n", "Astrometry 1.000000 631.180000 NaN \n", "Eclipse Timing Variations 1.666667 4751.644444 5.125000 \n", "Imaging 1.315789 118247.737500 NaN \n", "Microlensing 1.173913 3153.571429 NaN \n", "Orbital Brightness Modulation 1.666667 0.709307 NaN \n", "Pulsar Timing 2.200000 7343.021201 NaN \n", "Pulsation Timing Variations 1.000000 1170.000000 NaN \n", "Radial Velocity 1.721519 823.354680 2.630699 \n", "Transit 1.954660 21.102073 1.470000 \n", "Transit Timing Variations 2.250000 79.783500 NaN \n", "\n", " distance year \n", "method \n", "Astrometry 17.875000 2011.500000 \n", "Eclipse Timing Variations 315.360000 2010.000000 \n", "Imaging 67.715937 2009.131579 \n", "Microlensing 4144.000000 2009.782609 \n", "Orbital Brightness Modulation 1180.000000 2011.666667 \n", "Pulsar Timing 1200.000000 1998.400000 \n", "Pulsation Timing Variations NaN 2007.000000 \n", "Radial Velocity 51.600208 2007.518987 \n", "Transit 599.298080 2011.236776 \n", "Transit Timing Variations 1104.333333 2012.500000 " ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_mean_values = df_planets.groupby(\"method\") \\\n", " .mean()\n", "df_mean_values" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# pandas also provides some simple wrappers to speed up some plotting:\n", "df_planets.plot(x=\"orbital_period\", y=\"mass\", \n", " kind=\"scatter\", \n", " loglog=True)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**More functionality**\n", "\n", "I won't show all the examples here, but other easy things I commonly use are:\n", "* `df_1.join(df_2, how=\"outer\", ...)` \n", "* `df_planets.drop_duplicates()`\n", "* `pd.read_sql_table`\n", "* `pd.concat([df_1, df_2])` - concatenate rows\n", "* `df.apply` (applies a function to each row or column)\n", "* `df.pivot(...)` / `df.melt(...\n", ")`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# `scikit-learn` for basic modelling\n", "\n", "Great. So we have a data in an easy-to-use structure (using `pandas`) and we use `matplotlib` to help with plotting during exploratory data analsysi.\n", "\n", "Now we want to start building basic models on the data. A good starting place is `scikit-learn` (or `sklearn`). It includes some really easy interfaces to some pretty powerful models." ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [], "source": [ "import sklearn\n", "import sklearn.model_selection\n", "\n", "# filter to only allow planets with mass and orbital period\n", "df_planets_filtered = df_planets[[\"mass\", \"orbital_period\"]]\n", "df_planets_filtered = df_planets_filtered.dropna(axis=0) # this is a built-in pandas.DataFrame function\n", "\n", "# split 75/25 training/testing\n", "df_planets_training, df_planets_testing = sklearn.model_selection.train_test_split(df_planets_filtered)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Linear Model: mass = A * orbital_period + c\n", "\n", "All scikit learn models follow a simple structure (below) which makes it easy to swap/test out new models:\n", "```\n", "model = sklearn. ... 
.ModelName(...)\n", "model.fit(x, y)\n", "y_new = model.predict(x_new)\n", "```" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [], "source": [ "import sklearn.linear_model\n", "\n", "# the old `normalize=False` argument was the default and has been removed from recent scikit-learn releases\n", "model = sklearn.linear_model.LinearRegression(fit_intercept=True)\n", "\n", "# scikit-learn expects 2D arrays of shape (n_samples, n_features)\n", "x_train = df_planets_training[\"orbital_period\"].values.reshape(-1, 1)\n", "y_train = df_planets_training[\"mass\"].values.reshape(-1, 1)\n", "\n", "model.fit(x_train, y_train)\n", "\n", "x_test = df_planets_testing[\"orbital_period\"].values.reshape(-1, 1)\n", "y_test = df_planets_testing[\"mass\"].values.reshape(-1, 1)\n", "\n", "y_predict = model.predict(x_test)" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "xs = np.linspace(0, 2*10**4).reshape(-1, 1)\n", "ys = model.predict(xs)\n", "\n", "plt.plot(xs, ys, label=\"best fit\")\n", "plt.scatter(x_train, y_train, label=\"training\")\n", "# plt.scatter(x_test, y_test, label=\"testing\")\n", "\n", "plt.xlabel(\"orbital period\")\n", "plt.ylabel(\"mass\")\n", "\n", "plt.legend()\n" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.scatter(y_test, y_predict)\n", "plt.xlabel(\"Actual\")\n", "plt.ylabel(\"Predicted\")\n", "\n", "plt.plot([0, 20], [0, 20],\n", " linestyle=\"dashed\", color=\"black\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Okay, so yes, it's a terrible model. This dataset doesn't include any great linear models. But you get the point that actually training and using a linear model is super easy with `scikit-learn`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.1" }, "widgets": { "state": {}, "version": "1.1.2" } }, "nbformat": 4, "nbformat_minor": 2 }