{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Simple Linear Regression\n", "\n", "*By James Peret – 30/07/2017*\n", "\n", "> Linear regression is a prediction method that is more than 200 years old.\n", "\n", "In statistics, [linear regression](https://en.wikipedia.org/wiki/Linear_regression) is an approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted X. The case of one explanatory variable is called simple linear regression.\n", "\n", "[Simple linear regression](https://en.wikipedia.org/wiki/Simple_linear_regression) is a linear regression model with a single explanatory variable. That is, it concerns two-dimensional sample points with one independent variable and one dependent variable (conventionally, the x and y coordinates in a Cartesian coordinate system) and finds a linear function (a non-vertical straight line) that, as accurately as possible, predicts the dependent variable values as a function of the independent variable. The adjective simple refers to the fact that the outcome variable is related to a single predictor.\n", "\n", "### Libraries\n", "\n", "We will be using [pandas](https://pandas.pydata.org) for handling data structures and [matplotlib](https://matplotlib.org/) for plotting graphs." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# imports\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "# Image and HTML let us embed images and raw HTML directly in the notebook\n", "from IPython.display import HTML, Image\n", "\n", "# this allows plots to appear directly in the notebook\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data\n", "\n", "In this notebook we will be using a dataset of housing prices from the city of São Paulo, Brazil. 
The data was scraped from the web using this [notebook](../data/housing-prices-sao-paulo.ipynb)." ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
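<table>\n",
 "<thead>\n",
 "<tr><th></th><th>meters</th><th>price</th></tr>\n",
 "</thead>\n",
 "<tbody>\n",
 "<tr><th>0</th><td>100.0</td><td>1150.0</td></tr>\n",
 "<tr><th>1</th><td>102.0</td><td>760.0</td></tr>\n",
 "<tr><th>2</th><td>63.0</td><td>519.0</td></tr>\n",
 "<tr><th>3</th><td>30.0</td><td>278.2</td></tr>\n",
 "<tr><th>4</th><td>55.0</td><td>400.0</td></tr>\n",
 "</tbody>\n",
 "</table>\n",
 "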