
load-save-data.ipynb 6.5KB

    { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Load and save data\n", "\n", "Exploring multiple ways of loading and saving data with Python." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Open a file and load data\n", "\n", "To load data from a file, use [pandas](https://pandas.pydata.org):" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>0</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>https://www.adafruit.com/product/1782</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>https://www.adafruit.com/product/1766</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " 0\n", "0 https://www.adafruit.com/product/1782\n", "1 https://www.adafruit.com/product/1766" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "data = pd.read_csv('example-data.txt', sep=\" \", header = None)\n", "data.head() # Will show the DataFrame in Jupyter Notebooks" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'https://www.adafruit.com/product/1782'" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Label columns from the dataset\n", "data.columns = [\"url\"]\n", "data.url[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "More information on working with text data in pandas can be found [here](https://pandas.pydata.org/pandas-docs/stable/text.html)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Save data to a TXT file\n", "\n", "To save data to a TXT file or another file type, use the code snippet below. Each file type expects its own structure. 
For example, a CSV file needs commas to separate the data.\n", "\n", "When saving a *pandas* column directly to a file like this, each row is written as one line. With more than one column, `to_csv` separates the values with commas unless a different `sep` is given." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "filename = 'example-data.txt'\n", "data.url.to_csv(filename, index=False, encoding='utf-8')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To test if the file was saved:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>0</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>https://www.adafruit.com/product/1782</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>https://www.adafruit.com/product/1766</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " 0\n", "0 https://www.adafruit.com/product/1782\n", "1 https://www.adafruit.com/product/1766" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "saved_data = pd.read_csv(filename, sep=\" \", header = None)\n", "saved_data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Save data to a JSON file\n", "\n", "To save data to a JSON file, use the [pandas to_json function](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_json.html)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true, "scrolled": true }, "outputs": [], "source": [ "import json\n", "out = data.url.to_json(orient='records', lines=True)\n", "with open('example-data.json', 'w') as f:\n", " f.write(out)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's open this file and check if the data is valid:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, 
"outputs": [ { "data": { "text/html": [ "<div>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>url</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>https://www.adafruit.com/product/1782</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>https://www.adafruit.com/product/1766</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " url\n", "0 https://www.adafruit.com/product/1782\n", "1 https://www.adafruit.com/product/1766" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new_file = pd.read_csv('example-data.json', sep=\" \", header = None)\n", "data.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.0" } }, "nbformat": 4, "nbformat_minor": 2 }
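The final code cell of the notebook reads `example-data.json` with `pd.read_csv` and then displays `data` rather than `new_file`, so its output does not really show what was stored in the file. A minimal sketch of a JSON round trip that does check the file might look like the following. It assumes a DataFrame with a single `url` column holding the two example URLs from the notebook output, and it writes to a temporary directory (a hypothetical location chosen only to keep the example self-contained).

```python
import os
import tempfile

import pandas as pd

# The same two example URLs that appear in the notebook output.
data = pd.DataFrame({"url": [
    "https://www.adafruit.com/product/1782",
    "https://www.adafruit.com/product/1766",
]})

# Hypothetical output path inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example-data.json")

# Double brackets keep a DataFrame, so every line is an object such as {"url": "..."}.
data[["url"]].to_json(path, orient="records", lines=True)

# Read the file back as JSON Lines and compare it with the original column.
new_file = pd.read_json(path, lines=True)
round_trip_ok = new_file["url"].tolist() == data["url"].tolist()

print(new_file.head())
print(round_trip_ok)
```

Writing the column as a one-column DataFrame instead of a Series is a design choice for this sketch: it stores the column name in every record, so the `url` label survives the round trip.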