{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Load and save data\n",
"\n",
"Exploring multiple ways of loading and saving data with Python."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Open a file and load data\n",
"\n",
"To load data from a file, use [pandas](https://pandas.pydata.org):"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>https://www.adafruit.com/product/1782</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>https://www.adafruit.com/product/1766</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0\n",
"0 https://www.adafruit.com/product/1782\n",
"1 https://www.adafruit.com/product/1766"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"# The file holds one URL per line and has no header row\n",
"data = pd.read_csv('example-data.txt', sep=\" \", header=None)\n",
"data.head()  # Displays the first rows of the DataFrame in Jupyter Notebooks"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'https://www.adafruit.com/product/1782'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Label columns from the dataset\n",
"data.columns = [\"url\"]\n",
"data.url[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"More information on working with text data in pandas can be found in the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/text.html)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Save data to a TXT file\n",
"\n",
"To save data to a TXT file, or to any other file type, use the code snippet below. Remember that each file type needs its own structure. For example, a CSV file uses commas to separate the values.\n",
"\n",
"When a *pandas* object is saved with `to_csv` like this, each row is written as one line. Columns are separated by a comma by default; pass `sep=' '` to separate them with a space instead. Only the single `url` column is saved here, so no separator appears in the file."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
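"filename = 'example-data.txt'\n",
"# header=False keeps the column name out of the file, so it holds one URL per line\n",
"data.url.to_csv(filename, index=False, header=False, encoding='utf-8')"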
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To check that the file was saved, read it back:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>https://www.adafruit.com/product/1782</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>https://www.adafruit.com/product/1766</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 0\n",
"0 https://www.adafruit.com/product/1782\n",
"1 https://www.adafruit.com/product/1766"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"saved_data = pd.read_csv(filename, sep=\" \", header=None)\n",
"saved_data.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Save data to a JSON file\n",
"\n",
"To save data to a JSON file, use the [pandas `to_json` method](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_json.html)."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true,
"scrolled": true
},
"outputs": [],
"source": [
"import json\n",
"\n",
"# Write one JSON object per line, keeping the column name as the key\n",
"out = data.to_json(orient='records', lines=True)\n",
"with open('example-data.json', 'w') as f:\n",
" f.write(out)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's open this file and check that the data is valid:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>url</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>https://www.adafruit.com/product/1782</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>https://www.adafruit.com/product/1766</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" url\n",
"0 https://www.adafruit.com/product/1782\n",
"1 https://www.adafruit.com/product/1766"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
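"# Each line of the file is one JSON value; parse the lines with the json module imported above\n",
"with open('example-data.json') as f:\n",
" new_file = pd.DataFrame([json.loads(line) for line in f])\n",
"new_file.head()"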
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}