Ipython-Sql

TL;DR

If your work is related to SQL & databases, you may be frustrated with troubleshooting your SQL code or SQL practice since:

  • Terminal does not have a beautiful UI & is hard to manipulate your code
  • IDE such as MySQLWorkbench does not support multiple outputs

Let’s see a demo output of a tool, ipython-sql, I would like to introduce today.

demo output

You may recognize that demo output looks like a screenshot of Jupyter Notebook. You’re right! We all like Jupyter since the notebook supports multiple outputs, which makes your debugging work much easier!
BTW one of the amazing features of Databricks is similar to the above screenshot in that you can run SQL code directly in Databricks notebook to query databases and data warehouse.

Python users can also use SQLAlchemy. However, ipython-sql can save your time in inputting print command or other commands


Installation

This post assumes you have the following programs:

  • Python 3.9 with pip
  • Any SQL server (local or remote)
  • optional: pipenv installed by pip
    Go to your projects dir. Created a dir named SQL, and enable pipenv:
1
2
3
4
5
6
7
8
# Go to your dir projects
$ mkdir SQL
# Go to dir SQL
$ cd SQL
# Optional: If dir SQL is under another dir with Pipfile, run
$ touch Pipfile
# Optional: If you would like to save your virtualenv in dir SQL, run
$ mkdir .venv

Spawns a pipenv shell with Python3.9

1
pipenv shell - python=3.9


Install required packages:

1
(SQL)$ pipenv install jupyter jupyterlab ipython-sql mysqlclient pandas matplotlib

mysqlclient is the connection drive for ipython-sql to connect MySQL databases. If you have different databases, you need to download the required drivers. ipython-sql uses SQLAlchemy connection string.

Add the current pipenv shell to Jupyter kernel.

For more info, see Jupyter Notebook Kernels: How to Add, Change, Remove.

1
(SQL)$ ipython kernel install --name "SQL" --user

- - -

Demo

Demo Database

Run the following SQL code on your terminals to create a database leetcode

1
create database if not exists leetcode;

Run Jupyter

Run Jupyter lab or notebook:

1
jupyter lab

Add a new .ipynb, and you should be able to select kernel SQL:


Load packages

Load sql extension and setup MySQL connection string.
For more info about SQL connection string, see SQLAlchemy - Database URLs.

1
2
3
4
# load extension sql
%load_ext sql
# setup SQL connection string
%sql mysql+mysqldb://<user>:<password>@localhost/<dataBase>

Demo Table

Run the following SQL code on your Jupyter Notebook to create a demo table Activity:

1
2
3
4
5
6
7
8
9
%%sql
drop table if exists Activity;
Create table If Not Exists Activity (player_id int, device_id int, event_date date, games_played int);
Truncate table Activity;
insert into Activity (player_id, device_id, event_date, games_played) values ('1', '2', '2016-03-01', '5');
insert into Activity (player_id, device_id, event_date, games_played) values ('1', '2', '2016-03-02', '6');
insert into Activity (player_id, device_id, event_date, games_played) values ('2', '3', '2017-06-25', '1');
insert into Activity (player_id, device_id, event_date, games_played) values ('3', '1', '2016-03-02', '0');
insert into Activity (player_id, device_id, event_date, games_played) values ('3', '4', '2018-07-03', '5');

Demo Outputs

SELECT in two ways

1
2
3
%%sql
# Opt 1
select * from Activity;

Or

1
2
# Opt 2
%sql select * from Activity

select in two ways


Variable substitution

1
2
3
# Variable substitution
player_id = "1"
%sql select * from Activity where player_id = :player_id


Assign output to a variable

1
2
3
# Assign output to a variable
result = %sql select * from Activity
result

Convert a variable from sql.run.ResultSet to pandas.core.frame.DataFrame

1
2
result = result.DataFrame()
result


Simple visualizations

1
2
result = %sql select player_id, sum(games_played) as games_played from Activity group by player_id
result.pie();

For complicated visualizations, convert the output from sql.run.ResultSet to pandas.core.frame.DataFrame then render your charts by your favorite visualization tools such as plotly.


References

ipython-sql
SQLAlchemy
Connection string