Skip to content

Access PostgreSQL from Python using pscopg2

PostgreSQL is a very popular open source relational databases that has sophisticated features such as Multi-Version Concurrency Control, point in time recovery, tablespaces, etc. In this article we will explore how to connect to a PostgreSQL database from a Python notebook and perform data access operations.

We will use the psycopg2 library, which is a fairly mature Python driver for Postgres. It provides to efficiently perform the full range of SQL operations against Postgres databases. This package is pre-installed in your Workbench. Open the Jupyter notebooks on your Data Scientist Workbench and choose Python in the top right corner to start a new notebook. Create a new cell, enter the following cell and execute it (Ctrl+Enter).
import psycopg2
import sys
Next, enter your database credentials.
#Enter the values for you database connection
dsn_database = "<database name>"           # e.g. "compose"
dsn_hostname = "<your host name>" # e.g.: ""
dsn_port = "<port>"               # e.g. 11101
dsn_uid = "<your user id>"       # e.g. "admin"
dsn_pwd = "<your password>"     # e.g. "xxx"
Now establish the database connection.
    conn_string = "host="+dsn_hostname+" port="+dsn_port+" dbname="+dsn_database+" user="+dsn_uid+" password="+dsn_pwd
    print "Connecting to database\n ->%s" % (conn_string)
    print "Connected!\n"
    print "Unable to connect to the database."
conn.cursor will return a cursor object, you can use this cursor to perform queries. Given the cursor, we can execute a query, for example, to retrieve the list of databases.
cursor = conn.cursor()
cursor.execute("""SELECT datname from pg_database""")
rows = cursor.fetchall()
We can iterate through rows to print the results.
print "\nShow me the databases:\n"
for row in rows:
    print " ", row[0]
We create a test table namely Cars. Use the below code to drop the Cars table if it already exists, then create the table, and insert some data.

cursor.execute("DROP TABLE IF EXISTS Cars")
cursor.execute("CREATE TABLE Cars(Id INTEGER PRIMARY KEY, Name VARCHAR(20), Price INT)")

cursor.execute("INSERT INTO Cars VALUES(1,'Audi',52642)")
cursor.execute("INSERT INTO Cars VALUES(2,'Mercedes',57127)")
cursor.execute("INSERT INTO Cars VALUES(3,'Skoda',9000)")
cursor.execute("INSERT INTO Cars VALUES(4,'Volvo',29000)")
cursor.execute("INSERT INTO Cars VALUES(5,'Bentley',350000)")
cursor.execute("INSERT INTO Cars VALUES(6,'Citroen',21000)")
cursor.execute("INSERT INTO Cars VALUES(7,'Hummer',41400)")
cursor.execute("INSERT INTO Cars VALUES(8,'Volkswagen',21600)")

Now you can run a query to select data from the newly created data and print out the result set using pretty print:
cursor.execute("""SELECT * from Cars""")
rows = cursor.fetchall()

print "\nShow me the databases:\n"
import pprint
You should see the following results:
Show me the records:

((2L, 'Mercedes', 57127L),
 (3L, 'Skoda', 9000L),
 (4L, 'Volvo', 29000L),
 (5L, 'Bentley', 350000L),
 (6L, 'Citroen', 21000L),
 (7L, 'Hummer', 41400L),
 (8L, 'Volkswagen', 21600L))
It is a good practice to close the connection when you are done.
Good luck!

Feedback and Knowledge Base