Assignment 1: DDL Processing for a Parallel DBMS

Learning Objectives

Background

This first assignment guides you through the implementation of the DDL processing component of a parallel SQL processing system. The parallel SQL processing system consists of a cluster of DBMS instances running on different machines (possibly virtual machines). DDLs submitted to the system need to be translated into corresponding DDLs for each individual DBMS instance in the cluster and executed there. In addition, a catalog database stores metadata about what data is stored for each table on each DBMS instance in the cluster.

Part 1: DDL Processor

Design & write a program runDDL that executes a given DDL statement on a cluster of computers, each running an instance of a DBMS. The input to runDDL consists of two filenames (stored in the variables clustercfg and ddlfile) passed in as command-line arguments. The file clustercfg contains access information for each computer in the cluster. The file ddlfile contains the DDL statement to be executed, terminated by a semicolon. The runDDL program executes the same DDL on the database instance of each of the computers in the cluster concurrently using threads: one thread should be spawned for each computer in the cluster. The runDDL program should report the success or failure of the DDL for each node in the cluster to standard output.
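As a concrete sketch of the per-node work (assuming Python with the standard sqlite3 module standing in for a real DBMS driver, so that each "node" is just a local database file; run_ddl_on_node and its parameters are hypothetical names, not part of the assignment):

import sqlite3

def run_ddl_on_node(node_url, ddl, label):
    # Execute one DDL statement on one node and report the outcome.
    # node_url is treated as a local SQLite database path so the
    # sketch runs on a single machine; a real deployment would open
    # a connection using the driver/URL listed in clustercfg instead.
    # label is the ddlfile name echoed in the status line.
    try:
        conn = sqlite3.connect(node_url)
        conn.execute(ddl)
        conn.commit()
        conn.close()
        print("[%s]: %s success." % (node_url, label))
        return True
    except Exception as exc:
        print("[%s]: %s failed. (%s)" % (node_url, label, exc))
        return False

Each thread can simply run this function for its node; the return value lets the caller decide later whether the catalog should be updated.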

You may test your program on a single computer by using different databases to simulate the multiple computers. You should then test your program on a cluster of Docker containers simulating a cluster of computers.

You should first write a sequential, non-threaded program that executes the DDL on each computer in the cluster in a for-loop, and then optionally convert the program to a parallel, multi-threaded version.
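For example, the sequential version is a plain for-loop over the nodes, and the threaded version replaces that loop with one thread per node (a sketch reusing the hypothetical run_ddl_on_node from above):

import threading

def run_sequential(node_urls, ddl, label):
    # Simplest correct version: one node at a time.
    for url in node_urls:
        run_ddl_on_node(url, ddl, label)

def run_parallel(node_urls, ddl, label):
    # One thread per node; join so the program reports every
    # node's success or failure before exiting.
    threads = [threading.Thread(target=run_ddl_on_node,
                                args=(url, ddl, label))
               for url in node_urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()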


Part 2: Update Catalog Metadata

Modify your program from Part 1 so that it stores metadata about the DDL being executed in a catalog database that may or may not be on the local machine. The access information of the catalog database will be provided in the clustercfg file as well. The metadata should be stored in a table

dtables(tname char(32), 
   nodedriver char(64), 
   nodeurl char(128), 
   nodeuser char(16), 
   nodepasswd char(16), 
   partmtd int, 
   nodeid int, 
   partcol char(32), 
   partparam1 char(32),
   partparam2 char(32))

where tname is the name of the user table; nodedriver, nodeurl, nodeuser and nodepasswd are the access information of the node storing (part of) the table; nodeid is the number of that node in the cluster; and partmtd, partcol, partparam1 and partparam2 describe the partitioning method, partitioning column and partitioning parameters.

If this table does not exist in the catalog database, your program should create it. The field tname should be obtained by a simple parse of the DDL for the keyword TABLE that precedes the table name (we will switch to a more sophisticated SQL parser later). The fields partmtd, partcol, partparam1 and partparam2 should be left NULL for this assignment. This table should only be updated on successful execution of the DDL: for CREATE TABLE DDLs the table should be populated, and for DROP TABLE DDLs the relevant entries should be deleted. This operation need not be multi-threaded.
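A minimal sketch of the catalog update, again with sqlite3 standing in for the catalog DBMS (update_catalog and the nodes dict of per-node access information are hypothetical names; note that CREATE TABLE IF NOT EXISTS is SQLite syntax, and other DBMSs may require attempting the CREATE and ignoring an "already exists" error):

import sqlite3

DTABLES_DDL = """CREATE TABLE IF NOT EXISTS dtables(
    tname char(32), nodedriver char(64), nodeurl char(128),
    nodeuser char(16), nodepasswd char(16), partmtd int,
    nodeid int, partcol char(32), partparam1 char(32),
    partparam2 char(32))"""

def update_catalog(catalog_url, nodes, tname, is_drop):
    # nodes maps node number -> access info parsed from clustercfg,
    # e.g. {1: {'driver': ..., 'hostname': ...}, 2: {...}}.
    # The partition fields are left NULL for this assignment.
    conn = sqlite3.connect(catalog_url)
    conn.execute(DTABLES_DDL)          # create dtables on first use
    if is_drop:
        conn.execute("DELETE FROM dtables WHERE tname = ?", (tname,))
    else:
        for nodeid, node in nodes.items():
            conn.execute(
                "INSERT INTO dtables(tname, nodedriver, nodeurl, nodeid) "
                "VALUES (?, ?, ?, ?)",
                (tname, node.get('driver'), node.get('hostname'), nodeid))
    conn.commit()
    conn.close()

Call update_catalog only after the per-node executions have reported back, so that the catalog reflects DDLs that actually succeeded.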

General Requirements

Deliverables

1) A document describing how to use the runDDL.py and parDBd.py programs and what the programs do technically. The document should:

2) Code, config files & scripts. Since Parts 1 and 2 are cumulative, you should submit only one set of code (the runDDL.py client program, the parDBd.py server program, and their dependencies), i.e., the final code for Part 2.

Submission Procedure

Submit to Laulima->Assignments


Sample contents of a clustercfg file

Note that the driver field is not used in S18; it is kept here for backwards compatibility.


catalog.driver=com.ibm.db2.jcc.DB2Driver
catalog.hostname=10.0.0.3:50001/mycatdb

numnodes=2

node1.driver=com.ibm.db2.jcc.DB2Driver
node1.hostname=10.0.0.3:50001/mydb1

node2.driver=com.ibm.db2.jcc.DB2Driver
node2.hostname=10.0.0.3:50001/mydb2
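One way to read this key=value format into Python dictionaries (a sketch; parse_clustercfg is a hypothetical name, and the keys follow the sample above):

def parse_clustercfg(path):
    # Returns (catalog, nodes): catalog is a dict such as
    # {'driver': ..., 'hostname': ...}, and nodes maps the node
    # number to a dict of the same shape.
    catalog, nodes = {}, {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or '=' not in line:
                continue
            key, value = line.split('=', 1)
            if key == 'numnodes':
                continue  # derivable from the node entries themselves
            prefix, field = key.split('.', 1)
            if prefix == 'catalog':
                catalog[field] = value
            else:                      # e.g. node1.hostname
                nodes.setdefault(int(prefix[4:]), {})[field] = value
    return catalog, nodes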

Sample contents of a ddlfile file


CREATE TABLE BOOKS(isbn char(14), title char(80), price decimal);
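The simple tname parse described in Part 2 only needs to find the identifier following the keyword TABLE (a sketch that handles the sample above; a real SQL parser comes later):

import re

def extract_table_name(ddl):
    # Return the identifier after the keyword TABLE, e.g. BOOKS for
    # both "CREATE TABLE BOOKS(...)" and "DROP TABLE BOOKS;".
    match = re.search(r'\bTABLE\s+([A-Za-z_][A-Za-z0-9_]*)', ddl,
                      re.IGNORECASE)
    return match.group(1) if match else None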

Sample Output of ./run.sh ./cluster.cfg ./books.sql


[10.0.0.3:50001/mydb2]: ./books.sql success.
[10.0.0.3:50001/mydb1]: ./books.sql failed.
[10.0.0.3:50001/mycatdb]: catalog updated.