Assignment 1: DDL Processing for a Parallel DBMS

Learning Objectives

Background

This first assignment guides you through the implementation of the DDL processing component of a parallel SQL processing system. The parallel SQL processing system consists of a cluster of DBMS instances running on different machines (possibly virtual machines). DDLs submitted to the system will need to be translated into corresponding DDLs for each individual DBMS instance in the cluster and executed there. In addition a catalog database stores metadata about what data is stored for each table on each DBMS instance in the cluster.

Part 1: DDL Processor

Design & write a program runDDL that executes a given DDL statement on a cluster of computers each running an instance of a DBMS. The input to runDDL consists of two filenames (stored in variables clustercfg and ddlfile) passed in as commandline arguments. The file clustercfg contains access information for each computer on the cluster. The file ddlfile contains the DDL terminated by a semi-colon to be executed. The runDDL program will execute the same DDL on the database instance of each of the computers on the cluster concurrently using threads. One thread should be spawned for each computer in the cluster. The runDDL program should report success or failure of executing the DDL for each node in the cluster to standard output.

You may test your program on a single computer by using different databases to simulate the multiple computers.

You may first write a sequential, non-threaded program that executes the DDL on each computer in the cluster in a for-loop and then convert the program to a parallel, multi-threaded version.


Part 2: Update Catalog Metadata

Modify your program from Part 1 so that it stores metadata about the DDL being executed in a catalog database. The access information of the catalog database will be provided in the clustercfg file as well. The metadata should be stored in a table

dtables(tname char(32), 
   nodedriver char(64), 
   nodeurl char(128), 
   nodeuser char(16), 
   nodepasswd char(16), 
   partmtd int, 
   nodeid int, 
   partcol char(32), 
   partparam1 char(32),
   partparam2 char(32))

where

If this table does not exist in the catalog database, your program will create the table. The field tname should be obtained using a simple parsing of the DDL for the keyword TABLE that precedes the table name (we will switch to a more sophisticated SQL parser later). The fields partmtd, partcol, partparam1, partparam2 should be left as null for this assignment. This table should only be updated on successful execution of the DDLs. For create table DDL, this table should be populated and for drop table DDLs, the relevant entries in this table should be deleted. This operation need not be multi-threaded.

General Requirements

Deliverables

1) A document describing how to use the runDDL program and what the program does technically. The document should:

2) Code, config files & scripts. Since Part 1 and Part 2 are cumulative, you should submit only one program, i.e., the final code for Part 2. Your code will be tested using Apache Derby databases (for Java code).

Submission Procedure

1) Writing: submit a link to the google doc in Laulima->Assignments. Please ensure that you share your google doc with the instructor. Every individual should make a submission.

2) Code: You will submit your code by uploading your code onto the submission server (you can also clone it from github). You need to ensure that your code compiles and runs on the submission server. If you need any additional libraries, please contact the TA to install the libraries. Every group needs to make one submission.

Access to the submission server.


Sample contents of a clustercfg file


catalog.driver=com.ibm.db2.jcc.DB2Driver
catalog.hostname=jdbc:db2://10.0.0.3:50001/mycatdb
catalog.username=db2inst1
catalog.passwd=mypasswd

numnodes=2

node1.driver=com.ibm.db2.jcc.DB2Driver
node1.hostname=jdbc:db2://10.0.0.3:50001/mydb1
node1.username=db2inst1
node1.passwd=mypasswd

node2.driver=com.ibm.db2.jcc.DB2Driver
node2.hostname=jdbc:db2://10.0.0.3:50001/mydb2
node2.username=db2inst1
node2.passwd=mypasswd

Sample contents of a ddlfile file


CREATE TABLE BOOKS(isbn char(14), title char(80), price
decimal);

Sample Output of ./run.sh ./cluster.cfg ./books.sql


[jdbc:db2://10.0.0.3:50001/mydb2]: ./books.sql success.
[jdbc:db2://10.0.0.3:50001/mydb1]: ./books.sql failed.
[jdbc:db2://10.0.0.3:50001/mycatdb]: catalog updated.