ICS 421 - Spring 2010 - Programming Assignment 1
You may work in a team of two students, but each student needs to make a
submission. You are encouraged to engage in general discussions with other
teams regarding the assignment, but specific details of a solution, including
the solution itself, must always be the team's own work. You may submit the same
code as the rest of your team.
Part 1: Distributed DDL Processor (60 pts)
Write a program runDDL that executes a given DDL
statement on a cluster of computers each running an instance
of a DBMS. The input to runDDL consists of two filenames
(stored in variables clustercfg and ddlfile) passed in
as command-line arguments. The file clustercfg contains
access information for each computer on the cluster. The
file ddlfile contains the DDL statement to be executed,
terminated by a semicolon. The runDDL program will execute the same
DDL on the database instance of each of the computers on the
cluster concurrently using threads. One thread should
be spawned for each computer in the cluster.
runDDL should report success or failure of executing the DDL for
each node in the cluster to standard output.
You may test your program on a single computer by using
different databases to simulate the multiple computers.
You may first write a non-threaded program that executes the
DDL on each computer in the cluster in a for-loop and then
convert the program to a multi-threaded version.
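One possible sketch of the threaded version, assuming Python as the implementation language. As the handout suggests, multiple computers are simulated here on one machine by using different databases: sqlite3 stands in for DB2 so the control flow can be tried locally, and a real submission would open DB2 connections instead. The helper names (parse_clustercfg, run_ddl_on_node) are illustrative, not required by the assignment.

```python
# Threaded runDDL sketch: sqlite3 simulates per-node DB2 instances.
import sqlite3
import sys
import threading

def parse_clustercfg(path):
    """Read key=value lines from the clustercfg file into a dict."""
    cfg = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith('#'):
                key, _, value = line.partition('=')
                cfg[key.strip()] = value.strip()
    return cfg

def run_ddl_on_node(url, ddl, results, idx):
    """Execute the DDL on one node and record success/failure."""
    try:
        conn = sqlite3.connect(url)        # stand-in for a DB2 connection
        conn.execute(ddl.rstrip(';'))
        conn.commit()
        conn.close()
        results[idx] = (url, True)
    except Exception:
        results[idx] = (url, False)

def run_ddl(clustercfg, ddlfile):
    cfg = parse_clustercfg(clustercfg)
    ddl = open(ddlfile).read().strip()
    numnodes = int(cfg['numnodes'])
    results = [None] * numnodes
    threads = []
    for i in range(1, numnodes + 1):       # one thread per node
        url = cfg['node%d.hostname' % i]
        t = threading.Thread(target=run_ddl_on_node,
                             args=(url, ddl, results, i - 1))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    for url, ok in results:                # report per-node outcome
        print('[%s]: %s %s.' % (url, ddlfile, 'success' if ok else 'failed'))
    return results

if __name__ == '__main__':
    run_ddl(sys.argv[1], sys.argv[2])
```

Writing results into a pre-sized list (rather than printing inside each thread) keeps the per-node output from interleaving.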
Part 2: Catalog for Distributed Tables (40 pts)
Modify your program from Part 1 so that it stores metadata
about the DDL being executed in a catalog database. The
access information of the catalog database will be provided
in the clustercfg file as well. The metadata should
be stored in a table
dtables(tname char(32), nodedriver char(64), nodeurl char(128),
        nodeuser char(16), nodepasswd char(16), partmtd int,
        nodeid int, partcol char(32), partparam1 char(32),
        partparam2 char(32))
where
- tname is the table name involved in the DDL operation.
- nodeid is the node number associated with this node.
- nodedriver is the driver used to connect to
the node in the cluster for this entry.
- nodeurl is the JDBC URL of the node in the
cluster for this entry.
- nodeuser and nodepasswd are the userid
and password of the DBMS instance at the node in the cluster
for this entry.
- partmtd is the partition method used to
partition the data in the table.
- partcol is the column(s) used by the partition method to
partition the data in the table.
- partparam1 and partparam2 are
parameters associated with the particular partition method.
If this table does not exist in the catalog
database, your program will create the table.
The field tname should be obtained using a simple
parsing of the DDL for the keyword TABLE that precedes the
table name (we will switch to a more sophisticated SQL
parser later).
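The simple keyword-based parse described above could be sketched as follows (a Python sketch; the function name extract_tname is illustrative):

```python
# Extract the table name as the identifier following the keyword TABLE.
import re

def extract_tname(ddl):
    """Return the identifier after the keyword TABLE, or None if absent."""
    m = re.search(r'\bTABLE\s+(\w+)', ddl, re.IGNORECASE)
    return m.group(1) if m else None
```

This handles both CREATE TABLE and DROP TABLE statements, since the table name follows the keyword TABLE in either case.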
The fields partmtd, partcol, partparam1, and partparam2 should
be left null for this assignment. This table should be updated
only on successful execution of the DDLs: for CREATE TABLE DDLs,
entries should be inserted, and for DROP TABLE DDLs, the
relevant entries should be deleted. This operation need not be
multi-threaded.
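A minimal sketch of the catalog update, again with sqlite3 standing in for the DB2 catalog database. The column names follow the dtables schema above; the function name update_catalog and the cfg dict of clustercfg key/value pairs are assumptions of this sketch.

```python
# Catalog maintenance sketch: create dtables if missing, then insert
# one row per node on CREATE TABLE, or delete the rows on DROP TABLE.
import sqlite3

DTABLES_DDL = """CREATE TABLE IF NOT EXISTS dtables(
    tname char(32), nodedriver char(64), nodeurl char(128),
    nodeuser char(16), nodepasswd char(16), partmtd int,
    nodeid int, partcol char(32),
    partparam1 char(32), partparam2 char(32))"""

def update_catalog(catalog_url, cfg, tname, is_drop):
    conn = sqlite3.connect(catalog_url)    # stand-in for the catalog DB2
    conn.execute(DTABLES_DDL)              # create dtables if it is absent
    if is_drop:
        conn.execute('DELETE FROM dtables WHERE tname = ?', (tname,))
    else:
        for i in range(1, int(cfg['numnodes']) + 1):
            # partmtd, partcol, partparam1, partparam2 stay null here.
            conn.execute(
                'INSERT INTO dtables(tname, nodedriver, nodeurl, '
                'nodeuser, nodepasswd, nodeid) VALUES (?, ?, ?, ?, ?, ?)',
                (tname,
                 cfg['node%d.driver' % i],
                 cfg['node%d.hostname' % i],
                 cfg['node%d.username' % i],
                 cfg['node%d.passwd' % i],
                 i))
    conn.commit()
    conn.close()
```

Running this single-threaded after all node threads have joined matches the requirement that the catalog is touched only after the DDL succeeds.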
General Requirements
- You may program in any of the following languages: Java, C/C++, PHP, Python, Perl.
- You may use any third party libraries, but you need to provide them, so that your program is executable on the submission machine.
- Your code should minimally support the drop table and create table DDLs.
- Your code should minimally work with DB2 Express C DBMS instances.
- You should ensure that your code is well commented and readable.
- You should develop the code on your own computer. In order for you to
receive credit for the assignment, your code must be compilable and
executable on a Linux machine. (A Linux box will be provided for you to test
your code for compliance.)
- You must provide two shell scripts compile.sh and run.sh
to compile and run your code respectively. The script run.sh should
take clustercfg and ddlfile as its 1st and 2nd command-line
arguments.
Submission Procedure
Since Part 1 and Part 2 are cumulative, you should submit only one program, i.e., the final code for Part 2.
See Submission Instructions
Sample contents of a clustercfg file
catalog.driver=com.ibm.db2.jcc.DB2Driver
catalog.hostname=jdbc:db2://10.0.0.3:50001/mycatdb
catalog.username=db2inst1
catalog.passwd=mypasswd
numnodes=2
node1.driver=com.ibm.db2.jcc.DB2Driver
node1.hostname=jdbc:db2://10.0.0.3:50001/mydb1
node1.username=db2inst1
node1.passwd=mypasswd
node2.driver=com.ibm.db2.jcc.DB2Driver
node2.hostname=jdbc:db2://10.0.0.3:50001/mydb2
node2.username=db2inst1
node2.passwd=mypasswd
Sample contents of a ddlfile file
CREATE TABLE BOOKS(isbn char(14), title char(80), price decimal);
Sample Output of `./run.sh ./cluster.cfg ./books.sql`
[jdbc:db2://10.0.0.3:50001/mydb2]: ./books.sql success.
[jdbc:db2://10.0.0.3:50001/mydb1]: ./books.sql failed.
[jdbc:db2://10.0.0.3:50001/mycatdb]: catalog updated.