ICS 421 - Spring 2010 - Programming Assignment 1
You may work in a team of two students, but each student needs to make a
submission. You are encouraged to engage in general discussions with other
teams regarding the assignment, but specific details of a solution, including
the solution itself, must always be the team's own work. You may submit the same
code as the rest of your team.
Part 1: Distributed DDL Processor (60 pts)
Write a program runDDL that executes a given DDL
statement on a cluster of computers each running an instance
of a DBMS. The input to runDDL consists of two filenames
(stored in variables clustercfg and ddlfile) passed in
as command-line arguments. The file clustercfg contains
access information for each computer on the cluster. The
file ddlfile contains the DDL statement to be executed,
terminated by a semicolon. The runDDL program will execute the same
DDL on the database instance of each of the computers on the
cluster concurrently using threads. One thread should
be spawned for each computer in the cluster.
runDDL should report success or failure of executing the DDL for
each node in the cluster to standard output.
You may test your program on a single computer by using
different databases to simulate the multiple computers.
You may first write a non-threaded program that executes the
DDL on each computer in the cluster in a for-loop and then
convert the program to a multi-threaded version.
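One possible sketch of the threaded version, assuming Python as the implementation language. As the handout suggests, multiple computers are simulated here on one machine by using different databases: sqlite3 stands in for DB2 so the control flow can be tried locally, and a real submission would open DB2 connections instead. The helper names (parse_clustercfg, run_ddl_on_node) are illustrative, not required by the assignment.

```python
# Threaded runDDL sketch: sqlite3 simulates per-node DB2 instances.
import sqlite3
import sys
import threading

def parse_clustercfg(path):
    """Read key=value lines from the clustercfg file into a dict."""
    cfg = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith('#'):
                key, _, value = line.partition('=')
                cfg[key.strip()] = value.strip()
    return cfg

def run_ddl_on_node(url, ddl, results, idx):
    """Execute the DDL on one node and record success/failure."""
    try:
        conn = sqlite3.connect(url)        # stand-in for a DB2 connection
        conn.execute(ddl.rstrip(';'))
        conn.commit()
        conn.close()
        results[idx] = (url, True)
    except Exception:
        results[idx] = (url, False)

def run_ddl(clustercfg, ddlfile):
    cfg = parse_clustercfg(clustercfg)
    ddl = open(ddlfile).read().strip()
    numnodes = int(cfg['numnodes'])
    results = [None] * numnodes
    threads = []
    for i in range(1, numnodes + 1):       # one thread per node
        url = cfg['node%d.hostname' % i]
        t = threading.Thread(target=run_ddl_on_node,
                             args=(url, ddl, results, i - 1))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    for url, ok in results:                # report per-node outcome
        print('[%s]: %s %s.' % (url, ddlfile, 'success' if ok else 'failed'))
    return results

if __name__ == '__main__':
    run_ddl(sys.argv[1], sys.argv[2])
```

Writing results into a pre-sized list (rather than printing inside each thread) keeps the per-node output from interleaving.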
Part 2: Catalog for Distributed Tables (40 pts)
Modify your program from Part 1 so that it stores metadata
about the DDL being executed in a catalog database. The
access information of the catalog database will be provided
in the clustercfg file as well. The metadata should
be stored in a table
dtables(tname char(32), nodedriver char(64), nodeurl char(128),
        nodeuser char(16), nodepasswd char(16), partmtd int,
        nodeid int, partcol char(32), partparam1 char(32),
        partparam2 char(32))
where
- tname is the table name involved in the DDL operation.
- nodeid is the node number associated with this node.
- nodedriver is the driver used to connect to
the node in the cluster for this entry.
- nodeurl is the JDBC URL of the node in the
cluster for this entry.
- nodeuser and nodepasswd are the userid
and password of the DBMS instance at the node in the cluster
for this entry.
- partmtd is the partition method used to
partition the data in the table.
- partcol is the column(s) used by the partition method to
partition the data in the table.
- partparam1 and partparam2 are
parameters associated with the particular partition method.
If this table does not exist in the catalog
database, your program will create the table.
The field tname should be obtained using a simple
parsing of the DDL for the keyword TABLE that precedes the
table name (we will switch to a more sophisticated SQL
parser later).
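The simple keyword-based parse described above could be sketched as follows (a Python sketch; the function name extract_tname is illustrative):

```python
# Extract the table name as the identifier following the keyword TABLE.
import re

def extract_tname(ddl):
    """Return the identifier after the keyword TABLE, or None if absent."""
    m = re.search(r'\bTABLE\s+(\w+)', ddl, re.IGNORECASE)
    return m.group(1) if m else None
```

This handles both CREATE TABLE and DROP TABLE statements, since the table name follows the keyword TABLE in either case.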
The fields partmtd, partcol, partparam1, and partparam2 should
be left null for this assignment. This table should be updated
only on successful execution of the DDLs: for CREATE TABLE DDLs,
entries should be inserted, and for DROP TABLE DDLs, the
relevant entries should be deleted. This operation need not be
multi-threaded.
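A minimal sketch of the catalog update, again with sqlite3 standing in for the DB2 catalog database. The column names follow the dtables schema above; the function name update_catalog and the cfg dict of clustercfg key/value pairs are assumptions of this sketch.

```python
# Catalog maintenance sketch: create dtables if missing, then insert
# one row per node on CREATE TABLE, or delete the rows on DROP TABLE.
import sqlite3

DTABLES_DDL = """CREATE TABLE IF NOT EXISTS dtables(
    tname char(32), nodedriver char(64), nodeurl char(128),
    nodeuser char(16), nodepasswd char(16), partmtd int,
    nodeid int, partcol char(32),
    partparam1 char(32), partparam2 char(32))"""

def update_catalog(catalog_url, cfg, tname, is_drop):
    conn = sqlite3.connect(catalog_url)    # stand-in for the catalog DB2
    conn.execute(DTABLES_DDL)              # create dtables if it is absent
    if is_drop:
        conn.execute('DELETE FROM dtables WHERE tname = ?', (tname,))
    else:
        for i in range(1, int(cfg['numnodes']) + 1):
            # partmtd, partcol, partparam1, partparam2 stay null here.
            conn.execute(
                'INSERT INTO dtables(tname, nodedriver, nodeurl, '
                'nodeuser, nodepasswd, nodeid) VALUES (?, ?, ?, ?, ?, ?)',
                (tname,
                 cfg['node%d.driver' % i],
                 cfg['node%d.hostname' % i],
                 cfg['node%d.username' % i],
                 cfg['node%d.passwd' % i],
                 i))
    conn.commit()
    conn.close()
```

Running this single-threaded after all node threads have joined matches the requirement that the catalog is touched only after the DDL succeeds.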
General Requirements
- You may program in any of the following languages: Java, C/C++, PHP, Python, Perl.
- You may use any third party libraries, but you need to provide them, so that your program is executable on the submission machine.
- Your code should minimally support the drop table and create table DDLs.
- Your code should minimally work with DB2 Express C DBMS instances.
- You should ensure that your code is well commented and readable.
- You should develop the code on your own computer. In order for you to
receive credit for the assignment, your code must be compilable and
executable on a Linux machine. (A Linux box will be provided for you to test
your code for compliance.)
- You must provide two shell scripts compile.sh and run.sh
to compile and run your code respectively. The script run.sh should
take clustercfg and ddlfile as its 1st and 2nd command-line
arguments.
Submission Procedure
Since Part 1 and Part 2 are cumulative, you should submit only one program, i.e., the final code for Part 2.
See Submission Instructions
Sample contents of a clustercfg file
catalog.driver=com.ibm.db2.jcc.DB2Driver
catalog.hostname=jdbc:db2://10.0.0.3:50001/mycatdb
catalog.username=db2inst1
catalog.passwd=mypasswd
numnodes=2
node1.driver=com.ibm.db2.jcc.DB2Driver
node1.hostname=jdbc:db2://10.0.0.3:50001/mydb1
node1.username=db2inst1
node1.passwd=mypasswd
node2.driver=com.ibm.db2.jcc.DB2Driver
node2.hostname=jdbc:db2://10.0.0.3:50001/mydb2
node2.username=db2inst1
node2.passwd=mypasswd
Sample contents of a ddlfile file
CREATE TABLE BOOKS(isbn char(14), title char(80), price decimal);
Sample Output of `./run.sh ./cluster.cfg ./books.sql`
[jdbc:db2://10.0.0.3:50001/mydb2]: ./books.sql success.
[jdbc:db2://10.0.0.3:50001/mydb1]: ./books.sql failed.
[jdbc:db2://10.0.0.3:50001/mycatdb]: catalog updated.