This first assignment guides you through the implementation of the DDL processing component of a parallel SQL processing system. The parallel SQL processing system consists of a cluster of DBMS instances running on different machines (possibly virtual machines). DDLs submitted to the system need to be translated into corresponding DDLs for each individual DBMS instance in the cluster and executed there. In addition, a catalog database stores metadata about what data is stored for each table on each DBMS instance in the cluster.
Design and write a program runDDL that executes a given DDL statement on a cluster of computers, each running an instance of a DBMS. The input to runDDL consists of two filenames (stored in variables clustercfg and ddlfile) passed in as command-line arguments. The file clustercfg contains access information for each computer in the cluster. The file ddlfile contains the DDL, terminated by a semicolon, to be executed. The runDDL program will execute the same DDL on the database instance of each of the computers in the cluster concurrently using threads; one thread should be spawned for each computer in the cluster. The runDDL program should report the success or failure of executing the DDL for each node in the cluster to standard output.
You may test your program on a single computer by using different databases to simulate the multiple computers.
You may first write a sequential, non-threaded program that executes the DDL on each computer in the cluster in a for-loop and then convert the program to a parallel, multi-threaded version.
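The thread-per-node pattern described above can be sketched as follows. NodeInfo and the pluggable execDDL function are hypothetical stand-ins, not part of the assignment's required API; a real executor would open a JDBC connection with DriverManager.getConnection and run Statement.execute(ddl) against the node.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Function;

// Sketch of the thread-per-node pattern: submit one task per node to a
// thread pool, then collect and print a per-node success/failure report.
public class RunDDLSketch {
    public record NodeInfo(String url, String user, String passwd) {}

    public static Map<String, Boolean> runOnAll(List<NodeInfo> nodes, String ddl,
                                                Function<NodeInfo, Boolean> execDDL) {
        ExecutorService pool = Executors.newFixedThreadPool(nodes.size());
        Map<String, Future<Boolean>> futures = new LinkedHashMap<>();
        for (NodeInfo n : nodes) {
            // One thread per node, all executing the same DDL concurrently
            futures.put(n.url(), pool.submit(() -> execDDL.apply(n)));
        }
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (Map.Entry<String, Future<Boolean>> e : futures.entrySet()) {
            boolean ok;
            try {
                ok = e.getValue().get();          // wait for this node's thread
            } catch (InterruptedException | ExecutionException ex) {
                ok = false;                       // treat any error as failure
            }
            results.put(e.getKey(), ok);
            System.out.println("[" + e.getKey() + "]: " + (ok ? "success." : "failed."));
        }
        pool.shutdown();
        return results;
    }
}
```

Collecting the Future of each task makes the sequential-to-parallel conversion mentioned above mechanical: the for-loop body becomes the task, and the report loop replaces the inline printing.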
Modify your program from Part 1 so that it stores metadata about the DDL being executed in a catalog database. The access information of the catalog database will be provided in the clustercfg file as well. The metadata should be stored in a table
dtables(tname char(32),
nodedriver char(64),
nodeurl char(128),
nodeuser char(16),
nodepasswd char(16),
partmtd int,
nodeid int,
partcol char(32),
partparam1 char(32),
partparam2 char(32))
where:

- tname is the table name involved in the DDL operation.
- nodeid is the node number associated with this node.
- nodedriver is the driver used to connect to the node in the cluster for this entry.
- nodeurl is the JDBC URL of the node in the cluster for this entry.
- nodeuser and nodepasswd are the userid and password of the DBMS instance at the node in the cluster for this entry.
- partmtd is the partition method used to partition the data in the table.
- partcol is the column(s) used by the partition method to partition the data in the table.
- partparam1 and partparam2 are parameters associated with the particular partition method.

If this table does not exist in the catalog database, your program will create it. The field tname should be obtained by a simple parse of the DDL for the keyword TABLE that precedes the table name (we will switch to a more sophisticated SQL parser later). The fields partmtd, partcol, partparam1, and partparam2 should be left null for this assignment. This table should only be updated on successful execution of the DDLs: for CREATE TABLE DDLs this table should be populated, and for DROP TABLE DDLs the relevant entries in this table should be deleted. This operation need not be multi-threaded.
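A minimal version of this bookkeeping can be sketched as two pure helpers: one that extracts the table name by scanning the DDL for the keyword TABLE, and one that builds the dtables statements. The class and method names here are hypothetical; the real program would execute the returned SQL through JDBC against the catalog database.

```java
// Hypothetical Part 2 helpers: extract the table name with a simple scan for
// the keyword TABLE, and build the catalog INSERT/DELETE statements.
public class CatalogHelper {
    // Returns the identifier following the (case-insensitive) keyword TABLE,
    // with any trailing '(' or ';' stripped; null if no TABLE keyword found.
    public static String tableName(String ddl) {
        String[] tokens = ddl.trim().split("\\s+");
        for (int i = 0; i < tokens.length - 1; i++) {
            if (tokens[i].equalsIgnoreCase("TABLE")) {
                return tokens[i + 1].split("[(;]")[0];
            }
        }
        return null;
    }

    // One dtables row per node; partmtd, partcol, partparam1, and partparam2
    // are left NULL for this assignment. Column order follows the dtables schema.
    public static String insertEntry(String tname, int nodeid, String driver,
                                     String url, String user, String passwd) {
        return String.format(
            "INSERT INTO dtables VALUES ('%s', '%s', '%s', '%s', '%s', NULL, %d, NULL, NULL, NULL)",
            tname, driver, url, user, passwd, nodeid);
    }

    // On a successful DROP TABLE, remove the table's rows from the catalog.
    public static String deleteEntries(String tname) {
        return "DELETE FROM dtables WHERE tname = '" + tname + "'";
    }
}
```

Keeping the helpers free of JDBC calls makes them easy to unit-test before wiring them to the catalog connection.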
Provide scripts compile.sh and run.sh to compile and run your code, respectively. The script run.sh should take clustercfg and ddlfile as its 1st and 2nd command-line arguments.

1) A document describing how to use the runDDL program and what the program does technically. The document should:
2) Code, config files & scripts. Since Part 1 and Part 2 are cumulative, you should submit only one program, i.e., the final code for Part 2. Your code will be tested using Apache Derby databases (for Java code).
1) Writing: submit a link to the google doc in Laulima->Assignments. Please ensure that you share your google doc with the instructor. Every individual should make a submission.
2) Code: You will submit your code by uploading your code onto the submission server (you can also clone it from github). You need to ensure that your code compiles and runs on the submission server. If you need any additional libraries, please contact the TA to install the libraries. Every group needs to make one submission.
Access to the submission server: generate an SSH key pair; the public key will be of the form

ssh-rsa [public key] [your username]

When you have generated the public key, please e-mail it to the TA.

Sample clustercfg file:
catalog.driver=com.ibm.db2.jcc.DB2Driver
catalog.hostname=jdbc:db2://10.0.0.3:50001/mycatdb
catalog.username=db2inst1
catalog.passwd=mypasswd
numnodes=2
node1.driver=com.ibm.db2.jcc.DB2Driver
node1.hostname=jdbc:db2://10.0.0.3:50001/mydb1
node1.username=db2inst1
node1.passwd=mypasswd
node2.driver=com.ibm.db2.jcc.DB2Driver
node2.hostname=jdbc:db2://10.0.0.3:50001/mydb2
node2.username=db2inst1
node2.passwd=mypasswd
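Since the clustercfg format above is plain key=value lines, it can be read with java.util.Properties; the sketch below makes that assumption and takes its key names from the sample (a real program would read from the file passed on the command line rather than a string).

```java
import java.io.StringReader;
import java.util.Properties;

// Reads a clustercfg-style key=value text with java.util.Properties.
public class ClusterCfg {
    public static Properties load(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));   // '=' separates key and value
        } catch (java.io.IOException e) {
            throw new RuntimeException(e);    // cannot happen for a StringReader
        }
        return p;
    }

    public static void main(String[] args) {
        String cfg = "numnodes=2\n"
                   + "node1.hostname=jdbc:db2://10.0.0.3:50001/mydb1\n"
                   + "node2.hostname=jdbc:db2://10.0.0.3:50001/mydb2\n";
        Properties p = load(cfg);
        // numnodes tells us how many nodeN.* entries to look up
        int n = Integer.parseInt(p.getProperty("numnodes").trim());
        for (int i = 1; i <= n; i++) {
            System.out.println(p.getProperty("node" + i + ".hostname"));
        }
    }
}
```

Note that Properties treats the first '=' as the separator, so the ':' characters inside the JDBC URLs are preserved in the value.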
Sample ddlfile file:

CREATE TABLE BOOKS(isbn char(14), title char(80), price decimal);
Sample run:

./run.sh ./cluster.cfg ./books.sql
[jdbc:db2://10.0.0.3:50001/mydb2]: ./books.sql success.
[jdbc:db2://10.0.0.3:50001/mydb1]: ./books.sql failed.
[jdbc:db2://10.0.0.3:50001/mycatdb]: catalog updated.