This first assignment guides you through the implementation of the DDL processing component of a parallel SQL processing system. The parallel SQL processing system consists of a cluster of DBMS instances running on different machines (possibly virtual machines). DDLs submitted to the system will need to be translated into corresponding DDLs for each individual DBMS instance in the cluster and executed there. In addition a catalog database stores metadata about what data is stored for each table on each DBMS instance in the cluster.
Design and write a program runDDL that executes a given DDL statement on a cluster of computers, each running an instance of a DBMS. The input to runDDL consists of two filenames (stored in variables clustercfg and ddlfile) passed in as command-line arguments. The file clustercfg contains access information for each computer in the cluster. The file ddlfile contains the DDL, terminated by a semicolon, to be executed. The runDDL program will execute the same DDL on the database instance of each of the computers in the cluster concurrently using threads: one thread should be spawned for each computer in the cluster. The runDDL program should report the success or failure of executing the DDL on each node in the cluster to standard output.
You may test your program on a single computer by using different databases to simulate multiple computers. You should then test your program using a cluster of Docker containers simulating a cluster of computers.
You should first write a sequential, non-threaded program that executes the DDL on each computer in the cluster in a for-loop and optionally convert the program to a parallel, multi-threaded version.
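For concreteness, a minimal sketch of the threaded version might look like the following. It assumes the key=value clustercfg format shown in the example at the end of this handout, and uses sqlite3 as a stand-in DBMS so the sketch can run on one machine; the connection call inside run_ddl_on_node is what you would replace with your actual DBMS driver.

#!/usr/bin/env python3
# runDDL.py -- sketch of the threaded driver, not a complete solution.
import sys
import threading
import sqlite3  # stand-in DBMS so the sketch runs on a single machine

def read_clustercfg(path):
    # Parse key=value lines into a dict, e.g. cfg['node1.hostname'].
    cfg = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and '=' in line:
                key, val = line.split('=', 1)
                cfg[key.strip()] = val.strip()
    return cfg

def run_ddl_on_node(url, ddl, results, idx):
    # Execute the DDL on one node and record success or failure.
    try:
        # Stand-in: treat the URL as a local sqlite3 file path. For the
        # real cluster, open a connection to the DBMS at `url` instead.
        conn = sqlite3.connect(url)
        conn.execute(ddl)
        conn.commit()
        conn.close()
        results[idx] = True
    except Exception:
        results[idx] = False

def main():
    clustercfg, ddlfile = sys.argv[1], sys.argv[2]
    cfg = read_clustercfg(clustercfg)
    with open(ddlfile) as f:
        ddl = f.read().strip()
    n = int(cfg['numnodes'])
    urls = [cfg['node%d.hostname' % i] for i in range(1, n + 1)]
    results = [None] * n
    threads = [threading.Thread(target=run_ddl_on_node,
                                args=(url, ddl, results, i))
               for i, url in enumerate(urls)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for url, ok in zip(urls, results):
        print('[%s]: %s %s.' % (url, ddlfile, 'success' if ok else 'failed'))

if __name__ == '__main__':
    main()

Invoked as in the sample run at the end of this handout, e.g. python3 runDDL.py ./cluster.cfg ./books.sql.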
Modify your program from Part 1 so that it stores metadata
about the DDL being executed in a catalog database that may
or may not be on the local machine. The access information
of the catalog database will be provided in the clustercfg
file as well. The metadata should be stored in a table
dtables(tname char(32),
nodedriver char(64),
nodeurl char(128),
nodeuser char(16),
nodepasswd char(16),
partmtd int,
nodeid int,
partcol char(32),
partparam1 char(32),
partparam2 char(32))
where:
- tname is the name of the table involved in the DDL operation.
- nodeid is the node number associated with this node.
- nodedriver is the driver used to connect to the node in the cluster for this entry (NOT USED IN S18).
- nodeurl is the JDBC URL of the node in the cluster for this entry (SEE CLUSTERCFG EXAMPLE).
- nodeuser and nodepasswd are the userid and password of the DBMS instance at the node in the cluster for this entry (NOT USED IN S18).
- partmtd is the partition method used to partition the data in the table.
- partcol is the column(s) used by the partition method to partition the data in the table.
- partparam1 and partparam2 are parameters associated with the particular partition method.

If this table does not exist in the catalog database, your program will create it. The field tname should be obtained by a simple parse of the DDL for the keyword TABLE that precedes the table name (we will switch to a more sophisticated SQL parser later). The fields partmtd, partcol, partparam1, and partparam2 should be left NULL for this assignment. The table should be updated only on successful execution of a DDL: for a CREATE TABLE DDL it should be populated, and for a DROP TABLE DDL the relevant entries should be deleted. This operation need not be multi-threaded.
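A minimal sketch of the simple TABLE-keyword parse and the catalog update might look like the following (sqlite3 again stands in for the catalog DBMS; the dtables creation statement is taken from the schema above, and the helper names are illustrative, not required):

# Sketch of the catalog update (sqlite3 stands in for the catalog DBMS).
import re
import sqlite3

DTABLES_DDL = ('CREATE TABLE dtables(tname char(32), nodedriver char(64), '
               'nodeurl char(128), nodeuser char(16), nodepasswd char(16), '
               'partmtd int, nodeid int, partcol char(32), '
               'partparam1 char(32), partparam2 char(32))')

def parse_tname(ddl):
    # Simple parse: take the identifier right after the keyword TABLE.
    m = re.search(r'\bTABLE\s+(\w+)', ddl, re.IGNORECASE)
    return m.group(1) if m else None

def update_catalog(caturl, ddl, node_urls, ok_flags):
    # Record (or remove) dtables rows for nodes where the DDL succeeded.
    tname = parse_tname(ddl)
    cat = sqlite3.connect(caturl)   # stand-in for catalog.hostname
    try:
        cat.execute(DTABLES_DDL)    # create dtables if it does not exist
    except sqlite3.OperationalError:
        pass                        # table already exists
    is_drop = ddl.lstrip().upper().startswith('DROP')
    for nodeid, (url, ok) in enumerate(zip(node_urls, ok_flags), start=1):
        if not ok:
            continue                # only successful executions are recorded
        if is_drop:
            cat.execute('DELETE FROM dtables WHERE tname=? AND nodeurl=?',
                        (tname, url))
        else:
            # partmtd/partcol/partparam1/partparam2 stay NULL for now
            cat.execute('INSERT INTO dtables(tname, nodeurl, nodeid) '
                        'VALUES (?, ?, ?)', (tname, url, nodeid))
    cat.commit()
    cat.close()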
Convert your program to a client-server architecture: runDDL.py is the client, and you will write a server daemon parDBd.py for it. The server daemon parDBd.py should take two command-line arguments: an IP address or hostname, and a port number. See the socket programming exercise.
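A skeleton of such a daemon is sketched below. The wire protocol is an assumption (this handout does not fix one): here the client sends one DDL string per connection and the server replies 'success' or 'failed'; the local database name node.db is a hypothetical placeholder for the node's DBMS.

#!/usr/bin/env python3
# parDBd.py -- sketch of the server daemon loop; protocol is an assumption.
import socket
import sys
import sqlite3  # stand-in for the node's local DBMS

def serve(host, port):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(5)
    while True:
        conn, addr = srv.accept()
        with conn:
            ddl = conn.recv(4096).decode()       # one DDL per connection
            try:
                db = sqlite3.connect('node.db')  # hypothetical local database
                db.execute(ddl)
                db.commit()
                db.close()
                conn.sendall(b'success')
            except Exception:
                conn.sendall(b'failed')

if __name__ == '__main__':
    serve(sys.argv[1], int(sys.argv[2]))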
Deliverables:

1) A document describing how to use the runDDL.py and parDBd.py programs and what the programs do technically. The document should:
2) Code, config files & scripts. Since Part 1 and Part 2 are
cumulative, you should submit only one set of code (the
files: the runDDL.py
client program, the parDBd.py
server program and their dependencies), i.e., the
final code for Part 2.
Submit to Laulima->Assignments
clustercfg file:

Note that the driver field is not used in S18, but is kept here for backwards compatibility.
catalog.driver=com.ibm.db2.jcc.DB2Driver
catalog.hostname=10.0.0.3:50001/mycatdb
numnodes=2
node1.driver=com.ibm.db2.jcc.DB2Driver
node1.hostname=10.0.0.3:50001/mydb1
node2.driver=com.ibm.db2.jcc.DB2Driver
node2.hostname=10.0.0.3:50001/mydb2
ddlfile file:
CREATE TABLE BOOKS(isbn char(14), title char(80), price
decimal);
./run.sh ./cluster.cfg ./books.sql
[10.0.0.3:50001/mydb2]: ./books.sql success.
[10.0.0.3:50001/mydb1]: ./books.sql failed.
[10.0.0.3:50001/mycatdb]: catalog updated.
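For this sample run, only the node where the DDL succeeded is recorded in the catalog: with the simple TABLE-keyword parse described above, dtables would gain one row with tname BOOKS, nodeurl 10.0.0.3:50001/mydb2, nodeid 2, and NULL partition fields.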