ICS 421 - Spring 2010 - Programming Assignment 3
Updated Mar 22 to fix localnode.driver.
Updated Mar 12 to include policy on grammar sharing.
Updated Mar 12 to include temporary table information.
You may work in a team of two students, but each student needs to make a
submission. You are encouraged to engage in general discussions with other
teams regarding the assignment, but specific details of a solution, including
the solution itself, must always be the team's own work. You may submit the same
code as the rest of your team.
Part 5: Distributed SQL Processor Supporting Joins (100 pts)
Modify your code from Part 3 (runSQL) to support select-from-where
queries involving joins between exactly two tables. The two tables to be joined
may be distributed tables with fragments residing on multiple nodes.
The input to runSQL consists of two filenames
(stored in variables clustercfg and sqlfile) passed in
as commandline arguments. The file clustercfg contains
access information for the catalog DB. The file sqlfile contains the
SQL terminated by a semi-colon to be executed.
The join query needs to be executed on multiple threads; However, you should
develop the sequential non-multi-threaded version first. Guidance will be
provided on how to convert the join QEP into a multi-threaded one.
runSQL should output the rows retrieved to the
standard output on success or report failure.
You may assumed that the SQL queries only operate on
at most two tables and do not contain any nested subqueries.
You should consider using the ANTLR compiler compiler to generate a SQL
parser that you can use to extract the table names and to detect the join
conditions. Only equi-joins need to be supported.
Policy on grammar sharing
You may discuss, share, and/or develop the grammar with other groups.
You may test your program on a single computer by using
different databases to simulate the multiple computers.
Sample contents of a clustercfg file
catalog.driver=com.ibm.db2.jcc.DB2Driver
catalog.hostname=jdbc:db2://10.0.0.3:50001/mycatdb
catalog.username=db2inst1
catalog.passwd=mypasswd
localnode.driver=com.ibm.db2.jcc.DB2Driver
localnode.hostname=jdbc:db2://10.0.0.3:50001/mydb1
localnode.username=db2inst1
localnode.passwd=mypasswd
localnode identifies the node in the cluster that received the query and that
will co-ordinate the execution of the query.
Sample contents of a sqlfile file
SELECT S.sname
FROM Sailors S, Reserves R
WHERE S.sid=R.sid and R.day='2009-12-21'
;
Sample Output of `./run3.sh ./cluster.cfg ./sailors.sql`
Ramakrishnan
Silberstein
[jdbc:db2://10.0.0.3:50001/mydb2]: ./sailors.sql success.
[jdbc:db2://10.0.0.3:50001/mydb1]: ./sailors.sql failed.
It is recommended that you develop your program in the following steps
- Focus on the distributed join algorithm first. Assume no multi-threading,
no projection or additional selection operators in the QEP.
- Once the
distributed join algorithm is implemented and tested, work on incorporating
projection.
- Next incorporate additional filtering predicates (ie selection operators) from the where-clause
- Keep in mind to pushdown predicates/selections and projections as much as possible.
- Convert the join operation to a multi-threaded version by assigning each join between two fragments to a thread.
You may need to use temporary tables for your joins. On DB2 you will need to use the following statement: DECLARE GLOBAL TEMPORARY TABLE.
General Requirements
- You may program in any of the following languages: Java, C/C++, PHP, Python, Perl.
- You are encouraged to use ANTLR to generate the code to parse SQL statements as well as to parse the CSV file. It will save you a lot of time.
- You may use any third party libraries, but you need to provide them, so that your program is executable on the submission machine.
- Your code should minimally support select-from-where statements
- Your code should minimally work with DB2 Express C DBMS instances.
- You should ensure that your code is well commented and readable.
- You should develop the code on your own computer. In order for you to
receive credit for the assignment, your code must to be compilable and
executable on a linux machine. (A linux box will be provided for you to test
your code for compliance.)
- You must provide two shell scripts compile.sh and run.sh
to compile and run your code respectively. The script run.sh should
take clustercfg and sqlfile as its 1st and 2nd commandline
arguments
Submission Procedure
Submit:
- One set of source files for Part 5
- compile5.sh and run5.sh for Part 5
- submission of grammar file is optional but recommended, you may just submit the generated files
antlrworks.jar is installed in /home/db2inst1/lib/antlrworks.jar on the submission machine
Follow the Submission Procedure