Initial Population of the Target Database

Once you have tested the Loader and verified its connection to the target, you have the problem of initially loading data into the target.  This article discusses some ways of doing that and how to make the process as efficient as possible.

All Targets

In order to reduce the work in the target machine it is often useful to begin with a backup of your Rdb source database.  The reason for this is that you have control over whatever is happening to that backup.  After the target is initially loaded, you can start your Loader session against the production database.  You will want to ensure that whatever changes have occurred in the production copy are still available either in the active AIJs or the backed-up AIJs.

For efficiency's sake you want to do at least four things.

  1. Send existing rows one time and one time only.  Two methods are described below.
  2. Keep the work in the target machine to a minimum.  Accordingly, the comments here assume that the target tables have no indexes, constraints, or triggers on them.  Of course, after you have completed populating the target you will add back the constraints and indexes.
  3. Make the Loader's task as easy as possible by directing it not to attempt updates to existing rows in the target.  You do this by telling the Loader in the MapTable file to do only inserts, designating the action for each row as "audit" or as "insert, noupdate, nodelete", which is equivalent to "audit".
  4. Reduce potential conflict in the target between Loader processes.  To do this you should use a Loader family with a single Loader "thread".  (Do not use the keyword "parallel" in the control file.) 

    If you wish to load tables in parallel, you can use multiple LogMiner Loader families, with each big table assigned to its own family.  This also allows the Loader to avoid protecting itself against buried updates, as it must do with multiple Loader threads, and eliminates the corresponding demand on the OpenVMS Lock Manager.

There are two methods for touching the rows exactly once.  The first is manual and requires you to type SQL.  The other is done very efficiently by a tool in the kit called the DataPump.

If you are directing the show yourself, you can simply execute an SQL update statement of the following form:

  set transaction read write reserving <your table name> for exclusive write;
  update <your table name> set <your column name> = <your column name>;
  commit;

When you commit your transaction, the commit record will appear in the AIJ and be detected by the LogMiner; the resulting rows will be sent to the Loader, which will do the inserts in the target.

This approach may yield some very large transactions and therefore place significant overhead on the LogMiner and the Loader.  The overhead in the LogMiner comes from the fact that it must order the rows to ensure that only the latest instance of each is sent to the Loader.  The Loader will swallow all the rows in a transaction into virtual memory, and if the transaction is large, the data will spill over into the system's paging file.
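
One way to limit the size of these transactions is to perform the no-change update in batches, committing after each batch.  The following sketch is illustrative only; the table name, column names, and key ranges are assumptions you would replace with values appropriate to your own schema.

  set transaction read write reserving EMPLOYEES for exclusive write;
  update EMPLOYEES set LAST_NAME = LAST_NAME
    where EMPLOYEE_ID between 1 and 10000;
  commit;

  set transaction read write reserving EMPLOYEES for exclusive write;
  update EMPLOYEES set LAST_NAME = LAST_NAME
    where EMPLOYEE_ID between 10001 and 20000;
  commit;

  -- ...and so on for the remaining key ranges

Each commit produces a separate, smaller transaction in the AIJ, so neither the LogMiner nor the Loader has to hold the entire table's worth of rows at once.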

The second way is to use the DataPump, a tool that is part of the Loader kit.  The original design was oriented toward production systems, where it was used to "pump" rows that had become damaged in the target.  It can, however, be used to populate the target database, and customers report that it may be the fastest method.

The DataPump works by reading two files, a structure file and a driver file. There are samples of these files in the Loader kit.

The structure file contains the general specifications of which rows to pump.  It is called a structure file because you can define hierarchies such as parent-child-grandchild row sets.  The structure file also contains information about the sizes of the transactions that you wish to pump.  In this way, the potentially large overhead of the first method above disappears.  Basically the structure file contains the elements of dynamic SQL with placeholders for actual data values.

The driver file contains the specific data values to be used.

The DataPump was designed to be used in a production environment and works with a very low contention model.  It begins by generating a list of database keys for the rows to be pumped.  This is done in a read-only transaction, so no row or index locks are held.  It then proceeds to "touch" rows with a no-change update in small transactions.  The rows are accessed by database key, so no index locks are taken.  As each commit occurs, the commit record appears in the AIJ, and the LogMiner picks the rows up and sends them to the Loader, which sends them to the target.

In order to work as efficiently as possible, the DataPump does all of its work within Rdb in processor executive mode.  This means that it is very difficult to stop the DataPump once it is turned loose.  Make sure you have plenty of journal space for the workload you are doing.

If you are using a version of Rdb prior to 7.0.7.1, you must limit the number of tables in your structure file to fewer than ten because of a bug related to the use of local temporary tables.

Rdb Targets

When the target is an Rdb database you have a number of options.

If the target is being used to scale reporting out of an existing application, the target and source may look identical.  In that case the target can best be created using an RMU backup and RMU restore operation.

Other methods are possible.  For instance, you may use RMU/Unload and RMU/Load to move a table.  This approach has the disadvantage that the unload file will touch the disk at least once, and perhaps more than once if the target lives on a different machine.  That represents overhead.

Another model is to use SQL to attach to both the source and the target.  One can then use the construct "insert into <target table> select ... from <source table>".  This has the advantage that the data never touches disk except in the target database.
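
A minimal sketch of this approach in interactive SQL follows; the aliases, file names, and table name are hypothetical, and it assumes both databases are reachable from the same process.

  attach 'alias src filename source_database';
  attach 'alias tgt filename target_database';

  -- copy every row of the source table into the empty target table
  insert into tgt.EMPLOYEES
    select * from src.EMPLOYEES;
  commit;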

Oracle Targets

The Loader kit contains procedures that will allow you to unload an Rdb database into a format that can be read by SQL*Loader.  You can use these to create unload files similar to those produced by RMU/Unload as described above.  The disadvantage of this is the overhead of placing data on disk, potentially multiple times.

If your Rdb source database is configured with SQL*Net for Rdb, you can create a database link in the target database that points at the source database.  Data may then be copied using the "insert into <target table> select ... from <source table>" model noted above.
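
A hedged sketch of what this might look like on the Oracle side follows; the link name, credentials, net service name, and table names are placeholders for illustration only.

  -- create a link from the Oracle target back to the Rdb source
  create database link rdb_source
    connect to loader_user identified by loader_password
    using 'rdb_service';

  -- copy the rows without landing them on disk outside the target
  insert into employees
    select * from employees@rdb_source;
  commit;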

SQL Server Targets

With SQL Server targets you can install an Oracle Rdb ODBC driver on the target machine.  With this in place, you can then create linked tables that allow SQL Server to perceive the Rdb table as if it were local.  One may then use the "insert into <target table> select ... from <source table>" model noted above.  This is the way the Loader regression test copies data from the source to the target in order to compare the two after a run of random updates.
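
The same idea can be sketched on the SQL Server side with a linked server defined over the Rdb ODBC driver; the server name, ODBC data source name, and table names below are assumptions for illustration only.

  -- define a linked server over an ODBC data source for the Rdb database
  exec sp_addlinkedserver
       @server = 'RDB_SOURCE',
       @srvproduct = '',
       @provider = 'MSDASQL',
       @datasrc = 'RdbOdbcDsn';

  -- pull the rows across and insert them into the local table
  insert into dbo.employees
  select * from openquery(RDB_SOURCE, 'select * from EMPLOYEES');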

If you have no NULLs in your database, you may also consider unloading your Rdb tables into comma-separated files in which all numeric and date columns are represented as text.  These files may then be moved to the Windows machine hosting SQL Server and loaded into the database.
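
One way to load such a file on the SQL Server side is a bulk insert; the sketch below is minimal, and the file path, table name, and terminators are assumptions you would adjust for your own data.

  -- load a comma-separated text file into a local SQL Server table
  bulk insert dbo.employees
  from 'C:\data\employees.csv'
  with (fieldterminator = ',', rowterminator = '\n');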