Software features queue replication architecture.

IBM Apr 05, 2005

Press Release Summary:

WebSphere® Information Integrator 8.2 features queue replication to provide high transaction volume combined with low latency, which is the elapsed time from committing database changes on the source database until they are committed on the target database. The software offers users real-time access to replicated changes while synchronizing two or more systems and resolving conflicts that arise when those systems concurrently change the same data.

Original Press Release:

WebSphere Information Integrator Q replication

With the release of IBM® WebSphere® Information Integrator 8.2, IBM delivered a major advancement in database replication through a function called queue replication, or Q replication. The new replication architecture has emerged from the need for high performance: high transaction volume combined with low latency. In this article, discover the new Q replication architecture, what levels of performance to expect, and what factors influence that performance. Also, learn from examples drawn from laboratory measurements on both AIX® and z/OS® platforms.

Note: Please read the disclaimer before you read the article.

Introduction
With the release of WebSphere Information Integrator 8.2, IBM has delivered a major advancement in database replication through a function called queue replication, also known as Q replication. This new replication architecture has emerged from the need for high performance: high transaction volume combined with low latency, which is the elapsed time from committing database changes on the source database until they are committed on the target database.
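The latency definition above can be made concrete with a minimal sketch. The function name and the timestamps are hypothetical and chosen only for illustration; they are not part of the product or of the laboratory measurements. The sketch also assumes that the clocks on the source and target systems are synchronized, since otherwise the difference between the two commit times is not meaningful.

```python
from datetime import datetime, timedelta


def replication_latency(source_commit: datetime, target_commit: datetime) -> timedelta:
    """Elapsed time from the commit on the source to the commit on the target.

    Hypothetical helper for illustration; assumes both timestamps come from
    synchronized clocks.
    """
    if target_commit < source_commit:
        raise ValueError("target commit precedes source commit; check clock synchronization")
    return target_commit - source_commit


# Hypothetical commit times for one replicated transaction.
source_commit = datetime(2005, 4, 5, 12, 0, 0, 0)
target_commit = datetime(2005, 4, 5, 12, 0, 0, 750_000)

latency = replication_latency(source_commit, target_commit)
print(latency.total_seconds())  # 0.75
```
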

This article describes the new Q replication architecture, discusses what levels of performance to expect and what factors influence that performance, and includes examples drawn from laboratory measurements on both AIX and z/OS platforms. This information should be useful to customers who are interested in using Q replication and curious about its effect on performance.

Why a new architecture?
Data replication technology is not new for IBM or DB2®, having been first introduced a decade ago with a product named DataPropagator(TM) Relational (DPropR). Since then the names and packages have changed, the product breadth and functionality have expanded, and the resulting set of replication products has experienced substantial commercial success.

The existing replication architecture, known as SQL replication, continues to hold a place alongside Q replication, and will likely continue to be the optimal solution for a variety of customer scenarios.

With SQL replication, database changes are captured and stored temporarily in relational tables called staging tables. The staged changes are then read through a client interface on the target system and applied to the target tables. SQL replication is packaged with DB2 for Linux(TM), UNIX®, and Windows®, is available as DB2 Data Propagator for z/OS, and is part of all WebSphere Information Integrator offerings that include replication. For some scenarios, such as those involving a non-DB2 database as source or target, SQL replication may still be the best choice for now.

Despite its success, SQL Replication has faced some technical challenges, particularly as customer systems have grown and expanded their performance requirements. Q Replication has been designed to address these performance needs, along with increased function and improved manageability.

Systems continue to grow in size and stretch throughput limits. At the same time, customers want more real-time access to the replicated changes. These real-time changes often need to flow in more than one direction, creating the need for synchronizing two or more systems, and the need for resolving conflicts when these systems concurrently change the same data.

Customers have a wide variety of needs for database replication, a process that has often required careful planning and scheduling. Some traditional uses and future trends include:

Maintaining data warehouses. This is one of the most common uses of replication today. Warehouses are isolated from operational data, making them suitable for ad hoc queries without interfering with performance of production applications.

Business continuity and disaster recovery. This may be the fastest growing requirement for database replication. Today the industry provides a variety of hardware and software implementations, along with synchronous and asynchronous replication techniques, to address the problems of business continuity. Q replication operates asynchronously, meaning that changes are propagated after the source database operations are committed. Customers often choose an asynchronous approach when long distances separate primary and backup databases. Distance increases transmission latency, making synchronous approaches impractical for high performance applications. However, asynchronous replication implies that the application can tolerate some transactions lost in transit in the event of a failure. Smaller delay (lower latency) means fewer lost transactions. One unique advantage of Q Replication over other commonly used hardware and software solutions is that the backup database can be continuously active, minimizing the recovery time.

Distribution of data for workload balancing. This usage makes sense as long as the added cost of replication is small relative to the overall amount of processing, as is the case for applications that have low rates of modified data relative to retrieval or application processing.

Geographic distribution or consolidation of data. Performance can be enhanced by keeping appropriate applications "close" to end-users, or co-located with other relevant data.

Enabling applications that use multiple databases. Many Web-based applications rely on more than one database. A customer making an online trade may work with one database to place the trade, but reference a second database to verify completion of the process. The trading database may be designed for a legacy application and organized for online transaction processing (OLTP) performance, whereas the trade history database may be organized for query performance and reside on a different platform. For some applications to operate seamlessly, it is therefore critical to be able to move changed data almost immediately from one database to another.
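The business continuity item above notes that, with asynchronous replication, lower latency means fewer transactions lost in transit after a failure. A rough way to see why is to estimate the number of committed transactions that have not yet reached the target as the commit rate multiplied by the latency. This steady-state estimate is an assumption made here for illustration, not a formula from the product, and the rate and latency figures below are hypothetical rather than laboratory measurements.

```python
def transactions_in_transit(commit_rate_per_sec: float, latency_sec: float) -> float:
    """Rough steady-state estimate of transactions committed on the source
    but not yet committed on the target (hypothetical helper)."""
    if commit_rate_per_sec < 0 or latency_sec < 0:
        raise ValueError("rate and latency must be non-negative")
    return commit_rate_per_sec * latency_sec


# Hypothetical workload: 1000 committed transactions per second.
for latency_sec in (0.5, 2.0, 10.0):
    exposed = transactions_in_transit(1000, latency_sec)
    print(f"latency {latency_sec:>4} s -> about {exposed:.0f} transactions exposed")
```

At a fixed commit rate the estimate grows linearly with latency, which is the sense in which a smaller delay limits the number of transactions that could be lost.
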