IBM BPM is one of the most successful products in the Smarter Process and business process management world. It is used globally across various industries, including financial services and banking, healthcare and insurance, retail, manufacturing, and media and entertainment. IBM BPM offers many exciting features, including process modeling, transformation, optimization, performance analytics, and performance data warehousing. However, when it comes to a process data archival strategy, the product doesn’t offer its own solution, as archival requirements differ greatly from one customer to another.
Archiving process data is a very common requirement for almost every customer, especially banks and financial institutions. To comply with regulations, some institutions require at least the past 7 years’ worth of data for completed processes, and some may need even more. Thus, implementing an archival strategy, along with analytics on process data, is a critical business need for most customers around the world.
This blog post primarily showcases the high-level strategy and design of a robust and efficient archival solution for IBM BPM (with a multi-tenant/cell architecture) using NoSQL technologies such as Elasticsearch (or Solr or MongoDB) together with DB2.
What and When to Archive
- Archiving BPM data is far from trivial, especially when dealing with a huge volume of transactions and considering factors such as the performance and scalability of the system.
- As part of the archival process, the process and task instance details need to be archived upon completion of a process instance.
- Every long-running process (implemented as a BPD or BPEL process flow) should have/define a step to archive the process instance data and business data (searchable exposed fields). Ideally, this should be the last step.
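To make the last archival step concrete, here is a minimal sketch of the payload such a step might hand to the archival service. All names (`build_archival_payload`, the field keys) are illustrative assumptions, not product APIs; the point is that the step bundles the instance ID and the searchable exposed fields into one document.

```python
# Hypothetical sketch: what a BPD's final archival step might pass to the
# archival service. Field and function names are illustrative assumptions.
import json
from datetime import datetime, timezone

def build_archival_payload(instance_id, system_id, exposed_fields):
    """Bundle the process instance ID, the system/cell ID, and the
    searchable business data (exposed fields) into one JSON document."""
    return {
        "instanceId": instance_id,
        "systemId": system_id,
        "businessData": exposed_fields,   # searchable exposed fields
        "archivedAt": datetime.now(timezone.utc).isoformat(),
    }

payload = build_archival_payload(
    "BPD.1024", "CELL-A",
    {"accountNumber": "ACC-778", "region": "EMEA", "amount": 1500.00},
)
print(json.dumps(payload, indent=2))
```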
How and Where to Archive
- As mentioned in the previous section, business data and process-specific instance data should be archived during the last step in a process.
The archival step in the process flow essentially calls an archival service.
- Note: This archival service can technically be implemented as a mediation module in BPM, as shown in the design proposed here in this blog. However, this service can also be implemented as a simple Java service, especially when using BPM Standard/Express editions.
- The notion here is to archive the business data in any NoSQL and Lucene-based indexes—preferably Elasticsearch, Solr, or MongoDB—and the process instance data in the archival database, which could be any RDBMS-based database acting as the system of record (SOR).
- The main reason for using NoSQL and Lucene-based indexing technologies is to provide efficient, better-performing search and filter functionality, especially when it comes to unstructured data.
Service for Archiving the Completed Process Instance Data
The archival service can be implemented as a mediation module, which should ideally implement an interface with the operations shown in the figure below.
The figure shown below explains the archival process, including the persistence and retrieval of the process data in BPM.
PersistArchivalData essentially performs two main activities:
- Persist the business data in the chosen NoSQL indexes (Elasticsearch, Solr, or MongoDB) as JSON along with process instance ID by calling the indexing APIs (REST API) exposed by the respective indexes.
- Persist the process instance data into the archival database (SOR), which can be any RDBMS-based database like DB2, Oracle, MySQL, etc.
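The two writes performed by PersistArchivalData can be sketched as follows. This is a hedged illustration, not the actual service: the endpoint, index name, and table name are assumptions, and the functions only prepare the Elasticsearch REST call and the parameterized SQL insert rather than executing them.

```python
# Illustrative sketch of PersistArchivalData's two activities. The endpoint,
# index name, and table/column names are assumptions for illustration only.
import json

ES_URL = "http://localhost:9200"  # assumed Elasticsearch endpoint

def prepare_index_request(instance_id, business_data):
    """Activity 1: persist business data as JSON in the search index,
    keyed by the process instance ID (Elasticsearch index API shape)."""
    doc = dict(business_data, instanceId=instance_id)
    return ("PUT",
            f"{ES_URL}/archived-instances/_doc/{instance_id}",
            json.dumps(doc))

def prepare_sor_insert(instance_id, system_id, instance_data_json,
                       created, completed):
    """Activity 2: persist the full instance data into the RDBMS-based
    archival database acting as the system of record (SOR)."""
    sql = ("INSERT INTO ARCHIVED_INSTANCES "
           "(INSTANCE_ID, SYSTEM_ID, INSTANCE_DATA, CREATED_DATE, COMPLETED_DATE) "
           "VALUES (?, ?, ?, ?, ?)")
    return sql, (instance_id, system_id, instance_data_json, created, completed)
```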
The archival database table ideally contains the following columns:
- Instance ID (VARCHAR)
- System ID (VARCHAR)
- Instance data (CLOB)
- Created date (DATETIME)
- Completed date (DATETIME)
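A runnable approximation of this table, using SQLite in place of DB2 (SQLite accepts the CLOB and DATETIME type names, though a real DB2 DDL would differ; the table name is an assumption):

```python
# Approximate the archival table with in-memory SQLite for illustration.
# In DB2/Oracle the types and DDL would differ; names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ARCHIVED_INSTANCES (
        INSTANCE_ID    VARCHAR(64) PRIMARY KEY,
        SYSTEM_ID      VARCHAR(64) NOT NULL,
        INSTANCE_DATA  CLOB,        -- full instance payload, e.g. JSON
        CREATED_DATE   DATETIME,
        COMPLETED_DATE DATETIME
    )
""")
conn.execute(
    "INSERT INTO ARCHIVED_INSTANCES VALUES (?, ?, ?, ?, ?)",
    ("BPD.1024", "CELL-A", '{"amount": 1500}',
     "2016-01-05 09:00:00", "2016-01-20 17:30:00"),
)
row = conn.execute(
    "SELECT INSTANCE_DATA FROM ARCHIVED_INSTANCES WHERE INSTANCE_ID = ?",
    ("BPD.1024",),
).fetchone()
print(row[0])
```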
How to Read Archived Data
Archived process data is typically needed in two different use cases.
- Displaying the list of completed instances (with business data).
- Viewing and displaying the list of completed instances is one of the most common use cases. As shown in the diagram above, this list, including the business data (exposed fields), can be retrieved from the chosen NoSQL and Lucene-based index by calling the REST API exposed by the respective index.
- Note: A custom UI needs to be built to display the list of completed instances, using either a BPM human service with coaches or any external UI.
- Viewing the completed process instance details.
- From the completed instances list, display the selected process instance details by calling a service that internally queries the archival database using the process instance ID, as shown in the diagram.
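The two reads can be sketched as below: an Elasticsearch-style search body for the completed-instances list, and a parameterized SOR query for one instance's details. The index, table, and field names are assumptions carried over from the archival step, not the actual implementation.

```python
# Hedged sketch of the two read paths; names are illustrative assumptions.
import json

def completed_instances_query(field, text, page=0, size=25):
    """Build an Elasticsearch-style search body: a Lucene-backed free-text
    match over the archived business data (exposed fields), paginated."""
    return {
        "from": page * size,
        "size": size,
        "query": {"match": {field: text}},
    }

def instance_detail_sql(instance_id):
    """Fetch the full instance payload from the archival database (SOR)
    using the process instance ID selected from the list."""
    return ("SELECT INSTANCE_DATA FROM ARCHIVED_INSTANCES "
            "WHERE INSTANCE_ID = ?", (instance_id,))

print(json.dumps(completed_instances_query("region", "EMEA")))
```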
Purging the Completed Instances
Purging is a significant step in the archival strategy and one of the most important maintenance activities in BPM for keeping the environment stable and healthy. The high-level details on how and when to purge the completed instances are provided below.
When and How to Purge Completed Instances
Purging should ideally happen as a periodic/scheduled activity in the backend, isolated from the real-time process flow. Since the instance-level details (including the business data) get archived as a last step in the process, purging the completed process and task instances can happen any time after business hours as a maintenance activity. The recommendation here is to set up a job or scheduled activity to run once a week, purging completed instances that are ideally 30 days or older.
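The retention cutoff the weekly job works against can be computed as in this small sketch (the 30-day threshold mirrors the recommendation above and would be configurable in practice):

```python
# Sketch: compute the cutoff for selecting purge candidates -- completed
# instances 30 days or older when the weekly job runs. Names are illustrative.
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30

def purge_cutoff(now=None):
    """Instances completed on or before this timestamp are purge-eligible."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=RETENTION_DAYS)

cutoff = purge_cutoff(datetime(2016, 3, 1, tzinfo=timezone.utc))
print(cutoff.date())  # 2016-01-31
```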
IBM BPM ships with an out-of-the-box purge script that removes all completed process and task instances from the BPM database; the details are available in the IBM BPM product documentation.
This is the proposed archival strategy for IBM BPM. However, this can be customized based on the firm’s business and IT needs.
About the Author
Parthasarathi Jayapathi is a Solution Architect at Prolifics specializing in BPM, SOA and Java/JEE technologies. He has over 11 years of consulting experience in the IT industry in domains including Healthcare, Banking and Media/Entertainment. Partha has extensive experience across different project lifecycle phases such as architecture, design, development, deployment and production support.