CPE Object Store is not optimized for documents with lots and lots of versions


The customer has one version series with 90,000 superseded versions and another with 250,000. The objective is to delete every version except the current one (in the larger case, that means deleting 249,999 obsolete versions). Below is the customer's diagnosis, which traces the problem down to the CPE design. We spoke with CPE Development via Collaboration 137253, and it was suggested to open a Collaboration.

I'd like to offer some additional analysis for the CPE Development team. I've attached a JavaScript test harness that creates documents, disables multiple subsystems, enables DB trace, deletes versions, disables DB trace, and re-enables the subsystems that were running prior to the test. It runs as a JavaScript bulk action on an ACCE object store query for Folder instances where the parent is null, so it runs exactly once. The script generates 5 versions without content and can then run either a specific version deletion or a complete version series deletion. It shuts down subsystems that might generate DB activity, though the server seems to take a while to act on that, so the logs contain some entries that do not belong to the version delete process.

Each version delete, regardless of the version's status (reservation, in process, released, superseded), runs many checks against other tables, individually and sequentially, to see whether there are hold relationships that would prevent the deletion and whether other objects would also need deleting (thumbnails, annotations, filing relationships, security ids). We understand the complexity involved: many of those tables don't have the version series id as a column, so each version is validated individually. The process doesn't use any of the set-based features of the database and relies on sequential queries. Many of those checks could be done more efficiently with joins in a single query, as they only check for existence and do additional work only if something is found. In our case, none of the checks finds any objects. Each check is moderately quick, so we're ignoring this, even though we think it could be done more efficiently in the database rather than as hundreds of thousands of queries when you have thousands of versions.
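To illustrate the difference between per-version existence checks and a single set-based query, here is a minimal sketch using Python and an in-memory SQLite database. The schema (docversion, hold_link, annotation) and all column names are hypothetical and much simpler than the real CPE tables; the sketch only shows the shape of the two approaches, not how CPE actually issues its queries.

```python
import sqlite3

# Hypothetical, simplified schema. These are NOT the real CPE tables.
SCHEMA = """
CREATE TABLE docversion (object_id INTEGER PRIMARY KEY, version_series_id INTEGER);
CREATE TABLE hold_link  (held_id INTEGER);
CREATE TABLE annotation (annotated_id INTEGER);
"""

# Dependent tables to check, as (table, column) pairs. Fixed constants, not user input.
DEPENDENT_TABLES = (("hold_link", "held_id"), ("annotation", "annotated_id"))


def build_db(n_versions, held=(), annotated=()):
    """Create one version series (id 1) with n_versions rows and a few dependents."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO docversion VALUES (?, 1)",
                     [(i,) for i in range(1, n_versions + 1)])
    conn.executemany("INSERT INTO hold_link VALUES (?)", [(i,) for i in held])
    conn.executemany("INSERT INTO annotation VALUES (?)", [(i,) for i in annotated])
    return conn


def dependents_sequential(conn, series_id):
    """One existence query per dependent table per version."""
    ids = [r[0] for r in conn.execute(
        "SELECT object_id FROM docversion WHERE version_series_id = ? "
        "ORDER BY object_id", (series_id,))]
    queries = 1  # the enumeration query above
    found = []
    for oid in ids:
        has_dependent = False
        for table, col in DEPENDENT_TABLES:
            row = conn.execute(
                f"SELECT 1 FROM {table} WHERE {col} = ? LIMIT 1", (oid,)).fetchone()
            queries += 1
            has_dependent = has_dependent or row is not None
        if has_dependent:
            found.append(oid)
    return found, queries


def dependents_set_based(conn, series_id):
    """The same existence checks expressed as one query with EXISTS subqueries."""
    rows = conn.execute("""
        SELECT v.object_id
        FROM docversion v
        WHERE v.version_series_id = ?
          AND (EXISTS (SELECT 1 FROM hold_link h WHERE h.held_id = v.object_id)
            OR EXISTS (SELECT 1 FROM annotation a WHERE a.annotated_id = v.object_id))
        ORDER BY v.object_id
    """, (series_id,)).fetchall()
    return [r[0] for r in rows], 1
```

With 1,000 versions and two dependent tables, the sequential path in this sketch issues 2,001 statements while the set-based path issues one, and both report the same versions as needing extra work.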

We've found that a VersionSeries delete enumerates every version and then processes each version discretely, so there's no advantage in doing a complete document delete. This makes our problem worse.

The true issue is that the code takes a WITH UPDATE lock on every version, regardless of what it is doing. It locks the version, then performs all the consistency checks for that version. Before it then does anything else, it locks every version in the series by taking a WITH UPDATE lock on each row. This is excessive.

If some blocking behaviour is needed, then you need to block the head of the entire version series, not each entry. Blocking the reservation, in process and released versions rather than every version would achieve the same result. With 5 versions, as generated in the test harness, you'd lock 1 version, the head, and then do the delete. With 5,000 versions, a checkout, and both major and minor versions, you'd lock 3 versions: the reservation, the in process and the released. Superseded versions are irrelevant for the block. No one could operate on any of the individual versions, as you'd have effectively blocked the document from being changed. This is one change to the SQL rather than a huge re-engineering effort. Instead of doing where version_series_id = ? WITH UPDATE LOCK, you need to add where (version_status in (1, 2, 3) or object_id = ?), or more simply just add where version_status <> 4. The object id is the specific version you want to stop. The rest stops anyone touching other versions, as they all do the same thing. Rather than locking 90,000 rows, you would lock at most 4. The rest of the logic remains the same, with all of the prerequisite checking. The code already acquires a lock, and deletes are row locking in all the supported databases anyway.
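The arithmetic behind "at most 4" can be sketched as follows. This is a plain Python model of the two predicates, not CPE code: the Version class, the helper names and the way the series is built are all hypothetical, and the only thing taken from the text above is that status 4 means superseded and statuses 1, 2 and 3 are the versions that still matter.

```python
from dataclasses import dataclass

SUPERSEDED = 4  # per the predicate above; statuses 1, 2, 3 are the non-superseded versions


@dataclass(frozen=True)
class Version:
    object_id: int
    version_status: int


def make_series(n_superseded):
    """Hypothetical series: n superseded versions plus one each of status 1, 2 and 3."""
    series = [Version(i, SUPERSEDED) for i in range(1, n_superseded + 1)]
    series += [Version(n_superseded + s, s) for s in (1, 2, 3)]
    return series


def rows_locked_current(series):
    """Behaviour as described: every row of the version series is locked."""
    return [v.object_id for v in series]


def rows_locked_proposed(series, target_id):
    """Proposed predicate: version_status in (1, 2, 3) or object_id = ?"""
    return [v.object_id for v in series
            if v.version_status in (1, 2, 3) or v.object_id == target_id]
```

For a series built with make_series(90000), the first function returns 90,003 ids, while the second returns 4 when the target is a superseded version and 3 when the target is itself one of the non-superseded versions.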

This change would alleviate the deadlock issue as well as the onerous database activity. It doesn't change the consistency of the system, because it locks only the rows that matter rather than every row. It achieves the same result: locking the top versions blocks all activity on the lower ones, without locking each of them.

Idea priority

Medium
