Announcing Zeebe 0.22.5 and 0.23.4

New patch releases for Zeebe are available now: 0.22.5 and 0.23.4. They contain various bug fixes as well as minor enhancements, and you can grab them via the usual channels.

Zeebe 0.23.4 is fully compatible with 0.23.3, as is 0.22.5 with 0.22.4. This means it is possible to perform a rolling upgrade for your existing clusters. For Camunda Cloud users, Zeebe 0.23.4 is already the latest production version, meaning your existing clusters should have been migrated by the time you see this post.

Without further ado, here is a list of the notable changes.

Zeebe 0.22.5

The 0.22.5 patch contains 6 bug fixes, all of which are part of 0.23.4. You can read more about these in the 0.23.4 release notes below.

One important thing to note about 0.22.5: it is the last planned 0.22.x release. Once 0.24.0 is out, fixes will be applied only to the 0.23 and 0.24 branches (see our release cycle page for more).

Zeebe 0.23.4

Long polling blocked with available jobs

Long polling in Zeebe is a feature built for job activation: a client sends an ActivateJobsRequest to the gateway, and if no jobs are available after polling all partitions, the gateway parks that connection and waits for a notification from the broker that new jobs are available.

Previously, it could happen that connections were parked even though jobs were available – the connection would then wait unnecessarily and eventually time out, which could have a non-negligible impact on performance.

The fix here is two-fold:

  • When the gateway does not activate any jobs and some partitions return RESOURCE_EXHAUSTED as an error code, it now closes the connection and propagates the error. This allows the client to back off and react to the gateway’s back pressure (see the client-side sketch after this list).
  • The gateway now immediately enqueues incoming long-polling requests. This avoids a race condition where the broker would send a notification that more jobs were available after the initial polling, but before the connection was parked and waiting for such a notification.
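
As a reference point for the client side of this flow, here is a minimal sketch using the Zeebe Java client (io.zeebe.client, 0.23.x API), assuming a local gateway at localhost:26500 and a hypothetical job type "payment". The request timeout passed here is roughly the window during which the gateway may park the connection while long polling.

import io.zeebe.client.ZeebeClient;
import io.zeebe.client.api.response.ActivateJobsResponse;
import java.time.Duration;

public class LongPollingExample {
  public static void main(String[] args) {
    try (ZeebeClient client =
        ZeebeClient.newClientBuilder()
            .brokerContactPoint("localhost:26500") // assumed local gateway
            .usePlaintext()
            .build()) {

      // requestTimeout is how long the gateway may keep the connection open
      // while waiting for jobs; if none show up in time, an empty response is
      // returned and the client can simply poll again.
      final ActivateJobsResponse response =
          client
              .newActivateJobsCommand()
              .jobType("payment") // hypothetical job type
              .maxJobsToActivate(32)
              .requestTimeout(Duration.ofSeconds(30))
              .send()
              .join();

      response.getJobs().forEach(job -> System.out.println("Activated job " + job.getKey()));
    }
  }
}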

Max message size not respected on write

There is a setting in Zeebe, maxMessageSize, which controls the maximum size of an entry in the replicated log. It’s used notably as an upper bound in the dispatcher – essentially a ring buffer – which is used to synchronize communication between multiple writers and a single reader.

However, there was a bug where writers could write entries larger than maxMessageSize, while readers would only read up to that limit, resulting in unreadable entries. It now behaves as expected: it is not possible to write more than maxMessageSize to the dispatcher.
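
To illustrate the behavior (this is only a sketch of the idea, not the actual dispatcher code), a writer that respects the bound rejects any entry larger than the configured maximum instead of committing something readers can never consume:

import java.nio.ByteBuffer;

// Illustrative only; the real Zeebe dispatcher is considerably more involved.
public class BoundedWriter {
  private final int maxMessageSize;
  private final ByteBuffer buffer;

  public BoundedWriter(int capacity, int maxMessageSize) {
    this.buffer = ByteBuffer.allocate(capacity);
    this.maxMessageSize = maxMessageSize;
  }

  // Writes an entry, or returns false if it exceeds maxMessageSize. Before the
  // fix, an oversized write could be accepted even though readers would only
  // ever read up to maxMessageSize, leaving the entry unreadable.
  public synchronized boolean tryWrite(byte[] entry) {
    if (entry.length > maxMessageSize || entry.length > buffer.remaining()) {
      return false;
    }
    buffer.put(entry);
    return true;
  }
}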

Elasticsearch exporter bulk is not limited

To avoid overloading its Elasticsearch endpoint, the Elasticsearch exporter buffers incoming records in memory and flushes them together as a batch. We noticed that when a flush operation failed (e.g. because the database was down), the batch was correctly retained, but the exporter kept adding more records to it. In the long run, this could cause an out-of-memory error as more and more records were buffered.

The batch is now properly capped (1,000 records by default, but this is configurable) and makes use of the exporters’ built-in retry mechanism. If a flush fails, the exporter simply retries the current record with a small backoff, and the in-memory batch does not grow.
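
The exporter itself ships with Zeebe; the following is only a rough sketch of the idea, with hypothetical names, of a batch that refuses to grow past its cap and leaves retrying to the caller:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Illustrative sketch of a capped export batch; not the real exporter code.
public class BoundedBulkBuffer<T> {
  private final int maxBulkSize; // e.g. 1000, mirroring the default cap
  private final Deque<T> batch = new ArrayDeque<>();

  public BoundedBulkBuffer(int maxBulkSize) {
    this.maxBulkSize = maxBulkSize;
  }

  // Adds a record to the batch, or returns false when the batch is full so the
  // caller can keep the current record and retry later with a small backoff,
  // instead of letting the buffer grow without bound.
  public boolean offer(T record) {
    if (batch.size() >= maxBulkSize) {
      return false;
    }
    batch.add(record);
    return true;
  }

  // Flushes the batch; if the sink throws, the records stay buffered and the
  // next attempt will retry the same batch.
  public void flush(Consumer<Deque<T>> sink) {
    if (batch.isEmpty()) {
      return;
    }
    sink.accept(batch);
    batch.clear();
  }
}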

Error event is not caught by boundary event on multi-instance subprocess

Error events thrown inside a multi-instance subprocess were previously not caught by boundary events attached to that subprocess, which could result in seemingly frozen workflow instances.

This is now fixed and error events are properly caught.
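
For context, this is roughly how such an error gets raised in the first place: a job worker throws a BPMN error, which is then supposed to be caught by an error boundary event on the enclosing (multi-instance) subprocess. The sketch below uses the Zeebe Java client; the job type and error code are hypothetical and would need to match your workflow model.

import io.zeebe.client.ZeebeClient;

public class ThrowErrorWorker {
  public static void main(String[] args) throws InterruptedException {
    try (ZeebeClient client =
        ZeebeClient.newClientBuilder()
            .brokerContactPoint("localhost:26500") // assumed local gateway
            .usePlaintext()
            .build()) {

      client
          .newWorker()
          .jobType("charge-card") // hypothetical job type
          .handler(
              (jobClient, job) ->
                  // Raise a BPMN error; a matching error boundary event on the
                  // surrounding subprocess should catch it.
                  jobClient
                      .newThrowErrorCommand(job.getKey())
                      .errorCode("insufficient-funds") // hypothetical error code
                      .errorMessage("Card could not be charged")
                      .send()
                      .join())
          .open();

      Thread.sleep(60_000); // keep the worker running for this sketch
    }
  }
}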

Follower partitions are falsely marked as unhealthy

Some follower partitions would report themselves as unhealthy even though they were healthy, because a critical component was not properly removed from the health check.

That component is now correctly de-registered, and the health status of each partition is set correctly.

Incorrect health status of leader partition

A leader partition in Zeebe is, among other things, the partition that performs the workflow processing. Previously, its health status only indicated whether it had successfully installed all of its subcomponents. This could result in a false positive, where a subcomponent failed but the partition still reported itself as healthy.

The fix was to have the partition probe its components periodically and update its health status, so that it now correctly reflects the partition’s actual state.
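
The real fix lives inside the broker, but the shape of it (periodically probing subcomponents and recomputing an aggregate status) looks roughly like this sketch; all names here are illustrative and not Zeebe’s internal API:

import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch of periodic health probing; not Zeebe's internal API.
public class PartitionHealthProbe {

  // Hypothetical view of a subcomponent that can report its own health.
  public interface Component {
    boolean isHealthy();
  }

  private final List<Component> components;
  private volatile boolean healthy = true;

  public PartitionHealthProbe(List<Component> components) {
    this.components = components;
  }

  // Re-checks all subcomponents on a fixed schedule so that a failing
  // subcomponent eventually flips the aggregate status to unhealthy.
  public void start(ScheduledExecutorService scheduler) {
    scheduler.scheduleAtFixedRate(
        () -> healthy = components.stream().allMatch(Component::isHealthy),
        0, 10, TimeUnit.SECONDS);
  }

  public boolean isHealthy() {
    return healthy;
  }

  public static void main(String[] args) throws InterruptedException {
    final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    final PartitionHealthProbe probe =
        new PartitionHealthProbe(List.of(() -> true, () -> false));
    probe.start(scheduler);
    Thread.sleep(1_000);
    System.out.println("Healthy: " + probe.isHealthy()); // prints false
    scheduler.shutdownNow();
  }
}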

Get In Touch

There are a number of ways to get in touch with the Zeebe community to ask questions and give us feedback.

We hope to hear from you!
