ACS-9699 Document horizontal and vertical scaling of T-Engines (#1101)

This commit is contained in:
Arindam Roy 2025-07-02 11:56:28 +05:30 committed by GitHub
parent 5835dabc84
commit abfa00fbcc
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

192
docs/t-engine-scaling.md Normal file
View File

@ -0,0 +1,192 @@
# T-Engine Scaling
The T-Engine can be scaled both horizontally and vertically. For either approach, we recommend keeping the `TRANSFORMER_ENGINE_PROTOCOL`
at its default value of `jms`. This setting enables the use of a JMS messaging with the ActiveMQ broker.
## Horizontal Scaling
T-Engine is intended to be run as a Docker image. Horizontal Scaling could be achieved through creating multiple Docker containers.
T-Engine relies on JMS queues, which provide built-in load balancing. This design allows you to safely run multiple instances
of the T-Engine service. Reliable messaging ensures that each message is delivered once and only once to a consumer. In point-to-point
messaging, while many consumers may listen on a queue, each message is consumed by only one instance.
### Example
```yaml
transform-core-aio:
image: quay.io/alfresco/alfresco-transform-core-aio:5.1.7
environment:
ACTIVEMQ_URL: nio://activemq:61616
FILE_STORE_URL: >-
http://shared-file-store:8099/alfresco/api/-default-/private/sfs/versions/1/file
ports:
- "8090-8091:8090" # Host ports 8090 and 8091 will be used
deploy:
replicas: 2 # Two instances of t-engine will be created
```
### Scaling of individual T-Engines
There are options to use five separate T-Engines instead of one single `all-in-one` T-Engine. These options are:
1. LibreOffice
2. ImageMagick
3. PdfRenderer
4. Tika
5. Misc
Horizontal Scaling could be achieved for these T-Engines as well - by creating multiple Docker containers for each of the T-Engines.
### Example
```yaml
libre-office:
image: quay.io/alfresco/alfresco-libreoffice:5.1.7
environment:
ACTIVEMQ_URL: nio://activemq:61616
FILE_STORE_URL: >-
http://shared-file-store:8099/alfresco/api/-default-/private/sfs/versions/1/file
ports:
- "8201-8202:8090" # Host ports 8201 and 8202 will be used
deploy:
replicas: 2 # Two instances of libre-office t-engine will be created
image-magick:
image: quay.io/alfresco/alfresco-imagemagick:5.1.7
environment:
ACTIVEMQ_URL: nio://activemq:61616
FILE_STORE_URL: >-
http://shared-file-store:8099/alfresco/api/-default-/private/sfs/versions/1/file
ports:
- "8203-8204:8090" # Host ports 8203 and 8204 will be used
deploy:
replicas: 2 # Two instances of image-magick t-engine will be created
pdf-renderer:
image: quay.io/alfresco/alfresco-pdf-renderer:5.1.7
environment:
ACTIVEMQ_URL: nio://activemq:61616
FILE_STORE_URL: >-
http://shared-file-store:8099/alfresco/api/-default-/private/sfs/versions/1/file
ports:
- "8205-8206:8090" # Host ports 8205 and 8206 will be used
deploy:
replicas: 2 # Two instances of pdf-renderer t-engine will be created
tika:
image: quay.io/alfresco/alfresco-tika:5.1.7
environment:
ACTIVEMQ_URL: nio://activemq:61616
FILE_STORE_URL: >-
http://shared-file-store:8099/alfresco/api/-default-/private/sfs/versions/1/file
ports:
- "8207-8208:8090" # Host ports 8207 and 8208 will be used
deploy:
replicas: 2 # Two instances of tika t-engine will be created
transform-misc:
image: quay.io/alfresco/alfresco-transform-misc:5.1.7
environment:
ACTIVEMQ_URL: nio://activemq:61616
FILE_STORE_URL: >-
http://shared-file-store:8099/alfresco/api/-default-/private/sfs/versions/1/file
ports:
- "8209-8210:8090" # Host ports 8209 and 8210 will be used
deploy:
replicas: 2 # Two instances of transform-misc t-engine will be created
```
### Limitations
- Alfresco Content Services (ACS) Repository can only be configured with a single T-Engine service URL.
```
localTransform.core-aio.url=http://transform-core-aio:8090/
```
- T-Router can only be configured with a single T-Engine service URL.
```
CORE_AIO_URL: http://transform-core-aio:8090
```
- Search and Search Reindexing has a dependency on a single T-Engine service URL.
```
ALFRESCO_ACCEPTED_CONTENT_MEDIA_TYPES_CACHE_BASE_URL: >-
http://transform-core-aio:8090/transform/config
```
- T-Engine depends on ActiveMQ and shared-file-store. When running multiple T-Engine instances (nodes),
same URL for ActiveMQ and shared file store must be provided to all of the T-Engine nodes.
- In Kubernetes environments, horizontal scaling is typically handled automatically via deployments and built-in load balancing.
In other environments, own load balancer is required in front of T-Engine to distribute requests.
## Vertical Scaling
Vertical scaling can be achieved through Docker, JVM, Spring Boot, or ActiveMQ configurations.
- **`mem_limit` (Docker):**
Sets the maximum amount of memory the container can use. Increasing this allows the T-Engine to handle more concurrent processing
and larger workloads.
**Default:** Not set (unlimited, but typically limited by orchestrator or host).
- **`JAVA_OPTS` (JVM):**
Sets Java Virtual Machine options. It is recommended to set JVM memory using `-XX:MinRAMPercentage` and `-XX:MaxRAMPercentage`
in combination with the container's `mem_limit` parameter. This allows the JVM to dynamically adjust its heap size based on
the memory available to the container. This is important for handling more messages or larger payloads.
**Default:** Not set (JVM uses its own default heap sizing).
- **`SPRING_ACTIVEMQ_POOL_MAXCONNECTIONS` (Spring Boot):**
Configures the maximum number of pooled connections to ActiveMQ. Increasing this value allows more simultaneous connections to
the message broker, which can improve throughput under heavy load.
**Default:** 1 set by Spring Autoconfiguration, 20 set by base engine's application.yaml
- **`JMS_LISTENER_CONCURRENCY` (Spring Boot):**
The number of concurrent sessions/consumers to start for each listener. Can either be a simple number indicating the maximum
number (e.g. "5") or a range indicating the lower as well as the upper limit (e.g. "3-5"). Note that a specified minimum is
just a hint and might be ignored at runtime. Default is 1; keep concurrency limited to 1 in case of a topic listener or if queue
ordering is important; consider raising it for general queues. Raising the upper limit allows more messages to be processed in
parallel, increasing throughput.
**Default:** 1 set by Spring Autoconfiguration, 1-10 set by base engine's application.yaml
- **`SPRING_ACTIVEMQ_BROKERURL` with `jms.prefetchPolicy.all` (Spring Boot/ActiveMQ):**
Sets the ActiveMQ broker URL. Use this property to configure the broker connection, including prefetch policy settings.
It controls how many messages are prefetched from the queue by each consumer before processing. A higher prefetch value can
improve throughput but may increase memory usage. Note that raising this number might lead to starvation of concurrent consumers!
**Default:** `tcp://localhost:61616` (prefetch policy default is 1000 for queues)
**Note:** The T-Engines consumers should be considered as slow consumers. Processing each message can take a significant amount of time.
So, the prefetch size should be correctly set (based on performance tests) to distribute the messages equally across the nodes
and consumers. The default prefetch size of 1000 messages should be decreased, otherwise a single consumer will fetch all the
messages, and it will lead to performance degradation.
### Comprehensive Example
Below is a single example that combines memory limits, JVM options, concurrency, and message prefetch settings for vertical scaling:
```yaml
transform-core-aio:
image: quay.io/alfresco/alfresco-transform-core-aio:5.1.7
mem_limit: 2048m # Sets the container memory limit
environment:
JAVA_OPTS: >- # Sets the JVM heap size on container memory limit
-XX:MinRAMPercentage=50
-XX:MaxRAMPercentage=80
ACTIVEMQ_URL: nio://activemq:61616
FILE_STORE_URL: >-
http://shared-file-store:8099/alfresco/api/-default-/private/sfs/versions/1/file
SPRING_ACTIVEMQ_BROKERURL: "nio://activemq:61616?jms.prefetchPolicy.all=100" # Decreases the message prefetch
SPRING_ACTIVEMQ_POOL_MAXCONNECTIONS: 100 # Increases the ActiveMQ connection pool
JMS_LISTENER_CONCURRENCY: 1-100 # Increases the JMS listener concurrency
ports:
- "8090:8090"
```
### Limitations
- **`mem_limit` (Docker):**
The maximum memory assigned to the container should be carefully decided. It depends on the host machines RAM size, total
memory assigned/required to the other containers (if present) in the same host machine.
- **`JAVA_OPTS` (JVM):**
JVM maximum RAM percentage to 100% of `mem_limit` is not recommended, as this can cause the JVM to use all available
container memory, leaving no room for other processes and potentially leading to container restarts due to out-of-memory (OOM)
errors.