HXENG-64 refactor ATS (#657)

Refactor to clean up packages in the t-model and to introduce a simpler-to-implement t-engine base. The new t-engines (tika, imagemagick, libreoffice, pdfrenderer, misc, aio, aspose) and t-router may be used in combination with older components, as the API between the content Repo and between components has not changed. As far as possible the same artifacts are created (the -boot projects no longer exist). They may be used with older ACS repo versions.

The main changes to look for are:
* The introduction of TransformEngine and CustomTransformer interfaces to be implemented.
* The removal in t-engines and t-router of the Controller, Application, test template page, Controller tests and application config, as this is all now done by the t-engine base package.
* The t-router now extends the t-engine base, which also reduced the amount of duplicate code.
* The t-engine base provides the test page, which includes drop downs of known transform options. The t-router is able to use pipeline and failover transformers. This was not possible to do previously as the router had no test UI.
* Resources including licenses are automatically included in the all-in-one t-engine, from the individual t-engines. They just need to be added as dependencies in the pom.
* The ugly code in the all-in-one t-engine and misc t-engine to pick transformers has gone, as they are now just selected by the transformRegistry.
* The way t-engines respond to http or message queue transform requests has been combined (eliminates the similar but different code that existed before).
* The t-engine base now uses InputStream and OutputStream rather than Files by default. As a result it will be simpler to avoid writing content to a temporary location.
* A number of the Tika and Misc CustomTransforms no longer use Files.
* The original t-engine base still exists so customers can continue to create custom t-engines the way they have done previously. The project has just been moved into a folder called deprecated.
* The folder structure has changed. The long "alfresco-transform-..." names have given way to shorter names that are easier to read and type.
* The t-engine project structure now has a single project rather than two.
* The previous config values still exist, but there is now a new set of config values for files with names that don't misleadingly imply they only contain pipeline or routing information.
* The concept of 'routing' has much less emphasis in class names as the code just uses the transformRegistry.
* TransformerConfig may now be read as json or yaml. The restrictions about what could be specified in yaml have gone.
* T-engines and t-router may use transform config from files. Previously it was just the t-router.
* The POC code to do with graphs of possible routes has been removed.
* All master branch changes have been merged in.
* The concept of a single transform request which results in multiple responses (e.g. images from a video) has been added to the core processing of requests in the t-engine base.
* Many SonarCloud linter fixes.
2025-09-17 14:21:18 +00:00 · 2022-09-14 13:40:19 +01:00
parent ea83ef9ebc
commit babe26b0ba
652 changed files with 19479 additions and 18195 deletions
--- a/docs/alfresco-transformer.yaml
+++ b/docs/alfresco-transformer.yaml
@@ -1,224 +0,0 @@
-swagger: '2.0'
-info:
-  description: |
-    **Alfresco Transform Engines REST API**
-
-    Transform Request & Response API to allow a source file to be transformed into a 
-    target file, given a set of transform options. 
-    
-    The new JSON-based Transform Engines API is used by the Alfresco Transform Service (ATS).
-    ATS provides an independently-scalable transform service, initially used by ACS 
-    Content Repository, as part of the overall Alfresco Digital Business Platform (DBP).
-    
-    Note: Each kind of Transform Engine implements this Transform Engines API, including:
-    
-    * ImageMagick
-    * LibreOffice
-    * PDF Renderer
-    * Tika
-    
-    In the future, this Transform Engines API may form the basis for adding custom Transform Engines.
-    
-  version: '1'
-  title: Alfresco Transform Engines REST API
-basePath: /alfresco/api/-default-/private/transformer/versions/1
-tags:
-  - name: Transform
-    description: Transform Engine Request / Response
-paths:
-  '/transform':
-    post:
-      x-alfresco-since: "2.0"
-      tags:
-        - Transform
-      summary: Transform Engines API
-      description: |
-        **Note:** available with Alfresco Transform Engines 2.0 and newer versions.
-        
-        This endpoint supports both JSON and Multipart. The JSON API is used within the 
-        Alfresco Transform Service (eg. ACS 6.1). The Multipart API remains for backwards 
-        compatibility (eg. ACS 6.0).
-        
-        **Using JSON (application/json -> application/json)**
-        
-        The ACS Content Repository 6.1 (or higher) provides the option to offload 
-        supported transformations to the Alfresco Transform Service.
-         
-        The JSON API is used within the Alfresco Transform Service. It relies on the 
-        source and target files being stored and retrieved via the Alfresco Shared File 
-        Store (see also [alfresco-sfs.yaml](https://github.com/Alfresco/alfresco-shared-file-store/blob/master/docs/api-definitions/alfresco-sfs.yaml)).       
-        
-        Here's a pseudo-example transform request:
-        
-        ```JSON
-        {
-          "schema": 1,
-          "requestId": "0aead31c-e3ca-42c9-8e16-c1938ff64c3a",
-          "clientData": "opaque-client-specific-data-123xyz",
-          "sourceReference": "598387b8-d85d-4557-816e-50f44c969e04",
-          "sourceSize": 32713,
-          "sourceMediaType": "image/jpeg",
-          "sourceExtension": "jpeg",
-          "targetMediaType": "image/png",
-          "targetExtension": "png",
-          "transformRequestOptions": {
-            "resizeWidth": "25",
-            "resizePercentage": "true",
-            "maintainAspectRatio": "true"
-          }
-        }
-        ```
-        
-        Here's a pseudo-example response of a successful transform:
-        
-        ```JSON
-        {
-          "schema": 1,
-          "status": 201,
-          "requestId": "0aead31c-e3ca-42c9-8e16-c1938ff64c3a",
-          "clientData": "opaque-client-specific-data-123xyz",
-          "sourceReference": "598387b8-d85d-4557-816e-50f44c969e04",
-          "targetReference": "5bc81e48-e17a-4727-bd1c-3a279aa6b421"
-        }
-        ```
-        
-        Here's a pseudo-example response of a failed transform:
-        
-        ```JSON
-        {
-          "schema": 1,
-          "status": 400,
-          "errorDetails": "Lorem ipsum dolor sit amet, ...",
-          "requestId": "0aead31c-e3ca-42c9-8e16-c1938ff64c3a",
-          "clientData": "opaque-client-specific-data-123xyz",
-          "sourceReference": "598387b8-d85d-4557-816e-50f44c969e04"
-        }
-        ```
-        
-        **Using Multipart (multipart/form-data -> application/octet-stream)**
-        
-        The Multipart API remains for backwards compatibility (eg. ACS 6.0). It requires 
-        the source file to be uploaded via multipart/form-data (along with transformation 
-        options). The target file is returned as a binary response (application/octet-stream).
-        
-      operationId: transformOperation
-      parameters:
-        - in: body
-          name: transformRequest
-          description: The Transform Request including source reference and transform options 
-          required: true
-          schema:
-            $ref: '#/definitions/transformRequest'      
-      consumes:
-        - application/json
-        - multipart/form-data
-      produces:
-        - application/json
-        - application/octet-stream
-      responses:
-        '201':
-          description: Successful response
-          schema:
-            $ref: '#/definitions/transformReply'
-        default:
-          description: Unexpected error
-          schema:
-            $ref: '#/definitions/Error'
-  '/transformer/options':
-    get:
-      tags:
-      - Transform
-      description: List transform options
-      operationId: transformOptions
-      produces:
-      - application/json
-      responses:
-        200:
-          description: Successful response
-          schema:
-            type: array
-            xml:
-              name: transformOptions
-              wrapped: true
-            items:
-              $ref: '#/definitions/transformOption'
-definitions:
-  Error:
-    type: object
-    required:
-      - error
-    properties:
-      error:
-        type: object
-        required:
-          - statusCode
-          - briefSummary
-          - stackTrace
-          - descriptionURL
-        properties:
-          errorKey:
-            type: string
-          statusCode:
-            type: integer
-            format: int32
-          briefSummary:
-            type: string
-          stackTrace:
-            type: string
-          descriptionURL:
-            type: string
-          logId:
-            type: string
-  transformRequest:
-    type: object
-    properties:
-      requestId:
-        type: string
-      sourceReference:
-        type: string
-      sourceMediaType:
-        type: string
-      sourceSize:
-        type: integer
-        format: int64
-      sourceExtension:
-        type: string
-      targetMediaType:
-        type: string
-      targetExtension:
-        type: string
-      clientData:
-        type: string
-      schema:
-        type: integer
-      transformRequestOptions:
-        type: object
-        additionalProperties:
-          type: string        
-  transformReply:
-    type: object
-    properties:
-      status:
-        type: integer
-      requestId:
-        type: string
-      sourceReference:
-        type: string
-      targetReference:
-        type: string
-      clientData:
-        type: string
-      schema:
-        type: integer
-      errorDetails:
-        type: string
-  transformOption:
-    type: object
-    required:
-    - required
-    - name
-    properties:
-      required:
-        type: boolean
-      name:
-        type: string
--- a/docs/engine_config.md
+++ b/docs/engine_config.md
@@ -1,168 +0,0 @@
-## T-Engine configuration
-
-T-Engines provide a */transform/config* end point for clients (e.g. Transform-Router or 
-Alfresco-Repository) that indicate what is supported. T-Engines store this 
-configuration as a JSON resource file named *engine_config.json*.
-
-The config can be found under `alfresco-transform-core\<t-engine-name>\src\main\resources
-\engine_config.json`; current configuration files are:
-* [Pdf-Renderer T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/alfresco-docker-alfresco-pdf-renderer/src/main/resources/engine_config.json).
-* [ImageMagick T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/alfresco-docker-imagemagick/src/main/resources/engine_config.json).
-* [Libreoffice T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/alfresco-docker-libreoffice/src/main/resources/engine_config.json).
-* [Tika T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/alfresco-docker-tika/src/main/resources/engine_config.json).
-* [Misc T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/alfresco-docker-transform-misc/src/main/resources/engine_config.json).
-
-*Snippet from Tika T-engine configuration:*
-```json
-{
-  "transformOptions": {
-    "tikaOptions": [
-      {"value": {"name": "targetEncoding"}}
-    ],
-    "pdfboxOptions": [
-      {"value": {"name": "notExtractBookmarksText"}},
-      {"value": {"name": "targetEncoding"}}
-    ]
-  },
-  "transformers": [
-    {
-      "transformerName": "PdfBox",
-      "supportedSourceAndTargetList": [
-        {"sourceMediaType": "application/pdf",                                 "targetMediaType": "text/html"},
-        {"sourceMediaType": "application/pdf", "maxSourceSizeBytes": 26214400, "targetMediaType": "text/plain"}
-      ],
-      "transformOptions": [
-        "pdfboxOptions"
-      ]
-    },
-    {
-      "transformerName": "TikaAuto",
-      "supportedSourceAndTargetList": [
-        {"sourceMediaType": "application/msword",              "priority": 55, "targetMediaType": "text/xml"}
-      ],
-      "transformOptions": [
-        "tikaOptions"
-      ]
-    },
-    {
-      "transformerName": "TextMining",
-      "supportedSourceAndTargetList": [
-        {"sourceMediaType": "application/msword",                              "targetMediaType": "text/xml"}
-      ],
-      "transformOptions": [
-        "tikaOptions"
-      ]
-    }
-  ]
-}
-```
-
-### Transform Options
-*  **transformOptions** provides a list of transform options that may be
-  referenced for use in different transformers. This way common options
-  don't need to be repeated for each transformer, they can be shared between
-  T-Engines. In this example there are two groups of options called **tikaOptions**
-  and **pdfboxOptions** which has a group of options **targetEncoding** and
-  **notExtractBookmarksText**. Unless an option has a **"required": true** field it is
-  considered to be optional. You don't need to specify *sourceMimetype*,
-  *targetMimetype*, *sourceExtension* or *targetExtension* as options as 
-  these are automatically added.
-  
-  *Snippet from ImageMagick T-engine configuration:*
-```json
-    "transformOptions": {
-      "imageMagickOptions": [
-        {"value": {"name": "alphaRemove"}},
-        {"group": {"transformOptions": [
-          {"value": {"name": "cropGravity"}},
-          {"value": {"name": "cropWidth"}},
-          {"value": {"name": "cropHeight"}},
-          {"value": {"name": "cropPercentage"}},
-          {"value": {"name": "cropXOffset"}},
-          {"value": {"name": "cropYOffset"}}
-        ]}},
-      ]
-    },
-```
-*  There are two types of transformOptions, *transformOptionsValue* and *transformOptionsGroup*:
-   *  _TransformOptionsValue_ is used to represent a single transformation option, it is defined 
-   by a **name** and an optional **required** field.
-   *  _TransformOptionGroup_ represents a group of one or more options, it is used to group 
-   options that define a
-   characteristic. In the above snippet all the options for crop are defined under a group, it is recommended to
-   use this approach as it is easier to read. A transformOptionsGroup can contain one or more transformOptionsValue 
-   and transformOptionsGroup. 
-  
-  **Limitations**:
-  * For a transformOptions to be referenced in a different T-engine, another transformer
-  with the complete definition of the transformOptions needs to return the config to the client.
-  * In a transformOptions definition it is not allowed to use a reference to another transformOption.
-  
-### Transformers
-* **transformers** - A list of transformer definitions.
-  Each transformer definition should have a unique **transformerName**,
-  specify a **supportedSourceAndTargetList** and indicate which
-  options it supports. As it is shown in the Tika snippet, an *engine_config*
-  can describe one or more transformers, as a T-engine can have
-  multiple transformers (e.g. Tika, Misc). A transformer configuration may 
-  specify references to 0 or more transformOptions.
-
-### Supported Source and Target List
-* **supportedSourceAndTargetList** is simply a list of source and target
-  Media Types that may be transformed, optionally specifying a
-  **maxSourceSizeBytes** and a **priority** value. 
-*  *maxSourceSizeBytes* is used to set the upper size limit of a transformation.
-   * If not specified, the default value for maxSourceSizeBytes is **unlimited**.
-*  *priority* is used by clients to determine which transformer to call, or by T-engines
-    with multiple transformers to determine which one to use. In the above Tika snippet,
-    both *TikaAuto* and *TextMining* have the capability to transform *"application/msword"*
-    into *"text/xml"*, the transformer containing the source-target media type with higher priority will be chosen by the
-    T-engine as the one to execute the transformation, in this case it will be *TextMining*, because:
-   * If not specified, the default value for priority is **50**.
-   * Note: priority values are like the order in a queue, the **lower** the number the **higher the
-    priority** is.
-   
-## Transformer selection strategy
-The ACS repository will use the T-Engine configuration to choose which T-Engine will perform a transform.
-A transformer definition contains a supported list of source and target Media Types. This is used for the
-most basic selection. This is further refined by checking that the definition also supports transform options
-(parameters) that have been supplied in a transform request or a Rendition Definition used in a rendition request.
-Order for selection is:
-1. Source->Target Media Types
-2. transformOptions
-3. maxSourceSizeBytes
-4. priority
- 
-#### Case 1:
-```
-Transformer 1 defines options: Op1, Op2
-Transformer 2 defines options: Op1, Op2, Op3, Op4
-```
-```
-Rendition provides values for options: Op2, Op3
-```
-If we assume both transformers support the required source and target Media Types, Transformer 2 will be selected
-because it knows about all the supplied options. The definition may also specify that some options are required or grouped.
-
-#### Case 2:
-```
-Transformer 1 defines options: Op1, Op2, maxSize
-Transformer 2 defines options: Op1, Op2, Op3
-```
-```
-Rendition provides values for options: Op1, Op2
-```
-If we assume both transformers support the required source and target Media Types, and the file size is greater than *maxSize*,
-Transformer 2 will be selected because it can handle *maxSourceSizeBytes* for this transformation.
-
-#### Case 3:
-```
-Transformer 1 defines options: Op1, Op2, priority1
-Transformer 2 defines options: Op1, Op2, Op3, priority2
-```
-```
-Rendition provides values for options: Op1, Op2
-```
-If we assume both transformers support the required source and target Media Types and
- *priority1* < *priority2*, Transformer 1 will be selected because its priority is higher.
- 
--- a/docs/t-engines.md
+++ b/docs/t-engines.md
@@ -0,0 +1,60 @@
+# T-Engines
+
+The t-engines provide the basic transform operations. The Transform Service
+provides a common base for the communication with other components. It is
+this base that is described in this section. The base is a Spring Boot
+application to which transform specific code is added and then wrapped
+in a Docker image with any programs that the transforms need. The base
+does not have to be used, as long as some process responds to the same
+endpoints and messages.
+
+A t-engine groups together one or more Transformers. Each Transformer
+(provided by transform specific code) knows how to perform a set of
+transformations from one MIME Type to another with a common set of
+t-options.
+
+~~~yaml
+0010 my-t-engine
+  Transformer 1
+    mimetype A -> mimetype B
+    mimetype A -> mimetype C
+    mimetype B -> mimetype C
+    option1
+    option2
+  Transformer 2
+    mimetype A -> mimetype B
+    mimetype D -> mimetype C
+    option2
+    option3
+0020 another-t-engine
+  ...
+0030 yet-another-t-engine
+  ...
+~~~
+
+## Endpoints
+
+* `POST /transform` to perform a transform. There are two forms:
+  * For asynchronous transforms: Perform a transform using a
+    `TransformRequest` received from the t-router via a message queue. The
+    `TransformReply` is sent back via the queue.
+  * For synchronous transforms: Performs a transform on content uploaded as
+    a Multipart File and provides the resulting content as a download.
+    Transform options are extracted from the request properties. The
+    following are not added as transform options, but are used to select the
+    transformer: `sourceMimetype` & `targetMimetype`.
+* `GET /transform/config` to obtain t-config about what the t-engine supports.
+  It has a parameter `configVersion` to allow a caller and the t-engine to
+  negotiate down to a common format. The value is an integer which indicates
+  which elements may be added to the config. These elements reflect
+  functionality supported by the base (such as pre-signed URLs). The
+  `CoreVersionDecorator` adds to the Config returned by the transform
+  specific code.
+* `GET /` provides an html test page to upload a source file, enter transform
+  options and issue a synchronous transform request. Useful in testing.
+* `GET /log` provides a page with basic log information. Useful in testing.
+* `GET /error` provides an error page when testing.
+* `GET /version` provides a String message to be included in client debug
+  messages.
+* `GET /ready` used by Kubernetes as a readiness probe.
+* `GET /live` used by Kubernetes as a liveness probe.
--- a/docs/transform-config.md
+++ b/docs/transform-config.md
@@ -0,0 +1,310 @@
+# T-Engine configuration
+
+Each t-engine provides an endpoint that returns t-config that defines what
+it supports. The t-router and t-engines may also have external t-config files.
+These are combined in name order. As sorting is alphanumeric, you may wish to
+consider using a fixed-length numeric prefix in filenames and t-engine names.
+As will be seen, t-config may reference elements from other components or
+modify elements from earlier t-config.
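
The effect of alphanumeric ordering on how t-config is combined can be illustrated with a short sketch. The file names are hypothetical, and Python's `sorted` only stands in for whatever string comparison the transform registry really uses.

```python
# Hypothetical t-config names; only the ordering behaviour is illustrated.
UNPADDED = ["9-overrides.json", "10-pipelines.json", "2-engine.json"]
PADDED = ["0009-overrides.json", "0010-pipelines.json", "0002-engine.json"]


def combine_order(names):
    """Return the names in plain string (alphanumeric) order."""
    return sorted(names)


if __name__ == "__main__":
    # Without padding, "10-..." sorts before "2-..." because "1" < "2".
    print(combine_order(UNPADDED))
    # With a fixed-length prefix the string order matches the numeric order.
    print(combine_order(PADDED))
```

With unpadded prefixes a file meant to be read last may be read first, so an override could be applied before the definition it is meant to change.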
+
+Current configuration files are:
+* [Pdf-Renderer T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/pdfrenderer/src/main/resources/pdfrenderer_engine_config.json).
+* [ImageMagick T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/imagemagick/src/main/resources/imagemagick_engine_config.json).
+* [Libreoffice T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/libreoffice/src/main/resources/libreoffice_engine_config.json).
+* [Tika T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/tika/src/main/resources/tika_engine_config.json).
+* [Misc T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/misc/src/main/resources/misc_engine_config.json).
+
+Additional config files (which may be resources on the classpath or external
+files) are specified in Spring Boot properties such as
+`transform.config.file.<filename>` or environment variables such as
+`TRANSFORM_CONFIG_FILE_<filename>`.
+
+The following is a simple t-config file from an example Hello World
+t-engine.
+
+~~~json
+{
+  "transformOptions":
+  {
+    "helloWorldOptions":
+    [
+      {"value": {"name": "language"}}
+    ]
+  },
+  "transformers":
+  [
+    {
+      "transformerName": "helloWorld",
+      "supportedSourceAndTargetList":
+      [
+        {"sourceMediaType": "text/plain",  "maxSourceSizeBytes": 50, "targetMediaType": "text/html"  }
+      ],
+      "transformOptions":
+      [
+        "helloWorldOptions"
+      ]
+    }
+  ]
+}
+~~~
+
+* **transformOptions** provides a list of transform options (each with its own
+  name) that may be referenced for use in different transformers. This way
+  common options don't need to be repeated for each transformer. They can
+  even be shared between T-Engines. In this example there is only one group
+  of options called `helloWorldOptions`, which has just one option,
+  `language`. Unless an option has a `"required": true` field it is considered
+  to be optional. You don't need to specify _sourceMimetype, sourceExtension,
+  sourceEncoding, targetMimetype, targetExtension_ or _timeout_ as options as
+  these are available to all transformers.
+* **transformers** is a list of transformer definitions. Each transformer
+  definition should have a unique `transformerName`, specify a
+  `supportedSourceAndTargetList` and indicate which options it supports.
+  In this case there is only one transformer called `helloWorld` and it
+  accepts `helloWorldOptions`. A transformer may specify references to 0
+  or more transformOptions.
+* **supportedSourceAndTargetList** is simply a list of source and target
+  Media Types that may be transformed, optionally specifying
+  `maxSourceSizeBytes` and `priority` values. In this case there is only one,
+  from text to HTML, and we have limited the source file size to avoid
+  transforming files that clearly don't contain names.
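
As a rough illustration of how a client might read this t-config, the sketch below checks which transformers list a given source and target Media Type and accept the source size. The function name `find_transformers` is hypothetical, it is not part of the t-engine base, and the real selection also takes transform options and priority into account.

```python
import json

# The Hello World t-config from the example above.
T_CONFIG = json.loads("""
{
  "transformOptions": {
    "helloWorldOptions": [
      {"value": {"name": "language"}}
    ]
  },
  "transformers": [
    {
      "transformerName": "helloWorld",
      "supportedSourceAndTargetList": [
        {"sourceMediaType": "text/plain", "maxSourceSizeBytes": 50, "targetMediaType": "text/html"}
      ],
      "transformOptions": ["helloWorldOptions"]
    }
  ]
}
""")


def find_transformers(config, source_type, target_type, source_size):
    """Names of transformers that list the pair and accept the source size."""
    matches = []
    for transformer in config.get("transformers", []):
        for supported in transformer.get("supportedSourceAndTargetList", []):
            # A missing maxSourceSizeBytes is treated as -1, meaning no limit.
            max_size = supported.get("maxSourceSizeBytes", -1)
            if (supported["sourceMediaType"] == source_type
                    and supported["targetMediaType"] == target_type
                    and (max_size == -1 or source_size <= max_size)):
                matches.append(transformer["transformerName"])
                break
    return matches


if __name__ == "__main__":
    print(find_transformers(T_CONFIG, "text/plain", "text/html", 20))
    print(find_transformers(T_CONFIG, "text/plain", "text/html", 51))
```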
+
+## Transform pipelines
+
+Transforms may be combined in a pipeline to form a new transformer, where
+the output from one becomes the input to the next and so on. The t-config
+defines the sequence of transform steps and intermediate Media Types. Like
+any other transformer, it specifies a list of supported source and target
+Media Types. If you don't supply any, all possible combinations are assumed
+to be available. The definition may reuse the `transformOptions` of
+transformers in the pipeline, but typically will define its own subset
+of these.
+
+The following example begins with the `helloWorld` Transformer, which takes a
+text file containing a name and produces an HTML file with `Hello <name>`
+message in the body. This is then transformed back into a text file. This
+example contains just one pipeline transformer, but many may be defined 
+in the same file.
+
+~~~json
+{
+  "transformers": [
+    {
+      "transformerName": "helloWorldText",
+      "transformerPipeline" : [
+        {"transformerName": "helloWorld", "targetMediaType": "text/html"},
+        {"transformerName": "html"}
+      ],
+      "supportedSourceAndTargetList": [
+        {"sourceMediaType": "text/plain", "priority": 45,  "targetMediaType": "text/plain" }
+      ],
+      "transformOptions": [
+        "helloWorldOptions"
+      ]
+    }
+  ]
+}
+~~~
+
+* **transformerName** Try to create a unique name for the transform.
+* **transformerPipeline** A list of transformers in the pipeline. The
+  `targetMediaType` specifies the intermediate Media Types between
+  transformers. There is no final `targetMediaType` as this comes from the
+  `supportedSourceAndTargetList`. The `transformerName` may reference a
+  transformer that has not been defined yet. A warning is issued if
+  it remains undefined after all t-config has been combined. Generally
+  it is better for a t-engine rather than the t-router to define pipeline
+  transformers as this limits the number of places that have to be changed.
+  Normally it is obvious which t-engine should contain the definition. 
+* **supportedSourceAndTargetList** The source and target Media Types that
+  this pipeline transformer can transform from and to. You can additionally
+  set the `priority` and the `maxSourceSizeBytes`. If blank, this indicates
+  that all possible combinations are supported. This is the cartesian product
+  of all source types to the first intermediate type and all target types
+  from the last intermediate type. Any combinations supported by the first
+  transformer are excluded. They will also have the priority from the first
+  transform.
+* **transformOptions** A list of references to options required by the
+  pipeline transformer.
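
One possible reading of the blank `supportedSourceAndTargetList` rule for pipelines is sketched below. The function `implied_pairs` and the support lists are hypothetical, and the priority inherited from the first transform is not modelled.

```python
from itertools import product


def implied_pairs(first_supported, last_supported, first_intermediate, last_intermediate):
    """Source/target pairs assumed for a pipeline with a blank supported list.

    first_supported and last_supported are sets of (source, target) Media Type
    pairs for the first and last transformers in the pipeline.
    """
    sources = {src for src, tgt in first_supported if tgt == first_intermediate}
    targets = {tgt for src, tgt in last_supported if src == last_intermediate}
    # Drop pairs that the first transformer can already handle on its own.
    return {pair for pair in product(sources, targets) if pair not in first_supported}


# Hypothetical support lists for a two step pipeline that goes via text/html.
FIRST = {
    ("text/plain", "text/html"),
    ("text/csv", "text/html"),
    ("text/plain", "application/pdf"),
}
LAST = {
    ("text/html", "application/pdf"),
    ("text/html", "image/png"),
}

if __name__ == "__main__":
    for pair in sorted(implied_pairs(FIRST, LAST, "text/html", "text/html")):
        print(pair)
```

In this example `text/plain` to `application/pdf` is left out because the first transformer already supports it directly.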
+
+## Failover transforms
+
+A failover transform simply provides a list of transforms to be attempted
+one after another until one succeeds. For example, you may have a fast
+transform that is able to handle a limited set of transforms and another
+that is slower but handles all cases.
+
+~~~json
+{
+  "transformers": [
+    {
+      "transformerName": "imgExtractOrImgCreate",
+      "transformerFailover" : [ "imgExtract", "imgCreate" ],
+      "supportedSourceAndTargetList": [
+        {"sourceMediaType": "application/vnd.oasis.opendocument.graphics", "priority": 150, "targetMediaType": "image/png" },
+        ...
+        {"sourceMediaType": "application/vnd.sun.xml.calc.template",       "priority": 150, "targetMediaType": "image/png" }
+      ]
+    }
+  ]
+}
+~~~
+
+* **transformerName** Try to create a unique name for the transform.
+* **transformerFailover** A list of transformers to try. This may include
+  references to transformers that have not been defined yet. Generally it
+  is better for the t-engine rather than the t-router to define failover
+  transformers as this limits the number of places that have to be changed.
+  Normally it is obvious which t-engine should contain the definition. 
+* **supportedSourceAndTargetList** The source and target Media Types that
+  this failover transformer can transform from and to. You can additionally
+  set the `priority` and the `maxSourceSizeBytes`. Unlike pipelines, it must
+  not be blank.
+* **transformOptions** A list of references to options required by the
+  failover transformer.
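
The failover behaviour described in this section (try each transformer in turn until one succeeds) can be sketched as follows. This is only an illustration: `run_failover`, `img_extract` and `img_create` are hypothetical stand-ins, not the t-engine base implementation.

```python
def run_failover(transformers, source):
    """Try each (name, function) pair in order and return the first success."""
    errors = []
    for name, transform in transformers:
        try:
            return name, transform(source)
        except Exception as exc:  # a real implementation would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all failover transformers failed: " + "; ".join(errors))


def img_extract(source):
    # Hypothetical fast step: only works if the source embeds a thumbnail.
    if "thumbnail" not in source:
        raise ValueError("no embedded thumbnail")
    return source["thumbnail"]


def img_create(source):
    # Hypothetical slower step: always renders an image.
    return f"rendered({source['name']})"


STEPS = [("imgExtract", img_extract), ("imgCreate", img_create)]

if __name__ == "__main__":
    print(run_failover(STEPS, {"name": "a.odg", "thumbnail": "thumb.png"}))
    print(run_failover(STEPS, {"name": "b.odg"}))
```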
+
+## Overriding transforms
+
+It is possible to override a previously defined transform definition. The
+following example removes most of the supported source to target media
+types from the standard `"libreoffice"` transform. It also changes the
+max size and priority of others. This is not something you would normally
+want to do.
+
+~~~json
+{
+  "transformers": [
+    {
+      "transformerName": "libreoffice",
+      "supportedSourceAndTargetList": [
+        {"sourceMediaType": "text/csv", "maxSourceSizeBytes": 1000, "targetMediaType": "text/html" },
+        {"sourceMediaType": "text/csv", "targetMediaType": "application/vnd.oasis.opendocument.spreadsheet" },
+        {"sourceMediaType": "text/csv", "targetMediaType": "application/vnd.oasis.opendocument.spreadsheet-template" },
+        {"sourceMediaType": "text/csv", "targetMediaType": "text/tab-separated-values" },
+        {"sourceMediaType": "text/csv", "priority": 45, "targetMediaType": "application/vnd.ms-excel" },
+        {"sourceMediaType": "text/csv", "priority": 155, "targetMediaType": "application/pdf" }
+      ]
+    }
+  ]
+}
+~~~
+
+## Removing a transformer
+
+To discard a previous transformer definition include its name in the
+optional `"removeTransformers"` list. You might want to do this if you
+have a replacement and wish to keep the overall configuration simple (so it
+contains no alternatives), or you wish to temporarily remove it. The
+following example removes two transformers before processing any other
+configuration in the same T-Engine or pipeline file.
+
+~~~json
+{
+  "removeTransformers" : [
+    "libreoffice",
+    "Archive"
+   ]
+  ...
+}
+~~~
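
A minimal sketch of how overriding and removing transformers might play out when t-config is combined in name order. It assumes a later definition with the same `transformerName` replaces the earlier one and that `removeTransformers` is applied before the other entries in the same file. The `combine` function and the config names are hypothetical.

```python
def combine(configs):
    """Combine parsed t-config in name order and return transformers by name."""
    registry = {}
    for source_name in sorted(configs):
        config = configs[source_name]
        # Removals are processed before anything else in the same file.
        for name in config.get("removeTransformers", []):
            registry.pop(name, None)
        # A later definition with the same name overrides the earlier one.
        for transformer in config.get("transformers", []):
            registry[transformer["transformerName"]] = transformer
    return registry


# Hypothetical, heavily shortened t-config from two t-engines and one extra file.
CONFIGS = {
    "0010-libreoffice": {"transformers": [
        {"transformerName": "libreoffice", "supportedSourceAndTargetList": [
            {"sourceMediaType": "text/csv", "targetMediaType": "application/pdf"},
            {"sourceMediaType": "text/csv", "targetMediaType": "text/html"}]}]},
    "0020-misc": {"transformers": [
        {"transformerName": "Archive", "supportedSourceAndTargetList": [
            {"sourceMediaType": "application/zip", "targetMediaType": "text/plain"}]}]},
    "0100-custom.json": {
        "removeTransformers": ["Archive"],
        "transformers": [
            {"transformerName": "libreoffice", "supportedSourceAndTargetList": [
                {"sourceMediaType": "text/csv", "maxSourceSizeBytes": 1000,
                 "targetMediaType": "text/html"}]}]},
}

if __name__ == "__main__":
    for name, transformer in combine(CONFIGS).items():
        print(name, transformer["supportedSourceAndTargetList"])
```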
+
+## Overriding the supportedSourceAndTargetList
+
+Rather than totally override an existing transform definition, it is
+generally simpler to modify the `"supportedSourceAndTargetList"` by adding
+elements to the optional `"addSupported"`, `"removeSupported"` and
+`"overrideSupported"` lists. You will need to specify the
+`"transformerName"` but you will not need to repeat all the other
+`"supportedSourceAndTargetList"` values, which means if there are changes
+in the original, the same change is not needed in a second place. The
+following example adds one transform, removes two others and changes
+the `"priority"` and `"maxSourceSizeBytes"` of another. This is done before
+processing any other configuration in the same T-Engine or pipeline file.
+
+~~~json
+{
+  "addSupported": [
+    {
+      "transformerName": "Archive",
+      "sourceMediaType": "application/zip",
+      "targetMediaType": "text/csv",
+      "priority": 60,
+      "maxSourceSizeBytes": 18874368
+    }
+  ],
+  "removeSupported": [
+    {
+      "transformerName": "Archive",
+      "sourceMediaType": "application/zip",
+      "targetMediaType": "text/xml"
+    },
+    {
+      "transformerName": "Archive",
+      "sourceMediaType": "application/zip",
+      "targetMediaType": "text/plain"
+    }
+  ],
+  "overrideSupported": [
+    {
+      "transformerName": "Archive",
+      "sourceMediaType": "application/zip",
+      "targetMediaType": "text/html",
+      "priority": 60,
+      "maxSourceSizeBytes": 18874368
+    }
+  ]
+  ...
+}
+~~~
+
+## Default maxSourceSizeBytes and priority values
+
+When defining `"supportedSourceAndTargetList"` elements, the `"priority"`
+and `"maxSourceSizeBytes"` are optional and normally default to 50 and -1
+(no limit) respectively. It is possible to change those defaults. They are
+defined by combinations of `"transformerName"` and `"sourceMediaType"`,
+listed here in precedence order from most specific to most general.
+
+* **transformer and source media type default**: both are specified
+* **transformer default**: only the transformer name is specified
+* **source media type default**: only the source media type is specified
+* **system wide default**: neither is specified
+
+Both `"priority"` and `"maxSourceSizeBytes"` may be specified in an element,
+but if only one is specified it is only that value that is being defaulted.
+
+Being able to change the defaults is particularly useful once a T-Engine
+has been developed, as it allows a system administrator to handle
+limitations that are only found later. The system wide defaults are
+generally not used but are included for completeness. In the following
+example the first element says that the `"Office"` transformer should by
+default only handle zip files up to 18 MB, and the second that the default
+maximum size of a `.doc` file to be transformed is 4 MB. The third element
+defaults the priority, possibly allowing another transformer that has
+specified a priority of, say, `50` to be used in preference.
+
+Default values are only applied after all t-config has been read.
+
+~~~json
+{
+  "supportedDefaults": [
+    {
+      "transformerName": "Office",             // default for a source type within a transformer
+      "sourceMediaType": "application/zip",
+      "maxSourceSizeBytes": 18874368
+    },
+    {
+      "sourceMediaType": "application/msword", // defaults for a source type
+      "maxSourceSizeBytes": 4194304,
+      "priority": 45
+    },
+    {
+      "priority": 60                           // system wide default
+    },
+    {
+      "maxSourceSizeBytes": -1                 // system wide default
+    }
+  ]
+  ...
+}
+~~~
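+
+The precedence rules can be sketched in code. The following is only an
+illustration of one reading of the rules above, not the t-engine base
+implementation: the `SupportedDefaultsSketch` and `Default` names are
+hypothetical, and it assumes that `"priority"` and `"maxSourceSizeBytes"`
+are resolved independently, each from the most specific element that
+specifies it.
+
+~~~java
+import java.util.List;
+
+// Hypothetical sketch: none of these types come from the t-engine base.
+class SupportedDefaultsSketch
+{
+    /** One "supportedDefaults" element; a null field means "not specified". */
+    static final class Default
+    {
+        final String transformerName;
+        final String sourceMediaType;
+        final Integer priority;
+        final Long maxSourceSizeBytes;
+
+        Default(String transformerName, String sourceMediaType,
+                Integer priority, Long maxSourceSizeBytes)
+        {
+            this.transformerName = transformerName;
+            this.sourceMediaType = sourceMediaType;
+            this.priority = priority;
+            this.maxSourceSizeBytes = maxSourceSizeBytes;
+        }
+
+        /** 3 = transformer and source type, 2 = transformer, 1 = source type, 0 = system wide. */
+        int specificity()
+        {
+            return (transformerName != null ? 2 : 0) + (sourceMediaType != null ? 1 : 0);
+        }
+
+        boolean appliesTo(String transformer, String source)
+        {
+            return (transformerName == null || transformerName.equals(transformer))
+                && (sourceMediaType == null || sourceMediaType.equals(source));
+        }
+    }
+
+    /** Priority from the most specific applicable element that specifies one. */
+    static int priority(List<Default> defaults, String transformer, String source)
+    {
+        Default best = null;
+        for (Default d : defaults)
+        {
+            if (d.priority != null && d.appliesTo(transformer, source)
+                && (best == null || d.specificity() > best.specificity()))
+            {
+                best = d;
+            }
+        }
+        return best == null ? 50 : best.priority; // 50 is the normal default
+    }
+
+    /** Size limit from the most specific applicable element that specifies one. */
+    static long maxSourceSizeBytes(List<Default> defaults, String transformer, String source)
+    {
+        Default best = null;
+        for (Default d : defaults)
+        {
+            if (d.maxSourceSizeBytes != null && d.appliesTo(transformer, source)
+                && (best == null || d.specificity() > best.specificity()))
+            {
+                best = d;
+            }
+        }
+        return best == null ? -1L : best.maxSourceSizeBytes; // -1 means no limit
+    }
+
+    /** The four elements from the JSON example above. */
+    static List<Default> exampleDefaults()
+    {
+        return List.of(
+            new Default("Office", "application/zip", null, 18874368L),
+            new Default(null, "application/msword", 45, 4194304L),
+            new Default(null, null, 60, null),
+            new Default(null, null, null, -1L));
+    }
+}
+~~~
+
+Under these assumptions, a zip source handled by `"Office"` takes its size
+limit from the first element and its priority from the system wide default,
+because the first element does not specify a priority.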
--- a/docs/transform-specific-code.md
+++ b/docs/transform-specific-code.md
@@ -0,0 +1,140 @@
+# Transform specific code
+
+To create a new t-engine, an author uses a base t-engine (a Spring Boot
+application) and implements the following interfaces. An implementation of
+`CustomTransformer` provides the actual transformation code and an
+implementation of `TransformEngine` says what the t-engine is capable of
+transforming. The `TransformConfig` is normally read from a json file on the
+classpath. Multiple `CustomTransformer` implementations may be in a single
+t-engine. As a result, the author can concentrate on the code that transforms
+one format to another without really worrying about all the plumbing.
+Typically, the transform specific code uses a 3rd party library or an
+external executable which needs to be added to the Docker image.
+
+~~~java
+package org.alfresco.transform;
+
+import org.alfresco.transform.config.TransformConfig;
+import org.alfresco.transformer.probes.ProbeTestTransform;
+
+/**
+ * Interface to be implemented by transform specific code. Provides information
+ * about the t-engine as a whole. So that it is automatically picked up, it must
+ * exist in a package under {@code org.alfresco.transform} and have the Spring
+ * {@code @Component} annotation.
+ */
+public interface TransformEngine
+{
+    /**
+     * @return the name of the t-engine. The t-router reads config from t-engines
+     *         in name order.
+     */
+    String getTransformEngineName();
+
+    /**
+     * @return a definition of what the t-engine supports. Normally read from a json
+     *         Resource on the classpath.
+     */
+    TransformConfig getTransformConfig();
+
+    /**
+     * @return a ProbeTransform (will do a quick transform) for Kubernetes
+     *         liveness and readiness probes.
+     */
+    ProbeTransform getProbeTransform();
+}
+~~~
+
+Implementations of the following interface provide the actual transform code.
+
+~~~java
+package org.alfresco.transform;
+
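+import org.alfresco.transform.base.TransformManager;
+import org.alfresco.transform.config.TransformConfig;
+
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.util.Map;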
+
+/**
+ * Interface to be implemented by transform specific code. The
+ * {@code transformerName} should match the transformerName in the
+ * {@link TransformConfig} returned by the {@link TransformEngine}. So that it is
+ * automatically picked up, it must exist in a package under
+ * {@code org.alfresco.transform} and have the Spring {@code @Component} annotation.
+ *
+ * Implementations may also use the {@link TransformManager} if they wish to
+ * interact with the base t-engine.
+ */
+public interface CustomTransformer
+{
+    String getTransformerName();
+
+    void transform(String sourceMimetype, InputStream inputStream,
+                   String targetMimetype, OutputStream outputStream,
+                   Map<String, String> transformOptions,
+                   TransformManager transformManager) throws Exception;
+}
+~~~
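+
+As an illustration of how little code a transformer needs, the following
+is a minimal sketch of a `CustomTransformer` that upper-cases plain text.
+It is not taken from any existing t-engine: `UppercaseTransformer`, the
+`"uppercase"` name and `UppercaseTransformerDemo` are hypothetical, and
+the two interfaces are redeclared as local stand-ins only so that the
+sketch compiles on its own. A real implementation would use the
+interfaces supplied by the base t-engine and carry the Spring
+`@Component` annotation.
+
+~~~java
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.charset.StandardCharsets;
+import java.util.Locale;
+import java.util.Map;
+
+// Stand-ins so the sketch is self-contained; not the real base t-engine types.
+interface TransformManager {}
+
+interface CustomTransformer
+{
+    String getTransformerName();
+
+    void transform(String sourceMimetype, InputStream inputStream,
+                   String targetMimetype, OutputStream outputStream,
+                   Map<String, String> transformOptions,
+                   TransformManager transformManager) throws Exception;
+}
+
+// Hypothetical transformer that upper-cases UTF-8 text.
+class UppercaseTransformer implements CustomTransformer
+{
+    @Override
+    public String getTransformerName()
+    {
+        return "uppercase"; // assumed to match a transformerName in the t-config
+    }
+
+    @Override
+    public void transform(String sourceMimetype, InputStream inputStream,
+                          String targetMimetype, OutputStream outputStream,
+                          Map<String, String> transformOptions,
+                          TransformManager transformManager) throws Exception
+    {
+        // Works directly on the supplied streams, so no temporary files are created.
+        String text = new String(inputStream.readAllBytes(), StandardCharsets.UTF_8);
+        outputStream.write(text.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.UTF_8));
+    }
+}
+
+class UppercaseTransformerDemo
+{
+    /** Runs the transformer on an in-memory string and returns the result. */
+    static String run(String input)
+    {
+        try
+        {
+            ByteArrayOutputStream out = new ByteArrayOutputStream();
+            new UppercaseTransformer().transform("text/plain",
+                new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8)),
+                "text/plain", out, Map.of(), null);
+            return new String(out.toByteArray(), StandardCharsets.UTF_8);
+        }
+        catch (Exception e)
+        {
+            throw new IllegalStateException(e);
+        }
+    }
+
+    public static void main(String[] args)
+    {
+        System.out.println(run("hello t-engine"));
+    }
+}
+~~~
+
+The sketch ignores the transform options and the `TransformManager`, which
+is why the demo can pass `null` for the latter.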
+
+The implementation of the following interface is provided by the t-base
+and allows the `CustomTransformer` to interact with the base t-engine. The
+creation of Files is discouraged, as it is better not to leave files on disk.
+
+~~~java
+package org.alfresco.transform.base;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.util.Map;
+
+/**
+ * Allows {@link CustomTransformer} implementations to interact with the base
+ * t-engine.
+ */
+public interface TransformManager
+{
+    /**
+     * Allows a CustomTransformer to use a local source File rather than the
+     * supplied InputStream. To avoid creating extra files, if a File has already
+     * been created by the base t-engine, it is returned.
+     */
+    File createSourceFile();
+
+    /**
+     * Allows a CustomTransformer to use a local target File rather than the
+     * supplied OutputStream. To avoid creating extra files, if a File has already
+     * been created by the base t-engine, it is returned.
+     */
+    File createTargetFile();
+
+    /**
+     * Allows a single transform request to have multiple transform responses. For
+     * example, images from a video at different time offsets or different pages of
+     * a document. Following a call to this method a transform response is made with
+     * the data sent to the current {@code OutputStream}. If this method has been
+     * called, there will not be another response when
+     * {@link CustomTransformer#transform(String, InputStream, String, OutputStream, Map, TransformManager)}
+     * returns and any data written to the final {@code OutputStream} will be
+     * ignored.
+     *
+     * @param index    returned with the response, so that the fragment may be
+     *                 distinguished from other responses. Renditions use the index
+     *                 as an offset into elements. A {@code null} value indicates
+     *                 that there is no more output and any data sent to the current
+     *                 {@code outputStream} will be ignored.
+     * @param finished indicates this is the final fragment. {@code false} indicates
+     *                 that it is expected there will be more fragments. There need
+     *                 not be a call with this parameter set to {@code true}.
+     * @return a new {@code OutputStream} for the next fragment. A {@code null} will
+     *         be returned if {@code index} was {@code null} or {@code finished}
+     *         was {@code true}.
+     * @throws TransformException if a synchronous (http) request has been made as
+     *                 this only works with requests on queues, or the first call to
+     *                 this method indicated there was no output, or another call is
+     *                 made after it has been indicated that there should be no more
+     *                 fragments.
+     * @throws IOException if there was a problem sending the response.
+     */
+    OutputStream respondWithFragment(Integer index, boolean finished) throws IOException;
+}
+~~~
--- a/docs/transformer-selection.md
+++ b/docs/transformer-selection.md
@@ -0,0 +1,28 @@
+# Transformer selection strategy
+
+The TransformRegistry uses t-config to choose which Transformer will be
+used. A transformer definition contains a supported list of source and
+target Media Types. This is used for the most basic selection. It is further
+refined by checking that the definition also supports transform options (the
+parameters) that have been supplied in a transform request.
+
+~~~text
+Transformer 1 defines options: Op1, Op2
+Transformer 2 defines options: Op1, Op2, Op3, Op4
+
+Transform request provides values for options: Op2, Op3
+~~~
+
+If we assume both transformers support the required source and target Media
+Types, Transformer 2 will be selected because it knows about all the supplied
+options. The definition may also specify that some options are required or
+grouped. If any members of an optional group are supplied, all required
+members of that group become required.
+
+The configuration may impose a source file size limit resulting in the
+selection of a different transformer. Size limits are normally added to avoid
+the transforms consuming too many resources.
+
+The configuration may also specify a priority which will be used in
+Transformer selection if there are a number of possible transformers. The
+highest priority is the one with the lowest number.
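+
+The selection steps described above can be sketched as follows. This is
+only an illustration under simplifying assumptions, not the
+TransformRegistry code: `TransformerSelectionSketch` and `Candidate` are
+hypothetical names, every candidate is assumed to already support the
+requested source and target Media Types, and required or grouped options
+are not modelled.
+
+~~~java
+import java.util.Comparator;
+import java.util.List;
+import java.util.Optional;
+import java.util.Set;
+
+// Hypothetical sketch: these types are not part of the TransformRegistry API.
+class TransformerSelectionSketch
+{
+    /** A transformer that supports the requested source and target Media Types. */
+    static final class Candidate
+    {
+        final String name;
+        final Set<String> options;     // option names the definition knows about
+        final int priority;            // lower number = higher priority
+        final long maxSourceSizeBytes; // -1 = no limit
+
+        Candidate(String name, Set<String> options, int priority, long maxSourceSizeBytes)
+        {
+            this.name = name;
+            this.options = options;
+            this.priority = priority;
+            this.maxSourceSizeBytes = maxSourceSizeBytes;
+        }
+    }
+
+    static Optional<String> select(List<Candidate> candidates,
+                                   Set<String> requestOptions, long sourceSizeBytes)
+    {
+        return candidates.stream()
+            // the definition must know about every option supplied in the request
+            .filter(c -> c.options.containsAll(requestOptions))
+            // the source must fit within any size limit
+            .filter(c -> c.maxSourceSizeBytes == -1 || sourceSizeBytes <= c.maxSourceSizeBytes)
+            // of those left, the lowest priority number wins
+            .min(Comparator.comparingInt((Candidate c) -> c.priority))
+            .map(c -> c.name);
+    }
+
+    /** The two transformers from the example above, with default priority and no size limit. */
+    static List<Candidate> exampleCandidates()
+    {
+        return List.of(
+            new Candidate("Transformer 1", Set.of("Op1", "Op2"), 50, -1),
+            new Candidate("Transformer 2", Set.of("Op1", "Op2", "Op3", "Op4"), 50, -1));
+    }
+}
+~~~
+
+With the example request (`Op2`, `Op3`), only Transformer 2 passes the
+options filter, so it is selected whatever the priorities are.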
--- a/docs/transformerDebug.md
+++ b/docs/transformerDebug.md
@@ -0,0 +1,46 @@
+# TransformerDebug
+
+In addition to any normal logging, the t-engines, t-router and t-client also
+use the `TransformerDebug` class to provide request-based logging. The
+following is an example from Alfresco after the upload of a `docx` file.
+
+~~~text
+163               docx json AGM 2016 - Masters report.docx 14.8 KB -- metadataExtract --  TransformService
+163               workspace://SpacesStore/0db3a665-328d-4437-85ed-56b753cf19c8 1563306426
+163               docx json  14.8 KB -- metadataExtract -- PoiMetadataExtractor
+163                 cm:title=
+163                 cm:author=James Dobinson
+163               Finished in 664 ms
+...
+164               docx png  AGM 2016 - Masters report.docx 14.8 KB -- doclib --  TransformService
+164               workspace://SpacesStore/0db3a665-328d-4437-85ed-56b753cf19c8 1563306426
+164               docx png   14.8 KB -- doclib -- officeToImageViaPdf
+164.1             docx pdf   libreoffice
+164.2             pdf  png   pdfToImageViaPng
+164.2.1           pdf  png   pdfrenderer
+164.2.2           png  png   imagemagick
+164.2.2             endPage="0"
+164.2.2             resizeHeight="100"
+164.2.2             thumbnail="true"
+164.2.2             startPage="0"
+164.2.2             resizeWidth="100"
+164.2.2             autoOrient="true"
+164.2.2             allowEnlargement="false"
+164.2.2             maintainAspectRatio="true"
+164               Finished in 725 ms
+~~~
+
+This log happens to be from the t-client, but similar log lines exist in the
+t-router and individual t-engines.
+
+All lines start with a reference, which starts with the client’s request
+number (`163` or `164` here), if known, followed by a nested pipeline or
+failover structure. The first request extracts metadata and the second
+creates a thumbnail rendition (called `doclib`). The second request is
+handled by a pipeline called `officeToImageViaPdf`, which uses `libreoffice`
+to transform to `pdf` and then another pipeline to convert to `png`. The last
+step (`164.2.2`) in the process resizes the `png` using a number of transform
+options.
+
+If requested, log information is passed back in the TransformReply's
+clientData.