mirror of
https://github.com/Alfresco/alfresco-transform-core.git
synced 2025-08-14 17:58:27 +00:00
Doc changes only [skip ci]
@@ -1,161 +0,0 @@
## T-Engine configuration

T-Engines provide a */transform/config* endpoint for clients (e.g. Transform-Router or
Repository) that indicates what is supported. T-Engines store this
configuration as a JSON resource file named *engine_config.json*.

The config can be found under `alfresco-transform-core/engines/<t-engine-name>/src/main/resources/<t-engine-name>_engine_config.json`; current configuration files are:
* [Pdf-Renderer T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/pdfrenderer/src/main/resources/pdfrenderer_engine_config.json).
* [ImageMagick T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/imagemagick/src/main/resources/imagemagick_engine_config.json).
* [Libreoffice T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/libreoffice/src/main/resources/libreoffice_engine_config.json).
* [Tika T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/tika/src/main/resources/tika_engine_config.json).
* [Misc T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/misc/src/main/resources/misc_engine_config.json).

*Snippet from Tika T-engine configuration:*
```json
{
  "transformOptions": {
    "tikaOptions": [
      {"value": {"name": "targetEncoding"}}
    ],
    "pdfboxOptions": [
      {"value": {"name": "notExtractBookmarksText"}},
      {"value": {"name": "targetEncoding"}}
    ]
  },
  "transformers": [
    {
      "transformerName": "PdfBox",
      "supportedSourceAndTargetList": [
        {"sourceMediaType": "application/pdf", "targetMediaType": "text/html"},
        {"sourceMediaType": "application/pdf", "maxSourceSizeBytes": 26214400, "targetMediaType": "text/plain"}
      ],
      "transformOptions": [
        "pdfboxOptions"
      ]
    },
    {
      "transformerName": "TikaAuto",
      "supportedSourceAndTargetList": [
        {"sourceMediaType": "application/msword", "priority": 55, "targetMediaType": "text/xml"}
      ],
      "transformOptions": [
        "tikaOptions"
      ]
    },
    {
      "transformerName": "TextMining",
      "supportedSourceAndTargetList": [
        {"sourceMediaType": "application/msword", "targetMediaType": "text/xml"}
      ],
      "transformOptions": [
        "tikaOptions"
      ]
    }
  ]
}
```

### Transform Options
* **transformOptions** provides a list of transform options that may be
  referenced for use in different transformers. This way common options
  don't need to be repeated for each transformer; they can also be shared between
  T-Engines. In this example there are two groups of options, **tikaOptions**
  and **pdfboxOptions**, which between them define the options **targetEncoding** and
  **notExtractBookmarksText**. Unless an option has a **"required": true** field it is
  considered to be optional.

*Snippet from ImageMagick T-engine configuration:*
```json
"transformOptions": {
  "imageMagickOptions": [
    {"value": {"name": "alphaRemove"}},
    {"group": {"transformOptions": [
      {"value": {"name": "cropGravity"}},
      {"value": {"name": "cropWidth"}},
      {"value": {"name": "cropHeight"}},
      {"value": {"name": "cropPercentage"}},
      {"value": {"name": "cropXOffset"}},
      {"value": {"name": "cropYOffset"}}
    ]}}
  ]
},
```
* There are two types of transformOptions, *transformOptionsValue* and *transformOptionsGroup*:
  * _TransformOptionsValue_ represents a single transformation option. It is defined
    by a **name** and an optional **required** field.
  * _TransformOptionsGroup_ represents a group of one or more options. It is used to group
    options that together define a characteristic. In the above snippet all the options
    for crop are defined under one group; this approach is recommended because it is
    easier to read. A transformOptionsGroup can contain one or more transformOptionsValue
    and transformOptionsGroup elements.
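
The two option types can be pictured as a small tree of values and groups. The following is only a sketch to illustrate that nesting: the `Option`, `Value` and `Group` types and the `names` helper are hypothetical names chosen for this example, not the classes used by the T-Engine code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two option types; not the real T-Engine classes.
class OptionTreeSketch
{
    interface Option {}

    // A single option: a name plus a "required" flag (false when the field is absent).
    record Value(String name, boolean required) implements Option {}

    // A group holds values and/or nested groups.
    record Group(List<Option> transformOptions) implements Option {}

    // Walks the tree depth first and collects every option name.
    static List<String> names(List<Option> options)
    {
        List<String> result = new ArrayList<>();
        for (Option option : options)
        {
            if (option instanceof Value)
            {
                result.add(((Value) option).name());
            }
            else if (option instanceof Group)
            {
                result.addAll(names(((Group) option).transformOptions()));
            }
        }
        return result;
    }

    public static void main(String[] args)
    {
        // Mirrors part of the ImageMagick snippet: one value and one crop group.
        List<Option> imageMagickOptions = List.of(
            new Value("alphaRemove", false),
            new Group(List.of(
                new Value("cropGravity", false),
                new Value("cropWidth", false))));
        System.out.println(names(imageMagickOptions));
    }
}
```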

### Transformers
* **transformers** - A list of transformer definitions.
  Each transformer definition should have a unique **transformerName**,
  specify a **supportedSourceAndTargetList** and indicate which
  options it supports. As shown in the Tika snippet, an *engine_config*
  can describe one or more transformers, as a T-engine can have
  multiple transformers (e.g. Tika, Misc). A transformer configuration may
  specify references to 0 or more transformOptions.

### Supported Source and Target List
* **supportedSourceAndTargetList** is simply a list of source and target
  Media Types that may be transformed, optionally specifying a
  **maxSourceSizeBytes** and a **priority** value.
  * *maxSourceSizeBytes* is used to set the upper size limit of a transformation.
    * If not specified, the default value for maxSourceSizeBytes is **unlimited**.
  * *priority* is used by clients to determine which transformer to call, or by T-engines
    with multiple transformers to determine which one to use. In the above Tika snippet,
    both *TikaAuto* and *TextMining* can transform *"application/msword"*
    into *"text/xml"*. The transformer whose source-target entry has the higher priority is chosen by the
    T-engine to execute the transformation. In this case it is *TextMining* (*TikaAuto* specifies 55), because:
    * If not specified, the default value for priority is **50**.
    * Note: priority values are like the order in a queue, the **lower** the number the **higher the
      priority** is.

## Transformer selection strategy
The T-Engine configuration is used to choose which T-Engine will perform a transform.
A transformer definition contains a supported list of source and target Media Types. This is used for the
most basic selection. This is further refined by checking that the definition also supports transform options
(parameters) that have been supplied in a transform request or a Rendition Definition used in a rendition request.
Order for selection is:
1. Source->Target Media Types
2. transformOptions
3. maxSourceSizeBytes
4. priority

#### Case 1:
```
Transformer 1 defines options: Op1, Op2
Transformer 2 defines options: Op1, Op2, Op3, Op4
```
```
Rendition provides values for options: Op2, Op3
```
If we assume both transformers support the required source and target Media Types, Transformer 2 will be selected
because it knows about all the supplied options. The definition may also specify that some options are required or grouped.

#### Case 2:
```
Transformer 1 defines options: Op1, Op2, maxSize
Transformer 2 defines options: Op1, Op2, Op3
```
```
Rendition provides values for options: Op1, Op2
```
If we assume both transformers support the required source and target Media Types, and the file size is greater than *maxSize*,
Transformer 2 will be selected because it can handle the source size (*maxSourceSizeBytes*) for this transformation.

#### Case 3:
```
Transformer 1 defines options: Op1, Op2, priority1
Transformer 2 defines options: Op1, Op2, Op3, priority2
```
```
Rendition provides values for options: Op1, Op2
```
If we assume both transformers support the required source and target Media Types and
*priority1* < *priority2*, Transformer 1 will be selected because its priority is higher (a lower number means a higher priority).
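
The selection order and the three cases above can be sketched as a simple filter followed by a minimum. This is an illustration under stated assumptions rather than the actual selection code: the `Candidate` record and the `select` method are hypothetical, step 1 (the Media Type match) is assumed to have happened already, required and grouped options are ignored, and `-1` is used for an unlimited `maxSourceSizeBytes`.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of the selection order; not the real selection code.
class SelectionSketch
{
    // A transformer that already supports the requested source and target Media Types (step 1).
    // A maxSourceSizeBytes of -1 means unlimited.
    record Candidate(String name, Set<String> options, long maxSourceSizeBytes, int priority) {}

    static Optional<String> select(List<Candidate> candidates, Set<String> suppliedOptions, long sourceSizeBytes)
    {
        return candidates.stream()
            // Step 2: the transformer must know about every supplied option.
            .filter(c -> c.options().containsAll(suppliedOptions))
            // Step 3: the source must fit within maxSourceSizeBytes.
            .filter(c -> c.maxSourceSizeBytes() < 0 || sourceSizeBytes <= c.maxSourceSizeBytes())
            // Step 4: the lowest priority number wins.
            .min(Comparator.comparingInt(Candidate::priority))
            .map(Candidate::name);
    }

    public static void main(String[] args)
    {
        List<Candidate> candidates = List.of(
            new Candidate("Transformer 1", Set.of("Op1", "Op2"), 1000, 50),
            new Candidate("Transformer 2", Set.of("Op1", "Op2", "Op3"), -1, 50));
        // Case 2: the file is larger than Transformer 1 allows.
        System.out.println(select(candidates, Set.of("Op1", "Op2"), 5000));
    }
}
```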
60
docs/t-engines.md
Normal file
@@ -0,0 +1,60 @@
## T-Engines

The t-engines provide the basic transform operations. The Transform Service
provides a common base for the communication with other components. It is
this base that is described in this section. The base is a Spring Boot
application to which transform specific code is added and then wrapped
in a Docker image with any programs that the transforms need. The base
does not need to be used as long as there appears to be a process responding
to endpoints and messages.

A t-engine groups together one or more Transformers. Each Transformer
(provided by transform specific code) knows how to perform a set of
transformations from one MIME Type to another with a common set of
t-options.

~~~
0010 my-t-engine
    Transformer 1
        mimetype A -> mimetype B
        mimetype A -> mimetype C
        mimetype B -> mimetype C
        option1
        option2
    Transformer 2
        mimetype A -> mimetype B
        mimetype D -> mimetype C
        option2
        option3
0020 another-t-engine
    ...
0030 yet-another-t-engine
    ...
~~~

### Endpoints

* `POST /transform` to perform a transform. There are two forms:
  * For asynchronous transforms: Performs a transform using a
    `TransformRequest` received from the t-router via a message queue. The
    `TransformReply` is sent back via the queue.
  * For synchronous transforms: Performs a transform on content uploaded as
    a Multipart File and provides the resulting content as a download.
    Transform options are extracted from the request properties. The
    following are not added as transform options, but are used to select the
    transformer: `sourceMimetype` & `targetMimetype`.
* `GET /transform/config` to obtain t-config about what the t-engine supports.
  It has a parameter `configVersion` to allow a caller and the t-engine to
  negotiate down to a common format. The value is an integer which indicates
  which elements may be added to the config. These elements reflect
  functionality supported by the base (such as pre-signed URLs). The
  `CoreVersionDecorator` adds to the Config returned by the transform
  specific code.
* `GET /` provides an html test page to upload a source file, enter transform
  options and issue a synchronous transform request. Useful in testing.
* `GET /log` provides a page with basic log information. Useful in testing.
* `GET /error` provides an error page when testing.
* `GET /version` provides a String message to be included in client debug
  messages.
* `GET /ready` used by Kubernetes as a readiness probe.
* `GET /live` used by Kubernetes as a liveness probe.
308
docs/transform-config.md
Normal file
@@ -0,0 +1,308 @@
## T-Engine configuration

Each t-engine provides an endpoint that returns t-config that defines what
it supports. The t-router and t-engines may also have external t-config files.
These are combined in name order. As sorting is alphanumeric, you may wish to
consider using a fixed length numeric prefix in filenames and t-engine names. As will be seen,
t-config may reference elements from other components or modify elements
from earlier t-config.
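
The reason for suggesting a fixed length numeric prefix can be seen with a small sketch. It assumes that "name order" behaves like plain `String` ordering, which is an assumption made for this illustration; the file names are also hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration only: assumes name order is the natural String ordering.
class NameOrderSketch
{
    static List<String> inNameOrder(List<String> names)
    {
        List<String> sorted = new ArrayList<>(names);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args)
    {
        // Without padding, "100-c" sorts before "20-b" and "3-a".
        System.out.println(inNameOrder(List.of("20-b", "100-c", "3-a")));
        // With a fixed length prefix the name order matches the numeric order.
        System.out.println(inNameOrder(List.of("0020-b", "0100-c", "0003-a")));
    }
}
```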

Current configuration files are:
* [Pdf-Renderer T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/pdfrenderer/src/main/resources/pdfrenderer_engine_config.json).
* [ImageMagick T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/imagemagick/src/main/resources/imagemagick_engine_config.json).
* [Libreoffice T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/libreoffice/src/main/resources/libreoffice_engine_config.json).
* [Tika T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/tika/src/main/resources/tika_engine_config.json).
* [Misc T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/misc/src/main/resources/misc_engine_config.json).

Additional config files (which may be resources on the classpath or external
files) are specified in Spring Boot properties such as
`transform.config.file.<filename>` or environment variables like
`TRANSFORM_CONFIG_FILE_<filename>`.

The following is a simple t-config file from an example Hello World
t-engine.

~~~
{
  "transformOptions":
  {
    "helloWorldOptions":
    [
      {"value": {"name": "language"}}
    ]
  },
  "transformers":
  [
    {
      "transformerName": "helloWorld",
      "supportedSourceAndTargetList":
      [
        {"sourceMediaType": "text/plain", "maxSourceSizeBytes": 50, "targetMediaType": "text/html" }
      ],
      "transformOptions":
      [
        "helloWorldOptions"
      ]
    }
  ]
}
~~~

* **transformOptions** provides a list of transform options (each with its own
  name) that may be referenced for use in different transformers. This way
  common options don't need to be repeated for each transformer. They can
  even be shared between T-Engines. In this example there is only one group
  of options called `helloWorldOptions`, which has just one option,
  `language`. Unless an option has a `"required": true` field it is considered
  to be optional. You don't need to specify _sourceMimetype, sourceExtension,
  sourceEncoding, targetMimetype, targetExtension_ or _timeout_ as options as
  these are available to all transformers.
* **transformers** is a list of transformer definitions. Each transformer
  definition should have a unique `transformerName`, specify a
  `supportedSourceAndTargetList` and indicate which options it supports.
  In this case there is only one transformer called `helloWorld` and it
  accepts `helloWorldOptions`. A transformer may specify references to 0
  or more transformOptions.
* **supportedSourceAndTargetList** is simply a list of source and target
  Media Types that may be transformed, optionally specifying
  `maxSourceSizeBytes` and `priority` values. In this case there is only one,
  from text to HTML, and we have limited the source file size to avoid
  transforming files that clearly don't contain names.

### Transform pipelines

Transforms may be combined in a pipeline to form a new transformer, where
the output from one becomes the input to the next and so on. The t-config
defines the sequence of transform steps and intermediate Media Types. Like
any other transformer, it specifies a list of supported source and target
Media Types. If you don't supply any, all possible combinations are assumed
to be available. The definition may reuse the `transformOptions` of
transformers in the pipeline, but typically will define its own subset
of these.

The following example begins with the `helloWorld` Transformer, which takes a
text file containing a name and produces an HTML file with a `Hello <name>`
message in the body. This is then transformed back into a text file. This
example contains just one pipeline transformer, but many may be defined
in the same file.

~~~
{
  "transformers": [
    {
      "transformerName": "helloWorldText",
      "transformerPipeline" : [
        {"transformerName": "helloWorld", "targetMediaType": "text/html"},
        {"transformerName": "html"}
      ],
      "supportedSourceAndTargetList": [
        {"sourceMediaType": "text/plain", "priority": 45, "targetMediaType": "text/plain" }
      ],
      "transformOptions": [
        "helloWorldOptions"
      ]
    }
  ]
}
~~~

* **transformerName** Try to create a unique name for the transform.
* **transformerPipeline** A list of transformers in the pipeline. The
  `targetMediaType` specifies the intermediate Media Types between
  transformers. There is no final `targetMediaType` as this comes from the
  `supportedSourceAndTargetList`. The `transformerName` may reference a
  transformer that has not been defined yet. A warning is issued if
  it remains undefined after all t-config has been combined. Generally
  it is better for a t-engine rather than the t-router to define pipeline
  transformers as this limits the number of places that have to be changed.
  Normally it is obvious which t-engine should contain the definition.
* **supportedSourceAndTargetList** The supported source and target Media
  Types, i.e. the Media Types this pipeline transformer can
  transform from and to. Additionally you can set the `priority` and the
  `maxSourceSizeBytes`. If blank, this indicates that all possible
  combinations are supported. This is the cartesian product of all source
  types to the first intermediate type and all target types from the last
  intermediate type. Any combinations supported by the first transformer
  are excluded. They will also have the priority from the first transform.
* **transformOptions** A list of references to options required by the
  pipeline transformer.
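
The rule for a blank `supportedSourceAndTargetList` can be sketched as a cartesian product with an exclusion. This is a hypothetical illustration of the description above, not the real implementation: the `Pair` record, the `impliedPairs` method and the Media Types passed in `main` are assumptions, and the inherited priority is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the "all possible combinations" rule for pipelines.
class PipelineDefaultsSketch
{
    record Pair(String sourceMediaType, String targetMediaType) {}

    static List<Pair> impliedPairs(List<String> sourcesIntoFirstIntermediate,
                                   List<String> targetsFromLastIntermediate,
                                   Set<Pair> supportedByFirstTransformer)
    {
        List<Pair> pairs = new ArrayList<>();
        for (String source : sourcesIntoFirstIntermediate)
        {
            for (String target : targetsFromLastIntermediate)
            {
                Pair pair = new Pair(source, target);
                // Combinations the first transformer already supports are excluded.
                if (!supportedByFirstTransformer.contains(pair))
                {
                    pairs.add(pair);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args)
    {
        List<Pair> pairs = impliedPairs(
            List.of("text/plain"),
            List.of("text/plain", "text/html"),
            Set.of(new Pair("text/plain", "text/html")));
        System.out.println(pairs);
    }
}
```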

### Failover transforms

A failover transform simply provides a list of transforms to be attempted
one after another until one succeeds. For example, you may have a fast
transform that is able to handle a limited set of transforms and another
that is slower but handles all cases.

~~~
{
  "transformers": [
    {
      "transformerName": "imgExtractOrImgCreate",
      "transformerFailover" : [ "imgExtract", "imgCreate" ],
      "supportedSourceAndTargetList": [
        {"sourceMediaType": "application/vnd.oasis.opendocument.graphics", "priority": 150, "targetMediaType": "image/png" },
        ...
        {"sourceMediaType": "application/vnd.sun.xml.calc.template", "priority": 150, "targetMediaType": "image/png" }
      ]
    }
  ]
}
~~~

* **transformerName** Try to create a unique name for the transform.
* **transformerFailover** A list of transformers to try. This may include
  references to transformers that have not been defined yet. Generally it
  is better for the t-engine rather than the t-router to define failover
  transformers as this limits the number of places that have to be changed.
  Normally it is obvious which t-engine should contain the definition.
* **supportedSourceAndTargetList** The supported source and target Media
  Types, i.e. the Media Types this failover transformer can
  transform from and to. Additionally you can set the `priority` and the
  `maxSourceSizeBytes`. Unlike pipelines, it must not be blank.
* **transformOptions** A list of references to options required by the
  failover transformer.
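
The "try each one until one succeeds" behaviour can be sketched as a loop over the listed transforms. This is only an illustration: the transforms are modelled as plain functions on strings, and the lambdas standing in for `imgExtract` and `imgCreate` are hypothetical rather than the real transformers.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of failover; real transformers work on content, not Strings.
class FailoverSketch
{
    // Tries each transform in order and returns the first successful result.
    static String transform(List<Function<String, String>> transforms, String source)
    {
        RuntimeException lastFailure = new IllegalStateException("No transforms configured");
        for (Function<String, String> transform : transforms)
        {
            try
            {
                return transform.apply(source);
            }
            catch (RuntimeException e)
            {
                // Remember the failure and move on to the next transform.
                lastFailure = e;
            }
        }
        throw lastFailure;
    }

    public static void main(String[] args)
    {
        Function<String, String> imgExtract = s -> { throw new UnsupportedOperationException("no embedded image"); };
        Function<String, String> imgCreate = s -> "image created from " + s;
        System.out.println(transform(List.of(imgExtract, imgCreate), "report.odg"));
    }
}
```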

### Overriding transforms

It is possible to override a previously defined transform definition. The
following example removes most of the supported source to target media
types from the standard `"libreoffice"` transform. It also changes the
max size and priority of others. This is not something you would normally
want to do.

~~~
{
  "transformers": [
    {
      "transformerName": "libreoffice",
      "supportedSourceAndTargetList": [
        {"sourceMediaType": "text/csv", "maxSourceSizeBytes": 1000, "targetMediaType": "text/html" },
        {"sourceMediaType": "text/csv", "targetMediaType": "application/vnd.oasis.opendocument.spreadsheet" },
        {"sourceMediaType": "text/csv", "targetMediaType": "application/vnd.oasis.opendocument.spreadsheet-template" },
        {"sourceMediaType": "text/csv", "targetMediaType": "text/tab-separated-values" },
        {"sourceMediaType": "text/csv", "priority": 45, "targetMediaType": "application/vnd.ms-excel" },
        {"sourceMediaType": "text/csv", "priority": 155, "targetMediaType": "application/pdf" }
      ]
    }
  ]
}
~~~

### Removing a transformer

To discard a previous transformer definition include its name in the
optional `"removeTransformers"` list. You might want to do this if you
have a replacement and wish to keep the overall configuration simple (so it
contains no alternatives), or you wish to temporarily remove it. The
following example removes two transformers before processing any other
configuration in the same T-Engine or pipeline file.

~~~
{
  "removeTransformers" : [
    "libreoffice",
    "Archive"
  ]
  ...
}
~~~

### Overriding the supportedSourceAndTargetList

Rather than totally override an existing transform definition, it is
generally simpler to modify the `"supportedSourceAndTargetList"` by adding
elements to the optional `"addSupported"`, `"removeSupported"` and
`"overrideSupported"` lists. You will need to specify the
`"transformerName"` but you will not need to repeat all the other
`"supportedSourceAndTargetList"` values, which means if there are changes
in the original, the same change is not needed in a second place. The
following example adds one transform, removes two others and changes
the `"priority"` and `"maxSourceSizeBytes"` of another. This is done before
processing any other configuration in the same T-Engine or pipeline file.

~~~
{
  "addSupported": [
    {
      "transformerName": "Archive",
      "sourceMediaType": "application/zip",
      "targetMediaType": "text/csv",
      "priority": 60,
      "maxSourceSizeBytes": 18874368
    }
  ],
  "removeSupported": [
    {
      "transformerName": "Archive",
      "sourceMediaType": "application/zip",
      "targetMediaType": "text/xml"
    },
    {
      "transformerName": "Archive",
      "sourceMediaType": "application/zip",
      "targetMediaType": "text/plain"
    }
  ],
  "overrideSupported": [
    {
      "transformerName": "Archive",
      "sourceMediaType": "application/zip",
      "targetMediaType": "text/html",
      "priority": 60,
      "maxSourceSizeBytes": 18874368
    }
  ]
  ...
}
~~~
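
The effect of the three lists can be pictured as operations on a map keyed by transformer name, source and target. The sketch below is hypothetical: the `Key` and `Limits` records, the order in which the lists are applied, and the assumption that the original `Archive` transformer supports the three targets shown are all chosen for illustration and may differ from the real t-config handling.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of addSupported, removeSupported and overrideSupported.
class SupportedOverrideSketch
{
    record Key(String transformerName, String sourceMediaType, String targetMediaType) {}

    record Limits(int priority, long maxSourceSizeBytes) {}

    static Map<Key, Limits> apply(Map<Key, Limits> original, Map<Key, Limits> addSupported,
                                  List<Key> removeSupported, Map<Key, Limits> overrideSupported)
    {
        Map<Key, Limits> result = new LinkedHashMap<>(original);
        result.putAll(addSupported);
        for (Key key : removeSupported)
        {
            result.remove(key);
        }
        // Only entries that already exist are overridden in this sketch.
        overrideSupported.forEach((key, limits) -> result.replace(key, limits));
        return result;
    }

    static Map<Key, Limits> example()
    {
        Limits defaults = new Limits(50, -1);
        Map<Key, Limits> original = new LinkedHashMap<>();
        original.put(new Key("Archive", "application/zip", "text/xml"), defaults);
        original.put(new Key("Archive", "application/zip", "text/plain"), defaults);
        original.put(new Key("Archive", "application/zip", "text/html"), defaults);

        Limits changed = new Limits(60, 18874368L);
        return apply(original,
            Map.of(new Key("Archive", "application/zip", "text/csv"), changed),
            List.of(new Key("Archive", "application/zip", "text/xml"),
                    new Key("Archive", "application/zip", "text/plain")),
            Map.of(new Key("Archive", "application/zip", "text/html"), changed));
    }

    public static void main(String[] args)
    {
        example().forEach((key, limits) -> System.out.println(key + " -> " + limits));
    }
}
```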

### Default maxSourceSizeBytes and priority values

When defining `"supportedSourceAndTargetList"` elements the `"priority"`
and `"maxSourceSizeBytes"` are optional and normally have the default
values of 50 and -1 (no limit). It is possible to change those defaults.
In precedence order from most specific to most general these are defined
by combinations of `"transformerName"` and `"sourceMediaType"`.

* **transformer and source media type default** both are specified
* **transformer default** only the transformer name is specified
* **source media type default** only the source media type is specified
* **system wide default** neither is specified.

Both `"priority"` and `"maxSourceSizeBytes"` may be specified in an element,
but if only one is specified it is only that value that is being defaulted.

Being able to change the defaults is particularly useful once a T-Engine
has been developed as it allows a system administrator to handle
limitations that are only found later. The `system wide defaults` are
generally not used but are included for completeness. The following
example says that the `"Office"` transformer by default should only handle
zip files up to 18 MB and by default the maximum size of a `.doc` file to be
transformed is 4 MB. The third example defaults the priority, possibly
allowing another transformer that has specified a priority of say `50` to
be used in preference.

Default values are only applied after all t-config has been read.

~~~
{
  "supportedDefaults": [
    {
      "transformerName": "Office",             // default for a source type within a transformer
      "sourceMediaType": "application/zip",
      "maxSourceSizeBytes": 18874368
    },
    {
      "sourceMediaType": "application/msword", // defaults for a source type
      "maxSourceSizeBytes": 4194304,
      "priority": 45
    },
    {
      "priority": 60                           // system wide default
    },
    {
      "maxSourceSizeBytes": -1                 // system wide default
    }
  ]
  ...
}
~~~
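
The precedence order can be sketched as a lookup that tries the most specific combination first. This is a hypothetical illustration covering `priority` only (`maxSourceSizeBytes` would follow the same pattern); the `Default` record and the `priorityFor` method are names invented for this example.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of the precedence order for "supportedDefaults" (priority only).
class SupportedDefaultsSketch
{
    // A null transformerName or sourceMediaType means "not specified".
    // A null priority means the element does not default the priority.
    record Default(String transformerName, String sourceMediaType, Integer priority) {}

    static int priorityFor(List<Default> defaults, String transformerName, String sourceMediaType)
    {
        // Most specific first, most general last.
        Integer priority = find(defaults, transformerName, sourceMediaType);
        if (priority == null) priority = find(defaults, transformerName, null);
        if (priority == null) priority = find(defaults, null, sourceMediaType);
        if (priority == null) priority = find(defaults, null, null);
        // 50 is the normal default when nothing has been configured.
        return priority != null ? priority : 50;
    }

    private static Integer find(List<Default> defaults, String transformerName, String sourceMediaType)
    {
        for (Default d : defaults)
        {
            if (Objects.equals(d.transformerName(), transformerName)
                && Objects.equals(d.sourceMediaType(), sourceMediaType)
                && d.priority() != null)
            {
                return d.priority();
            }
        }
        return null;
    }

    public static void main(String[] args)
    {
        // The priority related parts of the example above.
        List<Default> defaults = List.of(
            new Default("Office", "application/zip", null),
            new Default(null, "application/msword", 45),
            new Default(null, null, 60));
        System.out.println(priorityFor(defaults, "Office", "application/msword"));
        System.out.println(priorityFor(defaults, "Office", "application/zip"));
    }
}
```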
140
docs/transform-specific-code.md
Normal file
@@ -0,0 +1,140 @@
## Transform specific code

To create a new t-engine an author uses a base t-engine (a Spring Boot
application) and implements the following interfaces. An implementation of
the `CustomTransformer` provides the actual transformation code and the
implementation of the `TransformEngine` says what it is capable of
transforming. The `TransformConfig` is normally read from a json file on the
classpath. Multiple `CustomTransformer` implementations may be in a single
t-engine. As a result the author can concentrate on the code that transforms
one format to another without really worrying about all the plumbing.
Typically, the transform specific code uses a 3rd party library or an
external executable which needs to be added to the Docker image.

~~~
package org.alfresco.transform;

import org.alfresco.transform.config.TransformConfig;
import org.alfresco.transformer.probes.ProbeTestTransform;

import java.util.Set;

/**
 * Interface to be implemented by transform specific code. Provides information
 * about the t-engine as a whole. So that it is automatically picked up, it must
 * exist in a package under {@code org.alfresco.transform} and have the Spring
 * {@code @Component} annotation.
 */
public interface TransformEngine
{
    /**
     * @return the name of the t-engine. The t-router reads config from t-engines
     * in name order.
     */
    String getTransformEngineName();

    /**
     * @return a definition of what the t-engine supports. Normally read from a json
     * Resource on the classpath.
     */
    TransformConfig getTransformConfig();

    /**
     * @return a ProbeTestTransform (will do a quick transform) for k8 liveness and
     * readiness probes.
     */
    ProbeTransform getProbeTransform();
}
~~~

Implementations of the following interface provide the actual transform code.

~~~
package org.alfresco.transform;

import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;

/**
 * Interface to be implemented by transform specific code. The
 * {@code transformerName} should match the transformerName in the
 * {@link TransformConfig} returned by the {@link TransformEngine}. So that it is
 * automatically picked up, it must exist in a package under
 * {@code org.alfresco.transform} and have the Spring {@code @Component} annotation.
 *
 * Implementations may also use the {@link TransformManager} if they wish to
 * interact with the base t-engine.
 */
public interface CustomTransformer
{
    String getTransformerName();

    void transform(String sourceMimetype, InputStream inputStream,
                   String targetMimetype, OutputStream outputStream,
                   Map<String, String> transformOptions,
                   TransformManager transformManager) throws Exception;
}
~~~
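
As an illustration, an implementation for the `helloWorld` transformer from the t-config examples might look roughly like the following. This is a sketch under stated assumptions: the nested `TransformManager` and `CustomTransformer` interfaces are stand-ins so that the example compiles on its own, the `HelloWorldTransformer` class and the `run` helper are hypothetical, the Spring `@Component` annotation is omitted, and the `language` option is ignored.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical sketch; a real t-engine would use the interfaces supplied by the base.
class HelloWorldSketch
{
    // Stand-ins that mirror the interfaces described in this section.
    interface TransformManager {}

    interface CustomTransformer
    {
        String getTransformerName();

        void transform(String sourceMimetype, InputStream inputStream,
                       String targetMimetype, OutputStream outputStream,
                       Map<String, String> transformOptions,
                       TransformManager transformManager) throws Exception;
    }

    static class HelloWorldTransformer implements CustomTransformer
    {
        @Override
        public String getTransformerName()
        {
            // Should match the transformerName in the t-config.
            return "helloWorld";
        }

        @Override
        public void transform(String sourceMimetype, InputStream inputStream,
                              String targetMimetype, OutputStream outputStream,
                              Map<String, String> transformOptions,
                              TransformManager transformManager) throws Exception
        {
            // Reads the name from the text source and writes a small HTML page (no HTML escaping).
            String name = new String(inputStream.readAllBytes(), StandardCharsets.UTF_8).trim();
            String html = "<html><body>Hello " + name + "</body></html>";
            outputStream.write(html.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Helper that runs the transformer against in-memory streams.
    static String run(String name)
    {
        try
        {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            new HelloWorldTransformer().transform("text/plain",
                new ByteArrayInputStream(name.getBytes(StandardCharsets.UTF_8)),
                "text/html", out, Map.of(), null);
            return out.toString(StandardCharsets.UTF_8);
        }
        catch (Exception e)
        {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args)
    {
        System.out.println(run("World"));
    }
}
```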

The implementation of the following interface is provided by the t-base and
allows the `CustomTransformer` to interact with the base t-engine. The
creation of Files is discouraged as it is better not to leave files on disk.

~~~
package org.alfresco.transform.base;

import java.io.File;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;

/**
 * Allows {@link CustomTransformer} implementations to interact with the base
 * t-engine.
 */
public interface TransformManager
{
    /**
     * Allows a CustomTransformer to use a local source File rather than the
     * supplied InputStream. To avoid creating extra files, if a File has already
     * been created by the base t-engine, it is returned.
     */
    File createSourceFile();

    /**
     * Allows a CustomTransformer to use a local target File rather than the
     * supplied OutputStream. To avoid creating extra files, if a File has already
     * been created by the base t-engine, it is returned.
     */
    File createTargetFile();

    /**
     * Allows a single transform request to have multiple transform responses. For
     * example, images from a video at different time offsets or different pages of
     * a document. Following a call to this method a transform response is made with
     * the data sent to the current {@code OutputStream}. If this method has been
     * called, there will not be another response when {@link CustomTransformer#
     * transform(String, InputStream, String, OutputStream, Map, TransformManager)}
     * returns and any data written to the final {@code OutputStream} will be
     * ignored.
     * @param index    returned with the response, so that the fragment may be
     *                 distinguished from other responses. Renditions use the index
     *                 as an offset into elements. A {@code null} value indicates
     *                 that there is no more output and any data sent to the current
     *                 {@code outputStream} will be ignored.
     * @param finished indicates this is the final fragment. {@code False} indicates
     *                 that it is expected there will be more fragments. There need
     *                 not be a call with this parameter set to {@code true}.
     * @return a new {@code OutputStream} for the next fragment. A {@code null} will
     *         be returned if {@code index} was {@code null} or {@code
     *         finished} was {@code true}.
     * @throws TransformException if a synchronous (http) request has been made as
|
||||||
|
* this only works with requests on queues, or the first call to
|
||||||
|
* this method indicated there was no output, or another call is
|
||||||
|
* made after it has been indicated that there should be no more
|
||||||
|
* fragments.
|
||||||
|
* @throws IOException if there was a problem sending the response.
|
||||||
|
OutputStream respondWithFragment(Integer index);
|
||||||
|
}
|
||||||
|
~~~
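
The fragment mechanism described in the Javadoc above can be sketched as follows. This is not the base t-engine's implementation: `RecordingManager` is a hypothetical in-memory stand-in that simply stores each fragment, the nested `TransformManager` interface only mirrors the one method used, and the signature with a `finished` flag is assumed from the documented parameters. The real manager also rejects synchronous (http) requests, which the sketch does not model.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FragmentExample
{
    // Local stand-in (assumption): mirrors only the method used by this sketch.
    interface TransformManager
    {
        OutputStream respondWithFragment(Integer index, boolean finished) throws IOException;
    }

    // Hypothetical manager that keeps every fragment in memory, keyed by index.
    static class RecordingManager implements TransformManager
    {
        final Map<Integer, String> responses = new TreeMap<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();

        @Override
        public OutputStream respondWithFragment(Integer index, boolean finished)
        {
            if (index == null)
            {
                current = null; // no more output: data in the current stream is ignored
                return null;
            }
            // "Send" whatever was written to the current stream as one response.
            responses.put(index, new String(current.toByteArray(), StandardCharsets.UTF_8));
            current = finished ? null : new ByteArrayOutputStream();
            return current;
        }
    }

    // Transformer-side logic: write one page per fragment, then ask for the next stream.
    static void transformPages(List<String> pages, OutputStream first,
        TransformManager manager) throws IOException
    {
        OutputStream out = first;
        for (int i = 0; i < pages.size(); i++)
        {
            out.write(pages.get(i).getBytes(StandardCharsets.UTF_8));
            boolean last = i == pages.size() - 1;
            out = manager.respondWithFragment(i, last); // null once finished
        }
    }

    static Map<Integer, String> run(List<String> pages)
    {
        try
        {
            RecordingManager manager = new RecordingManager();
            transformPages(pages, manager.current, manager);
            return manager.responses;
        }
        catch (IOException e)
        {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args)
    {
        System.out.println(run(List.of("page one", "page two")));
    }
}
```

Each call hands the transformer a fresh stream, so a fragment never has to be buffered by the transformer itself.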
27
docs/transformer-selection.md
Normal file
@@ -0,0 +1,27 @@
## Transformer selection strategy

The TransformRegistry uses t-config to choose which Transformer will be
used. A transformer definition contains a list of supported source and
target Media Types. This is used for the most basic selection. It is further
refined by checking that the definition also supports the transform options
(the parameters) that have been supplied in a transform request.

~~~
Transformer 1 defines options: Op1, Op2
Transformer 2 defines options: Op1, Op2, Op3, Op4

Transform request provides values for options: Op2, Op3
~~~
If we assume both transformers support the required source and target Media
Types, Transformer 2 will be selected because it knows about all the supplied
options. The definition may also specify that some options are required or
grouped. If any members of an optional group are supplied, all required
members of that group become required.

The configuration may impose a source file size limit resulting in the
selection of a different transformer. Size limits are normally added to avoid
the transforms consuming too many resources.

The configuration may also specify a priority which will be used in
Transformer selection if there are a number of possible transformers. The
highest priority is the one with the lowest number.
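
The rules above can be sketched in a few lines. This is a simplified illustration and not the TransformRegistry's code: the `Definition` class, its field names and the convention that a negative size limit means "no limit" are all assumptions made for the example. It assumes the candidates already support the requested source and target Media Types, and it does not model required or grouped options.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;

class TransformerSelectionSketch
{
    // Hypothetical, simplified view of one transformer definition.
    static final class Definition
    {
        final String name;
        final Set<String> options;     // option names the transformer knows about
        final long maxSourceSizeBytes; // assumption: negative means no limit
        final int priority;            // lower number means higher priority

        Definition(String name, Set<String> options, long maxSourceSizeBytes, int priority)
        {
            this.name = name;
            this.options = options;
            this.maxSourceSizeBytes = maxSourceSizeBytes;
            this.priority = priority;
        }
    }

    // Returns the name of the selected transformer, or null if none qualifies.
    static String select(List<Definition> candidates, Set<String> suppliedOptions,
        long sourceSizeBytes)
    {
        return candidates.stream()
            // must know about every option supplied in the request
            .filter(d -> d.options.containsAll(suppliedOptions))
            // must accept a source of this size
            .filter(d -> d.maxSourceSizeBytes < 0 || sourceSizeBytes <= d.maxSourceSizeBytes)
            // of the remaining candidates, the lowest priority number wins
            .min(Comparator.comparingInt((Definition d) -> d.priority))
            .map(d -> d.name)
            .orElse(null);
    }

    static List<Definition> demoDefinitions()
    {
        return List.of(
            new Definition("Transformer 1", Set.of("Op1", "Op2"), 1024, 40),
            new Definition("Transformer 2", Set.of("Op1", "Op2", "Op3", "Op4"), -1, 50));
    }

    public static void main(String[] args)
    {
        // Transformer 1 does not know Op3, so Transformer 2 is chosen.
        System.out.println(select(demoDefinitions(), Set.of("Op2", "Op3"), 500));
    }
}
```

With these demo definitions, a request supplying only `Op1` for a 500 byte source picks Transformer 1 on priority, while the same request for a 5000 byte source falls back to Transformer 2 because of the size limit.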
46
docs/transformerDebug.md
Normal file
@@ -0,0 +1,46 @@
## TransformerDebug

In addition to any normal logging, the t-engines, t-router and t-client also
use the `TransformerDebug` class to provide request-based logging. The
following is an example from Alfresco after the upload of a `docx` file.

~~~
163       docx json AGM 2016 - Masters report.docx 14.8 KB -- metadataExtract -- TransformService
163       workspace://SpacesStore/0db3a665-328d-4437-85ed-56b753cf19c8 1563306426
163       docx json 14.8 KB -- metadataExtract -- PoiMetadataExtractor
163       cm:title=
163       cm:author=James Dobinson
163       Finished in 664 ms
...
164       docx png AGM 2016 - Masters report.docx 14.8 KB -- doclib -- TransformService
164       workspace://SpacesStore/0db3a665-328d-4437-85ed-56b753cf19c8 1563306426
164       docx png 14.8 KB -- doclib -- officeToImageViaPdf
164.1     docx pdf libreoffice
164.2     pdf png pdfToImageViaPng
164.2.1   pdf png pdfrenderer
164.2.2   png png imagemagick
164.2.2   endPage="0"
164.2.2   resizeHeight="100"
164.2.2   thumbnail="true"
164.2.2   startPage="0"
164.2.2   resizeWidth="100"
164.2.2   autoOrient="true"
164.2.2   allowEnlargement="false"
164.2.2   maintainAspectRatio="true"
164       Finished in 725 ms
~~~

This log happens to be from the t-client, but similar log lines exist in the
t-router and individual t-engines.

All lines start with a reference, which begins with the client’s request
number if known (`163` and `164` here), followed by the position in a nested
pipeline or failover structure. The first request extracts metadata and the
second creates a thumbnail rendition (called `doclib`). The second request is
handled by a pipeline called `officeToImageViaPdf` which uses `libreoffice` to
transform to `pdf` and then another pipeline to convert to `png`. The last step
(`164.2.2`) in the process resizes the `png` using a number of transform
options.
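
To make the reference structure concrete, here is a small sketch that splits a reference such as `164.2.2` into its request number, nesting depth and parent step. The class and method names are hypothetical and are not part of `TransformerDebug`. The sketch assumes the request number is known and numeric, which the text above notes is not always the case.

```java
class DebugReferenceSketch
{
    // The leading number identifies the client's request, e.g. 164 in "164.2.2".
    static int requestNumber(String reference)
    {
        return Integer.parseInt(reference.split("\\.")[0]);
    }

    // Number of nested pipeline or failover levels below the request itself.
    static int depth(String reference)
    {
        return reference.split("\\.").length - 1;
    }

    // Reference of the enclosing step, or null for a top-level request.
    static String parent(String reference)
    {
        int lastDot = reference.lastIndexOf('.');
        return lastDot < 0 ? null : reference.substring(0, lastDot);
    }

    public static void main(String[] args)
    {
        String reference = "164.2.2";
        System.out.println(requestNumber(reference) + " " + depth(reference) + " " + parent(reference));
    }
}
```

Grouping log lines by request number and ordering them by reference is enough to rebuild the nesting shown in the example log.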

If requested, log information is passed back in the TransformReply's
clientData.
@@ -38,8 +38,8 @@ import java.util.Map;
 @ConfigurationProperties(prefix = "transform.config")
 public class TransformConfigFiles
 {
-    // Populated from Spring Boot properties or such as transform.config.file.<engineName> or environment variables like
-    // TRANSFORM_CONFIG_FILE_<engineName>.
+    // Populated from Spring Boot properties or such as transform.config.file.<filename> or environment variables like
+    // TRANSFORM_CONFIG_FILE_<filename>.
     private final Map<String, String> files = new HashMap<>();

     public Map<String, String> getFile()