alfresco-transform-core/docs/engine_config.md
2019-09-16 13:08:47 +03:00

## T-Engine configuration
T-Engines provide a */transform/config* endpoint that tells clients (e.g. Transform-Router or
Alfresco-Repository) which transformations are supported. T-Engines store this
configuration as a JSON resource file named *engine_config.json*.
The config can be found under `alfresco-transform-core\<t-engine-name>\src\main\resources\engine_config.json`; current configuration files are:
* [Pdf-Renderer T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/alfresco-docker-alfresco-pdf-renderer/src/main/resources/engine_config.json).
* [ImageMagick T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/alfresco-docker-imagemagick/src/main/resources/engine_config.json).
* [Libreoffice T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/alfresco-docker-libreoffice/src/main/resources/engine_config.json).
* [Tika T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/alfresco-docker-tika/src/main/resources/engine_config.json).
* [Misc T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/alfresco-docker-transform-misc/src/main/resources/engine_config.json).
*Snippet from Tika T-engine configuration:*
```json
{
  "transformOptions": {
    "tikaOptions": [
      {"value": {"name": "targetEncoding"}}
    ],
    "pdfboxOptions": [
      {"value": {"name": "notExtractBookmarksText"}},
      {"value": {"name": "targetEncoding"}}
    ]
  },
  "transformers": [
    {
      "transformerName": "PdfBox",
      "supportedSourceAndTargetList": [
        {"sourceMediaType": "application/pdf", "targetMediaType": "text/html"},
        {"sourceMediaType": "application/pdf", "maxSourceSizeBytes": 26214400, "targetMediaType": "text/plain"}
      ],
      "transformOptions": [
        "pdfboxOptions"
      ]
    },
    {
      "transformerName": "TikaAuto",
      "supportedSourceAndTargetList": [
        {"sourceMediaType": "application/msword", "priority": 55, "targetMediaType": "text/xml"}
      ],
      "transformOptions": [
        "tikaOptions"
      ]
    },
    {
      "transformerName": "TextMining",
      "supportedSourceAndTargetList": [
        {"sourceMediaType": "application/msword", "targetMediaType": "text/xml"}
      ],
      "transformOptions": [
        "tikaOptions"
      ]
    }
  ]
}
```
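
As a rough illustration of how a client might read this configuration, the sketch below parses a trimmed copy of the Tika snippet and lists which transformer supports which source and target Media Types. The helper name `list_supported_transforms` is hypothetical, and a real client would use the JSON returned by the */transform/config* endpoint rather than an inline string.

```python
import json

# Trimmed copy of the Tika snippet above (assumption: a real client would
# obtain this JSON from the T-Engine's /transform/config endpoint).
ENGINE_CONFIG = """
{
  "transformOptions": {
    "tikaOptions": [{"value": {"name": "targetEncoding"}}]
  },
  "transformers": [
    {
      "transformerName": "TikaAuto",
      "supportedSourceAndTargetList": [
        {"sourceMediaType": "application/msword", "priority": 55, "targetMediaType": "text/xml"}
      ],
      "transformOptions": ["tikaOptions"]
    },
    {
      "transformerName": "TextMining",
      "supportedSourceAndTargetList": [
        {"sourceMediaType": "application/msword", "targetMediaType": "text/xml"}
      ],
      "transformOptions": ["tikaOptions"]
    }
  ]
}
"""


def list_supported_transforms(config_text):
    """Return (transformerName, source, target) tuples (hypothetical helper)."""
    config = json.loads(config_text)
    rows = []
    for transformer in config.get("transformers", []):
        for pair in transformer.get("supportedSourceAndTargetList", []):
            rows.append((
                transformer["transformerName"],
                pair["sourceMediaType"],
                pair["targetMediaType"],
            ))
    return rows


SUPPORTED = list_supported_transforms(ENGINE_CONFIG)
for name, source, target in SUPPORTED:
    print(f"{name}: {source} -> {target}")
```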
### Transform Options
* **transformOptions** provides a list of transform options that may be
referenced for use in different transformers. This way common options
don't need to be repeated for each transformer and can be shared between
T-Engines. In this example there are two groups of options: **tikaOptions**,
which contains **targetEncoding**, and **pdfboxOptions**, which contains
**notExtractBookmarksText** and **targetEncoding**. Unless an option has a **"required": true** field it is
considered to be optional. You don't need to specify *sourceMimetype*,
*targetMimetype*, *sourceExtension* or *targetExtension* as options as
these are automatically added.
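
To make the reference mechanism concrete, here is a minimal sketch of how a transformer's **transformOptions** references could be resolved into option names. The helper `resolve_options` is hypothetical, it assumes the **required** flag sits inside the `value` object, and it only handles flat `value` entries.

```python
# Option definitions copied from the Tika snippet above.
SHARED_OPTIONS = {
    "tikaOptions": [
        {"value": {"name": "targetEncoding"}},
    ],
    "pdfboxOptions": [
        {"value": {"name": "notExtractBookmarksText"}},
        {"value": {"name": "targetEncoding"}},
    ],
}

# The PdfBox transformer only references the shared definition by name.
PDFBOX = {"transformerName": "PdfBox", "transformOptions": ["pdfboxOptions"]}


def resolve_options(transformer, shared_options):
    """Map each option name the transformer accepts to its 'required' flag.

    Hypothetical helper; options without a "required" field are treated
    as optional, as described in the text above.
    """
    resolved = {}
    for group_name in transformer.get("transformOptions", []):
        for option in shared_options[group_name]:
            value = option["value"]
            resolved[value["name"]] = value.get("required", False)
    return resolved


PDFBOX_OPTIONS = resolve_options(PDFBOX, SHARED_OPTIONS)
print(PDFBOX_OPTIONS)
```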
*Snippet from ImageMagick T-engine configuration:*
```json
"transformOptions": {
  "imageMagickOptions": [
    {"value": {"name": "alphaRemove"}},
    {"group": {"transformOptions": [
      {"value": {"name": "cropGravity"}},
      {"value": {"name": "cropWidth"}},
      {"value": {"name": "cropHeight"}},
      {"value": {"name": "cropPercentage"}},
      {"value": {"name": "cropXOffset"}},
      {"value": {"name": "cropYOffset"}}
    ]}}
  ]
},
```
* There are two types of transformOptions, *transformOptionsValue* and *transformOptionsGroup*:
  * *transformOptionsValue* represents a single transformation option. It is defined
  by a **name** and an optional **required** field.
  * *transformOptionsGroup* represents a group of one or more options. It is used to group
  options that define a single characteristic. In the above snippet all the crop options are defined
  under one group; this approach is recommended because it is easier to read. A transformOptionsGroup
  can contain one or more transformOptionsValue and transformOptionsGroup entries.
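
Because a group can contain both values and further groups, collecting every option name is naturally recursive. The following sketch shows one way to do it against the ImageMagick snippet; `flatten_option_names` is a hypothetical helper, not part of any T-Engine API.

```python
# Option list copied from the ImageMagick snippet above.
IMAGE_MAGICK_OPTIONS = [
    {"value": {"name": "alphaRemove"}},
    {"group": {"transformOptions": [
        {"value": {"name": "cropGravity"}},
        {"value": {"name": "cropWidth"}},
        {"value": {"name": "cropHeight"}},
        {"value": {"name": "cropPercentage"}},
        {"value": {"name": "cropXOffset"}},
        {"value": {"name": "cropYOffset"}},
    ]}},
]


def flatten_option_names(options):
    """Return all option names in definition order, descending into groups."""
    names = []
    for option in options:
        if "value" in option:
            # A single option: record its name.
            names.append(option["value"]["name"])
        elif "group" in option:
            # A group: recurse into its nested list of options.
            names.extend(flatten_option_names(option["group"]["transformOptions"]))
    return names


OPTION_NAMES = flatten_option_names(IMAGE_MAGICK_OPTIONS)
print(OPTION_NAMES)
```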
**Limitations**:
* For a transformOptions definition to be referenced in a different T-engine, another transformer
with the complete definition of that transformOptions needs to return its config to the client.
* A transformOptions definition is not allowed to reference another transformOptions definition.
### Transformers
* **transformers** - A list of transformer definitions.
Each transformer definition should have a unique **transformerName**,
specify a **supportedSourceAndTargetList** and indicate which
options it supports. As shown in the Tika snippet, an *engine_config*
can describe one or more transformers, because a T-engine can have
multiple transformers (e.g. Tika, Misc). A transformer configuration may
reference zero or more transformOptions.
### Supported Source and Target List
* **supportedSourceAndTargetList** is a list of source and target
Media Types that may be transformed, optionally specifying a
**maxSourceSizeBytes** and a **priority** value.
  * *maxSourceSizeBytes* sets the upper limit on the source size for a transformation.
    * If not specified, the default value for maxSourceSizeBytes is **unlimited**.
  * *priority* is used by clients to determine which transformer to call, or by T-engines
  with multiple transformers to determine which one to use. In the above Tika snippet,
  both *TikaAuto* and *TextMining* can transform *"application/msword"*
  into *"text/xml"*. The transformer whose source-target media type entry has the higher priority is chosen by the
  T-engine to execute the transformation. In this case it is *TextMining*, because *TikaAuto* specifies
  a priority of 55 while *TextMining* falls back to the default:
    * If not specified, the default value for priority is **50**.
    * Note: priority values are like the order in a queue, the **lower** the number the **higher the
    priority** is.
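
The priority rule can be sketched in a few lines. This example reuses the two msword transformers from the Tika snippet and picks the one with the lowest priority number, applying the default of 50 when none is given. The function name `pick_by_priority` is an assumption made for illustration.

```python
DEFAULT_PRIORITY = 50  # default stated in the text above

# The two msword transformers from the Tika snippet.
TRANSFORMERS = [
    {
        "transformerName": "TikaAuto",
        "supportedSourceAndTargetList": [
            {"sourceMediaType": "application/msword", "priority": 55,
             "targetMediaType": "text/xml"},
        ],
    },
    {
        "transformerName": "TextMining",
        "supportedSourceAndTargetList": [
            {"sourceMediaType": "application/msword", "targetMediaType": "text/xml"},
        ],
    },
]


def pick_by_priority(transformers, source, target):
    """Return the name of the matching transformer with the lowest priority number."""
    best = None  # (priority, transformerName) of the best match so far
    for transformer in transformers:
        for pair in transformer["supportedSourceAndTargetList"]:
            if pair["sourceMediaType"] == source and pair["targetMediaType"] == target:
                priority = pair.get("priority", DEFAULT_PRIORITY)
                if best is None or priority < best[0]:
                    best = (priority, transformer["transformerName"])
    return best[1] if best else None


CHOSEN = pick_by_priority(TRANSFORMERS, "application/msword", "text/xml")
print(CHOSEN)
```

Here *TextMining* wins with the default priority of 50 against the explicit 55 of *TikaAuto*.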
## Transformer selection strategy
The ACS repository will use the T-Engine configuration to choose which T-Engine will perform a transform.
A transformer definition contains a supported list of source and target Media Types. This is used for the
most basic selection. This is further refined by checking that the definition also supports transform options
(parameters) that have been supplied in a transform request or a Rendition Definition used in a rendition request.
Order for selection is:
1. Source->Target Media Types
2. transformOptions
3. maxSourceSizeBytes
4. priority
#### Case 1:
```
Transformer 1 defines options: Op1, Op2
Transformer 2 defines options: Op1, Op2, Op3, Op4
```
```
Rendition provides values for options: Op2, Op3
```
If we assume both transformers support the required source and target Media Types, Transformer 2 will be selected
because it knows about all the supplied options. The definition may also specify that some options are required or grouped.
#### Case 2:
```
Transformer 1 defines options: Op1, Op2, maxSize
Transformer 2 defines options: Op1, Op2, Op3
```
```
Rendition provides values for options: Op1, Op2
```
If we assume both transformers support the required source and target Media Types, and the file size is greater than *maxSize*,
Transformer 2 will be selected because it has no *maxSourceSizeBytes* limit and can therefore handle this transformation.
#### Case 3:
```
Transformer 1 defines options: Op1, Op2, priority1
Transformer 2 defines options: Op1, Op2, Op3, priority2
```
```
Rendition provides values for options: Op1, Op2
```
If we assume both transformers support the required source and target Media Types and
*priority1* < *priority2*, Transformer 1 will be selected because its priority is higher
(a lower number means a higher priority).
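
The three cases can be reproduced with a small, simplified model of the selection order. This is only a sketch under stated assumptions, not the repository's actual implementation: the `Candidate` class and `select` function are hypothetical, candidates are assumed to already match the source and target Media Types, and required or grouped options are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Simplified, hypothetical view of a transformer definition."""
    name: str
    options: frozenset
    max_source_size_bytes: Optional[int] = None  # None models "unlimited"
    priority: int = 50  # default priority


def select(candidates, supplied_options, source_size_bytes):
    """Filter by options, then by size limit, then pick the lowest priority number."""
    eligible = [
        c for c in candidates
        if set(supplied_options) <= c.options
        and (c.max_source_size_bytes is None
             or source_size_bytes <= c.max_source_size_bytes)
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c.priority).name


# Case 1: only Transformer 2 knows about every supplied option.
CASE_1 = select(
    [Candidate("Transformer 1", frozenset({"Op1", "Op2"})),
     Candidate("Transformer 2", frozenset({"Op1", "Op2", "Op3", "Op4"}))],
    {"Op2", "Op3"}, source_size_bytes=10)

# Case 2: the source is larger than Transformer 1's size limit.
CASE_2 = select(
    [Candidate("Transformer 1", frozenset({"Op1", "Op2"}), max_source_size_bytes=1000),
     Candidate("Transformer 2", frozenset({"Op1", "Op2", "Op3"}))],
    {"Op1", "Op2"}, source_size_bytes=2000)

# Case 3: both are eligible; the lower priority number wins.
CASE_3 = select(
    [Candidate("Transformer 1", frozenset({"Op1", "Op2"}), priority=40),
     Candidate("Transformer 2", frozenset({"Op1", "Op2", "Op3"}), priority=60)],
    {"Op1", "Op2"}, source_size_bytes=10)

print(CASE_1, CASE_2, CASE_3, sep="\n")
```

The size limit of 1000 bytes and the priorities 40 and 60 are arbitrary values chosen only to satisfy the conditions stated in each case.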