mirror of
https://github.com/Alfresco/alfresco-transform-core.git
synced 2025-05-26 17:24:47 +00:00
REPO-4639 : Create engine_config.md
This commit is contained in:
parent
da69bba475
commit
1e85acd246
163
docs/engine_config.md
Normal file
163
docs/engine_config.md
Normal file
@ -0,0 +1,163 @@
|
||||
## T-Engine configuration
|
||||
|
||||
T-Engines provides a */transform/config* end point for clients (e.g. Transform-Router or Alfresco-Repository) to
|
||||
determine what it supported. T-engine stores this configuration as a JSON file named *engine_config.json*.
|
||||
|
||||
This can be found under *alfresco-transform-core\t-engine-name\src\main\resources\engine_config.json*, current configuration files are:
|
||||
* [Pdf-Renderer T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/alfresco-docker-alfresco-pdf-renderer/src/main/resources/engine_config.json).
|
||||
* [ImageMagick T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/alfresco-docker-imagemagick/src/main/resources/engine_config.json).
|
||||
* [Libreoffice T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/alfresco-docker-libreoffice/src/main/resources/engine_config.json).
|
||||
* [Tika T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/alfresco-docker-tika/src/main/resources/engine_config.json).
|
||||
* [Misc T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/alfresco-docker-transform-misc/src/main/resources/engine_config.json).
|
||||
|
||||
*Snippet from Tika T-engine configuration:*
|
||||
```json
|
||||
{
|
||||
"transformOptions": {
|
||||
"tikaOptions": [
|
||||
{"value": {"name": "targetEncoding"}}
|
||||
],
|
||||
"pdfboxOptions": [
|
||||
{"value": {"name": "notExtractBookmarksText"}},
|
||||
{"value": {"name": "targetEncoding"}}
|
||||
]
|
||||
},
|
||||
"transformers": [
|
||||
{
|
||||
"transformerName": "PdfBox",
|
||||
"supportedSourceAndTargetList": [
|
||||
{"sourceMediaType": "application/pdf", "targetMediaType": "text/html"},
|
||||
{"sourceMediaType": "application/pdf", "maxSourceSizeBytes": 26214400, "targetMediaType": "text/plain"}
|
||||
],
|
||||
"transformOptions": [
|
||||
"pdfboxOptions"
|
||||
]
|
||||
},
|
||||
{
|
||||
"transformerName": "TikaAuto",
|
||||
"supportedSourceAndTargetList": [
|
||||
{"sourceMediaType": "application/msword", "priority": 55, "targetMediaType": "text/xml"}
|
||||
],
|
||||
"transformOptions": [
|
||||
"tikaOptions"
|
||||
]
|
||||
},
|
||||
{
|
||||
"transformerName": "TextMining",
|
||||
"supportedSourceAndTargetList": [
|
||||
{"sourceMediaType": "application/msword", "targetMediaType": "text/xml"}
|
||||
],
|
||||
"transformOptions": [
|
||||
"tikaOptions"
|
||||
]
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
### Transform Options
|
||||
* **transformOptions** provides a list of transform options that may be
|
||||
referenced for use in different transformers. This way common options
|
||||
don't need to be repeated for each transformer, they can be shared between
|
||||
T-Engines. In this example there are two groups of options called **tikaOptions**
|
||||
and **pdfboxOptions** which has a group of options **targetEncoding** and
|
||||
**notExtractBookmarksText**. Unless an option has a **"required": true** field it is
|
||||
considered to be optional. You don't need to specify *sourceMimetype*,
|
||||
*targetMimetype*, *sourceExtension* or *targetExtension* as options as
|
||||
these are automatically added.
|
||||
|
||||
*Snippet from ImageMagick T-engine configuration:*
|
||||
```json
|
||||
"transformOptions": {
|
||||
"imageMagickOptions": [
|
||||
{"value": {"name": "alphaRemove"}},
|
||||
{"group": {"transformOptions": [
|
||||
{"value": {"name": "cropGravity"}},
|
||||
{"value": {"name": "cropWidth"}},
|
||||
{"value": {"name": "cropHeight"}},
|
||||
{"value": {"name": "cropPercentage"}},
|
||||
{"value": {"name": "cropXOffset"}},
|
||||
{"value": {"name": "cropYOffset"}}
|
||||
]}},
|
||||
]
|
||||
},
|
||||
```
|
||||
* There are two types of transformOptions, *transformOptionsValue* and *transformOptionsGroup*.
|
||||
* The transformOptionsValue is used to represent a single transformation option, it is defined by a **name**
|
||||
and an optional **required** field.
|
||||
* TransformOptionGroup represents a group of one or more options, it is used to group options that define a
|
||||
characteristic. In the above snippet all the options for crop are defined under a group, it is recommended to
|
||||
use this approach as it is easier to read. A transformOptionsGroup can contain one or more transformOptionsValue
|
||||
and transformOptionsGroup.
|
||||
|
||||
**Limitations**:
|
||||
* For a transformOptions to be referenced in a different T-engine, another transformer
|
||||
with the complete definition of the transformOptions needs to return the config to the client.
|
||||
* In a transformOptions definition it is not allowed to use a reference to another tranformOption.
|
||||
|
||||
### Transformers
|
||||
* **transformers** - A list of transformer definitions.
|
||||
Each transformer definition should have a unique **transformerName**,
|
||||
specify a **supportedSourceAndTargetList** and indicate which
|
||||
options it supports. As is shown in the Tika snippet, in an *engine_config*
|
||||
there can be one or multiple transformers defined, this is because a T-engine can have
|
||||
multiple transformers (e.g. Tika, Misc). A transformer configuration may
|
||||
specify references to 0 or more transformOptions.
|
||||
|
||||
### Supported Source and Target List
|
||||
* **supportedSourceAndTargetList** is simply a list of source and target
|
||||
Media Types that may be transformed, optionally specifying a
|
||||
**maxSourceSizeBytes** and a **priority** value.
|
||||
* *maxSourceSizeBytes* is used to set the upper size limit of a transformation.
|
||||
* If not specified, the default value for maxSourceSizeBytes is **unlimited**.
|
||||
* *priority* it is used by clients to determine which transfomer to call or by T-engines
|
||||
with multiple transformers to determine which one to use. In the above Tika snippet,
|
||||
both *TikaAuto* and *TextMining* have the capability to transform *"application/msword"*
|
||||
into *"text/xml"*, the transformer containing the source-target media type with higher priority will be chosen by the
|
||||
T-engine as the one to execute the transformation, in this case it will be *TextMining*, because:
|
||||
* If not specified, the default value for priority is **50**.
|
||||
* Note: priority values are like a order in a queue, the **lower** the number the **higher the priority** is.
|
||||
|
||||
## Transformer selection strategy
|
||||
The ACS repository will use the T-Engine configuration to choose which T-Engine will perform a transform.
|
||||
A transformer definition contains a supported list of source and target Media Types. This is used for the
|
||||
most basic selection. This is further refined by checking that the definition also supports transform options
|
||||
(parameters) that have been supplied in a transform request or a Rendition Definition used in a rendition request.
|
||||
Order for selection is:
|
||||
1. Source->Target Media Types
|
||||
2. transformOptions
|
||||
3. maxSourceSizeBytes
|
||||
4. priority
|
||||
|
||||
#### Case 1:
|
||||
```
|
||||
Transformer 1 defines options: Op1, Op2
|
||||
Transformer 2 defines options: Op1, Op2, Op3, Op4
|
||||
```
|
||||
```
|
||||
Rendition provides values for options: Op2, Op3
|
||||
```
|
||||
If we assume both transformers support the required source and target Media Types, Transformer 2 will be selected
|
||||
because it knows about all the supplied options. The definition may also specify that some options are required or grouped.
|
||||
|
||||
#### Case 2:
|
||||
```
|
||||
Transformer 1 defines options: Op1, Op2, maxSize
|
||||
Transformer 2 defines options: Op1, Op2, Op3
|
||||
```
|
||||
```
|
||||
Rendition provides values for options: Op1, Op2
|
||||
```
|
||||
If we assume both transformers support the required source and target Media Types, and file size is greater than *maxSize*
|
||||
,Transformer 2 will be selected because if can handle *maxSourceSizeBytes* for this transformation.
|
||||
|
||||
#### Case 3:
|
||||
```
|
||||
Transformer 1 defines options: Op1, Op2, priorty1
|
||||
Transformer 2 defines options: Op1, Op2, Op3, priority2
|
||||
```
|
||||
```
|
||||
Rendition provides values for options: Op1, Op2
|
||||
```
|
||||
If we assume both transformers support the required source and target Media Types, and *priority1* < *priority2*
|
||||
,Transformer 1 will be selected because it the priority is higher.
|
||||
|
Loading…
x
Reference in New Issue
Block a user