mirror of
https://github.com/Alfresco/alfresco-transform-core.git
synced 2025-05-12 17:04:48 +00:00
Refactor to clean up packages in the t-model and to introduce a simpler to implement t-engine base. The new t-engines (tika, imagemagick, libreoffice, pdfrenderer, misc, aio, aspose) and t-router may be used in combination with older components as the API between the content Repo and between components has not changed. As far as possible the same artifacts are created (the -boot projects no longer exist). They may be used with older ACS repo versions. The main changes to look for are: * The introduction of TransformEngine and CustomTransformer interfaces to be implemented. * The removal in t-engines and t-router of the Controller, Application, test template page, Controller tests and application config, as this is all now done by the t-engine base package. * The t-router now extends the t-engine base, which also reduced the amount of duplicate code. * The t-engine base provides the test page, which includes drop downs of known transform options. The t-router is able to use pipeline and failover transformers. This was not possible to do previously as the router had no test UI. * Resources including licenses are automatically included in the all-in-one t-engine, from the individual t-engines. They just need to be added as dependencies in the pom. * The ugly code in the all-in-one t-engine and misc t-engine to pick transformers has gone, as they are now just selected by the transformRegistry. * The way t-engines respond to http or message queue transform requests has been combined (eliminates the similar but different code that existed before). * The t-engine base now uses InputStream and OutputStream rather than Files by default. As a result it will be simpler to avoid writing content to a temporary location. * A number of the Tika and Misc CustomTransforms no longer use Files. * The original t-engine base still exists so customers can continue to create custom t-engines the way they have done previously. the project has just been moved into a folder called deprecated. * The folder structure has changed. The long "alfresco-transform-..." names have given way to shorter easier to read and type names. * The t-engine project structure now has a single project rather than two. * The previous config values still exist, but there are now a new set for config values for in files with names that don't misleadingly imply they only contain pipeline of routing information. * The concept of 'routing' has much less emphasis in class names as the code just uses the transformRegistry. * TransformerConfig may now be read as json or yaml. The restrictions about what could be specified in yaml has gone. * T-engines and t-router may use transform config from files. Previously it was just the t-router. * The POC code to do with graphs of possible routes has been removed. * All master branch changes have been merged in. * The concept of a single transform request which results in multiple responses (e.g. images from a video) has been added to the core processing of requests in the t-engine base. * Many SonarCloud linter fixes.
311 lines
12 KiB
Markdown
311 lines
12 KiB
Markdown
# T-Engine configuration
|
|
|
|
Each t-engine provides an endpoint that returns t-config that defines what
|
|
it supports. The t-router and t-engines may also have external t-config files.
|
|
These are combined in name order. As sorting is alphanumeric, you may wish to
|
|
consider using a fixed length numeric prefix in filenames and t-engine names. As will be seen
|
|
t-config may reference elements from other components or modify elements
|
|
from earlier t-config.
|
|
|
|
Current configuration files are:
|
|
* [Pdf-Renderer T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/pdfrenderer/src/main/resources/pdfrenderer_engine_config.json).
|
|
* [ImageMagick T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/imagemagick/src/main/resources/imagemagick_engine_config.json).
|
|
* [Libreoffice T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/libreoffice/src/main/resources/libreoffice_engine_config.json).
|
|
* [Tika T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/tika/src/main/resources/tika_engine_config.json).
|
|
* [Misc T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/misc/src/main/resources/misc_engine_config.json).
|
|
|
|
Additional config files (which may be resources on the classpath or external
|
|
files) are specified in Spring Boot properties or such as
|
|
`transform.config.file.<filename>` or environment variables like
|
|
`TRANSFORM_CONFIG_FILE_<filename>`.
|
|
|
|
The following is a simple t-config file from an example Hello World
|
|
t-engine.
|
|
|
|
~~~json
|
|
{
|
|
"transformOptions":
|
|
{
|
|
"helloWorldOptions":
|
|
[
|
|
{"value": {"name": "language"}}
|
|
]
|
|
},
|
|
"transformers":
|
|
[
|
|
{
|
|
"transformerName": "helloWorld",
|
|
"supportedSourceAndTargetList":
|
|
[
|
|
{"sourceMediaType": "text/plain", "maxSourceSizeBytes": 50, "targetMediaType": "text/html" }
|
|
],
|
|
"transformOptions":
|
|
[
|
|
"helloWorldOptions"
|
|
]
|
|
}
|
|
]
|
|
}
|
|
~~~
|
|
|
|
* **transformOptions** provides a list of transform options (each with its own
|
|
name) that may be referenced for use in different transformers. This way
|
|
common options don't need to be repeated for each transformer. They can
|
|
even be shared between T-Engines. In this example there is only one group
|
|
of options called `helloWorldOptions`, which has just one option the
|
|
`language`. Unless an option has a `"required": true` field it is considered
|
|
to be optional. You don't need to specify _sourceMimetype, sourceExtension,
|
|
sourceEncoding, targetMimetype, targetExtension_ or _timeout_ as options as
|
|
these are available to all transformers.
|
|
* **transformers** is a list of transformer definitions. Each transformer
|
|
definition should have a unique `transformerName`, specify a
|
|
`supportedSourceAndTargetList` and indicate which options it supports.
|
|
In this case there is only one transformer called `Hello World` and it
|
|
accepts `helloWorldOptions`. A transformer may specify references to 0
|
|
or more transformOptions.
|
|
* **supportedSourceAndTargetList** is simply a list of source and target
|
|
Media Types that may be transformed, optionally specifying
|
|
`maxSourceSizeBytes` and `priority` values. In this case there is only one
|
|
from text to HTML and we have limited the source file size, to avoid
|
|
transforming files that clearly don't contain names.
|
|
|
|
## Transform pipelines
|
|
|
|
Transforms may be combined in a pipeline to form a new transformer, where
|
|
the output from one becomes the input to the next and so on. The t-config
|
|
defines the sequence of transform steps and intermediate Media Types. Like
|
|
any other transformer, it specifies a list of supported source and target
|
|
Media Types. If you don't supply any, all possible combinations are assumed
|
|
to be available. The definition may reuse the `transformOptions` of
|
|
transformers in the pipeline, but typically will define its own subset
|
|
of these.
|
|
|
|
The following example begins with the `helloWorld` Transformer, which takes a
|
|
text file containing a name and produces an HTML file with `Hello <name>`
|
|
message in the body. This is then transformed back into a text file. This
|
|
example contains just one pipeline transformer, but many may be defined
|
|
in the same file.
|
|
|
|
~~~json
|
|
{
|
|
"transformers": [
|
|
{
|
|
"transformerName": "helloWorldText",
|
|
"transformerPipeline" : [
|
|
{"transformerName": "helloWorld", "targetMediaType": "text/html"},
|
|
{"transformerName": "html"}
|
|
],
|
|
"supportedSourceAndTargetList": [
|
|
{"sourceMediaType": "text/plain", "priority": 45, "targetMediaType": "text/plain" }
|
|
],
|
|
"transformOptions": [
|
|
"helloWorldOptions"
|
|
]
|
|
}
|
|
]
|
|
}
|
|
~~~
|
|
|
|
* **transformerName** Try to create a unique name for the transform.
|
|
* **transformerPipeline** A list of transformers in the pipeline. The
|
|
`targetMediaType` specifies the intermediate Media Types between
|
|
transformers. There is no final `targetMediaType` as this comes from the
|
|
`supportedSourceAndTargetList`. The `transformerName` may reference a
|
|
transformer that has not been defined yet. A warning is issued if
|
|
it remains undefined after all t-config has been combined. Generally
|
|
it is better for a t-engine rather than the t-router to define pipeline
|
|
transformers as this limits the number of places that have to be changed.
|
|
Normally it is obvious which t-engine should contain the definition.
|
|
* **supportedSourceAndTargetList** The supported source and target Media
|
|
Types, which refer to the Media Types this pipeline transformer can
|
|
transform from and to, additionally you can set the `priority` and the
|
|
`maxSourceSizeBytes`. If blank, this indicates that all possible
|
|
combinations are supported. This is the cartesian product of all source
|
|
types to the first intermediate type and all target types from the last
|
|
intermediate type. Any combinations supported by the first transformer
|
|
are excluded. They will also have the priority from the first transform.
|
|
* **transformOptions** A list of references to options required by the
|
|
pipeline transformer.
|
|
|
|
## Failover transforms
|
|
|
|
A failover transform, simply provides a list of transforms to be attempted
|
|
one after another until one succeeds. For example, you may have a fast
|
|
transform that is able to handle a limited set of transforms and another
|
|
that is slower but handles all cases.
|
|
|
|
~~~json
|
|
{
|
|
"transformers": [
|
|
{
|
|
"transformerName": "imgExtractOrImgCreate",
|
|
"transformerFailover" : [ "imgExtract", "imgCreate" ],
|
|
"supportedSourceAndTargetList": [
|
|
{"sourceMediaType": "application/vnd.oasis.opendocument.graphics", "priority": 150, "targetMediaType": "image/png" },
|
|
...
|
|
{"sourceMediaType": "application/vnd.sun.xml.calc.template", "priority": 150, "targetMediaType": "image/png" }
|
|
]
|
|
}
|
|
]
|
|
}
|
|
~~~
|
|
|
|
* **transformerName** Try to create a unique name for the transform.
|
|
* **transformerFaillover** A list of transformers to try. This may include
|
|
references to transformer that have not been defined yet. Generally it
|
|
is better for the t-engine rather than the t-router to define failover
|
|
transformers as this limits the number of places that have to be changed.
|
|
Normally it is obvious which t-engine should contain the definition.
|
|
* **supportedSourceAndTargetList** The supported source and target Media
|
|
Types, which refer to the Media Types this failover transformer can
|
|
transform from and to, additionally you can set the `priority` and the
|
|
`maxSourceSizeBytes`. Unlike pipelines, it must not be blank.
|
|
* **transformOptions** A list of references to options required by the
|
|
pipeline transformer.
|
|
|
|
## Overriding transforms
|
|
|
|
It is possible to override a previously defined transform definition. The
|
|
following example removes most of the supported source to target media
|
|
types from the standard `"libreoffice"` transform. It also changes the
|
|
max size and priority of others. This is not something you would normally
|
|
want to do.
|
|
|
|
~~~json
|
|
{
|
|
"transformers": [
|
|
{
|
|
"transformerName": "libreoffice",
|
|
"supportedSourceAndTargetList": [
|
|
{"sourceMediaType": "text/csv", "maxSourceSizeBytes": 1000, "targetMediaType": "text/html" },
|
|
{"sourceMediaType": "text/csv", "targetMediaType": "application/vnd.oasis.opendocument.spreadsheet" },
|
|
{"sourceMediaType": "text/csv", "targetMediaType": "application/vnd.oasis.opendocument.spreadsheet-template" },
|
|
{"sourceMediaType": "text/csv", "targetMediaType": "text/tab-separated-values" },
|
|
{"sourceMediaType": "text/csv", "priority": 45, "targetMediaType": "application/vnd.ms-excel" },
|
|
{"sourceMediaType": "text/csv", "priority": 155, "targetMediaType": "application/pdf" }
|
|
]
|
|
}
|
|
]
|
|
}
|
|
~~~
|
|
|
|
## Removing a transformer
|
|
|
|
To discard a previous transformer definition include its name in the
|
|
optional `"removeTransformers"` list. You might want to do this if you
|
|
have a replacement and wish keep the overall configuration simple (so it
|
|
contains no alternatives), or you wish to temporarily remove it. The
|
|
following example removes two transformers before processing any other
|
|
configuration in the same T-Engine or pipeline file.
|
|
|
|
~~~json
|
|
{
|
|
"removeTransformers" : [
|
|
"libreoffice",
|
|
"Archive"
|
|
]
|
|
...
|
|
}
|
|
~~~
|
|
|
|
## Overriding the supportedSourceAndTargetList
|
|
|
|
Rather than totally override an existing transform definition, it is
|
|
generally simpler to modify the `"supportedSourceAndTargetList"` by adding
|
|
elements to the optional `"addSupported"`, `"removeSupported"` and
|
|
`"overrideSupported"` lists. You will need to specify the
|
|
`"transformerName"` but you will not need to repeat all the other
|
|
`"supportedSourceAndTargetList"` values, which means if there are changes
|
|
in the original, the same change is not needed in a second place. The
|
|
following example adds one transform, removes two others and changes
|
|
the `"priority"` and `"maxSourceSizeBytes"` of another. This is done before
|
|
processing any other configuration in the same T-Engine or pipeline file.
|
|
|
|
~~~json
|
|
{
|
|
"addSupported": [
|
|
{
|
|
"transformerName": "Archive",
|
|
"sourceMediaType": "application/zip",
|
|
"targetMediaType": "text/csv",
|
|
"priority": 60,
|
|
"maxSourceSizeBytes": 18874368
|
|
}
|
|
],
|
|
"removeSupported": [
|
|
{
|
|
"transformerName": "Archive",
|
|
"sourceMediaType": "application/zip",
|
|
"targetMediaType": "text/xml"
|
|
},
|
|
{
|
|
"transformerName": "Archive",
|
|
"sourceMediaType": "application/zip",
|
|
"targetMediaType": "text/plain"
|
|
}
|
|
],
|
|
"overrideSupported": [
|
|
{
|
|
"transformerName": "Archive",
|
|
"sourceMediaType": "application/zip",
|
|
"targetMediaType": "text/html",
|
|
"priority": 60,
|
|
"maxSourceSizeBytes": 18874368
|
|
}
|
|
]
|
|
...
|
|
}
|
|
~~~
|
|
|
|
## Default maxSourceSizeBytes and priority values
|
|
|
|
When defining `"supportedSourceAndTargetList"` elements the `"priority"`
|
|
and `"maxSourceSizeBytes"` are optional and normally have the default
|
|
values of 50 and -1 (no limit). It is possible to change those defaults.
|
|
In precedence order from most specific to most general these are defined
|
|
by combinations of `"transformerName"` and `"sourceMediaType"`.
|
|
|
|
* **transformer and source media type default** both specified
|
|
* **transformer** default only the transformer name is specified
|
|
* **source media type default** only the source media type is specified
|
|
* **system wide default** neither are specified.
|
|
|
|
Both `"priority"` and `"maxSourceSizeBytes"` may be specified in an element,
|
|
but if only one is specified it is only that value that is being defaulted.
|
|
|
|
Being able to change the defaults is particularly useful once a T-Engine
|
|
has been developed as it allows a system administrator to handle
|
|
limitations that are only found later. The `system wide defaults` are
|
|
generally not used but are included for completeness. The following
|
|
example says that the `"Office"` transformer by default should only handle
|
|
zip files up to 18 Mb and by default the maximum size of a `.doc` file to be
|
|
transformed is 4 Mb. The third example defaults the priority, possibly
|
|
allowing another transformer that has specified a priority of say `50` to
|
|
be used in preference.
|
|
|
|
Defaults values are only applied after all t-config has been read.
|
|
|
|
~~~json
|
|
{
|
|
"supportedDefaults": [
|
|
{
|
|
"transformerName": "Office", // default for a source type within a transformer
|
|
"sourceMediaType": "application/zip",
|
|
"maxSourceSizeBytes": 18874368
|
|
},
|
|
{
|
|
"sourceMediaType": "application/msword", // defaults for a source type
|
|
"maxSourceSizeBytes": 4194304,
|
|
"priority": 45
|
|
},
|
|
{
|
|
"priority": 60 // system wide default
|
|
},
|
|
{
|
|
"maxSourceSizeBytes": -1 // system wide default
|
|
}
|
|
]
|
|
...
|
|
}
|
|
~~~
|