Refactor to clean up packages in the t-model and to introduce a t-engine base that is simpler to implement.

The new t-engines (tika, imagemagick, libreoffice, pdfrenderer, misc, aio, aspose) and t-router may be used in combination with older components, as the APIs between the content Repo and the components, and between the components themselves, have not changed. As far as possible the same artifacts are created (the -boot projects no longer exist). They may be used with older ACS repo versions.

The main changes to look for are:
* The introduction of TransformEngine and CustomTransformer interfaces to be implemented.
* The removal in t-engines and t-router of the Controller, Application, test template page, Controller tests and application config, as this is all now done by the t-engine base package.
* The t-router now extends the t-engine base, which also reduced the amount of duplicate code.
* The t-engine base provides the test page, which includes drop-downs of known transform options. The t-router is able to use pipeline and failover transformers. This was not possible to do previously as the router had no test UI.
* Resources including licenses are automatically included in the all-in-one t-engine, from the individual t-engines. They just need to be added as dependencies in the pom.
* The ugly code in the all-in-one t-engine and misc t-engine to pick transformers has gone, as they are now just selected by the transformRegistry.
* The way t-engines respond to http or message queue transform requests has been combined (eliminates the similar but different code that existed before).
* The t-engine base now uses InputStream and OutputStream rather than Files by default. As a result it will be simpler to avoid writing content to a temporary location.
* A number of the Tika and Misc CustomTransforms no longer use Files.
* The original t-engine base still exists so customers can continue to create custom t-engines the way they have done previously. The project has just been moved into a folder called deprecated.
* The folder structure has changed. The long "alfresco-transform-..." names have given way to shorter names that are easier to read and type.
* The t-engine project structure now has a single project rather than two.
* The previous config values still exist, but there is now a new set of config values in files with names that don't misleadingly imply they only contain pipeline or routing information.
* The concept of 'routing' has much less emphasis in class names as the code just uses the transformRegistry.
* TransformerConfig may now be read as json or yaml. The restrictions about what could be specified in yaml have gone.
* T-engines and t-router may use transform config from files. Previously it was just the t-router.
* The POC code to do with graphs of possible routes has been removed.
* All master branch changes have been merged in.
* The concept of a single transform request which results in multiple responses (e.g. images from a video) has been added to the core processing of requests in the t-engine base.
* Many SonarCloud linter fixes.
T-Engine configuration
Each t-engine provides an endpoint that returns t-config that defines what it supports. The t-router and t-engines may also have external t-config files. These are combined in name order. As sorting is alphanumeric, you may wish to consider using a fixed length numeric prefix in filenames and t-engine names. As will be seen, t-config may reference elements from other components or modify elements from earlier t-config.
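The effect of alphanumeric ordering can be seen in a small sketch. The file names below are hypothetical and the sketch assumes plain string ordering; it only illustrates why a fixed length numeric prefix keeps the intended order.

```python
# Hypothetical t-config file names; only the sorting behaviour is the point.
unpadded = ["2-overrides.json", "10-pipelines.json", "1-base.json"]
padded = ["0002-overrides.json", "0010-pipelines.json", "0001-base.json"]

# Plain string sorting compares character by character,
# so "10-..." sorts before "2-..." when the prefix is not padded.
unpadded_order = sorted(unpadded)
padded_order = sorted(padded)

print(unpadded_order)  # 10-pipelines.json ends up before 2-overrides.json
print(padded_order)    # 0001, 0002, 0010 as intended
```

With unpadded prefixes the file meant to be read tenth is combined second, which matters because later t-config may modify elements from earlier t-config.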
Current configuration files are:
- Pdf-Renderer T-Engine configuration.
- ImageMagick T-Engine configuration.
- Libreoffice T-Engine configuration.
- Tika T-Engine configuration.
- Misc T-Engine configuration.
Additional config files (which may be resources on the classpath or external
files) are specified in Spring Boot properties such as
transform.config.file.<filename> or environment variables like
TRANSFORM_CONFIG_FILE_<filename>.
The following is a simple t-config file from an example Hello World t-engine.
{
  "transformOptions":
  {
    "helloWorldOptions":
    [
      {"value": {"name": "language"}}
    ]
  },
  "transformers":
  [
    {
      "transformerName": "helloWorld",
      "supportedSourceAndTargetList":
      [
        {"sourceMediaType": "text/plain", "maxSourceSizeBytes": 50, "targetMediaType": "text/html" }
      ],
      "transformOptions":
      [
        "helloWorldOptions"
      ]
    }
  ]
}
- transformOptions provides a list of transform options (each with its own name) that may be referenced for use in different transformers. This way common options don't need to be repeated for each transformer. They can even be shared between T-Engines. In this example there is only one group of options called helloWorldOptions, which has just one option, language. Unless an option has a "required": true field it is considered to be optional. You don't need to specify sourceMimetype, sourceExtension, sourceEncoding, targetMimetype, targetExtension or timeout as options as these are available to all transformers.
- transformers is a list of transformer definitions. Each transformer definition should have a unique transformerName, specify a supportedSourceAndTargetList and indicate which options it supports. In this case there is only one transformer called helloWorld and it accepts helloWorldOptions. A transformer may specify references to 0 or more transformOptions.
- supportedSourceAndTargetList is simply a list of source and target Media Types that may be transformed, optionally specifying maxSourceSizeBytes and priority values. In this case there is only one from text to HTML and we have limited the source file size, to avoid transforming files that clearly don't contain names.
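As a rough illustration of how the three parts of the Hello World t-config relate, the following sketch loads the same JSON and answers two questions: which transformers cover a request, and which option names a transformer accepts. The helper names find_transformers and option_names are hypothetical, and treating the size limit as inclusive is an assumption; this is not the transformRegistry implementation.

```python
import json

# The Hello World t-config from the example above.
T_CONFIG = json.loads("""
{
  "transformOptions": {
    "helloWorldOptions": [ {"value": {"name": "language"}} ]
  },
  "transformers": [
    {
      "transformerName": "helloWorld",
      "supportedSourceAndTargetList": [
        {"sourceMediaType": "text/plain", "maxSourceSizeBytes": 50,
         "targetMediaType": "text/html"}
      ],
      "transformOptions": [ "helloWorldOptions" ]
    }
  ]
}
""")


def find_transformers(config, source, target, size_bytes):
    """Hypothetical helper: names of transformers that cover the request."""
    names = []
    for transformer in config.get("transformers", []):
        for entry in transformer.get("supportedSourceAndTargetList", []):
            limit = entry.get("maxSourceSizeBytes", -1)  # -1 means no limit
            if (entry["sourceMediaType"] == source
                    and entry["targetMediaType"] == target
                    and (limit == -1 or size_bytes <= limit)):  # inclusive: assumption
                names.append(transformer["transformerName"])
                break
    return names


def option_names(config, transformer_name):
    """Hypothetical helper: option names reachable via the referenced groups."""
    groups = config.get("transformOptions", {})
    for transformer in config.get("transformers", []):
        if transformer["transformerName"] == transformer_name:
            return [
                option["value"]["name"]
                for group in transformer.get("transformOptions", [])
                for option in groups.get(group, [])
                if "value" in option
            ]
    return []


print(find_transformers(T_CONFIG, "text/plain", "text/html", 20))
print(find_transformers(T_CONFIG, "text/plain", "text/html", 500))
print(option_names(T_CONFIG, "helloWorld"))
```

A 20 byte text file is matched by helloWorld, while a 500 byte file is not because it exceeds maxSourceSizeBytes.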
Transform pipelines
Transforms may be combined in a pipeline to form a new transformer, where
the output from one becomes the input to the next and so on. The t-config
defines the sequence of transform steps and intermediate Media Types. Like
any other transformer, it specifies a list of supported source and target
Media Types. If you don't supply any, all possible combinations are assumed
to be available. The definition may reuse the transformOptions of
transformers in the pipeline, but typically will define its own subset
of these.
The following example begins with the helloWorld Transformer, which takes a
text file containing a name and produces an HTML file with a Hello <name>
message in the body. This is then transformed back into a text file. This
example contains just one pipeline transformer, but many may be defined
in the same file.
{
  "transformers": [
    {
      "transformerName": "helloWorldText",
      "transformerPipeline" : [
        {"transformerName": "helloWorld", "targetMediaType": "text/html"},
        {"transformerName": "html"}
      ],
      "supportedSourceAndTargetList": [
        {"sourceMediaType": "text/plain", "priority": 45, "targetMediaType": "text/plain" }
      ],
      "transformOptions": [
        "helloWorldOptions"
      ]
    }
  ]
}
- transformerName: Try to create a unique name for the transform.
- transformerPipeline: A list of transformers in the pipeline. The targetMediaType specifies the intermediate Media Types between transformers. There is no final targetMediaType as this comes from the supportedSourceAndTargetList. The transformerName may reference a transformer that has not been defined yet. A warning is issued if it remains undefined after all t-config has been combined. Generally it is better for a t-engine rather than the t-router to define pipeline transformers as this limits the number of places that have to be changed. Normally it is obvious which t-engine should contain the definition.
- supportedSourceAndTargetList: The supported source and target Media Types, which refer to the Media Types this pipeline transformer can transform from and to. Additionally you can set the priority and the maxSourceSizeBytes. If blank, this indicates that all possible combinations are supported. This is the cartesian product of all source types to the first intermediate type and all target types from the last intermediate type. Any combinations supported by the first transformer are excluded. They will also have the priority from the first transform.
- transformOptions: A list of references to options required by the pipeline transformer.
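The rule for a blank supportedSourceAndTargetList can be sketched as follows. The function name implied_pipeline_pairs and the supported lists of the two step transformers are hypothetical, chosen only to show the cartesian product and the exclusion; priorities are left out for brevity.

```python
from itertools import product


def implied_pipeline_pairs(first_supported, last_supported,
                           first_intermediate, last_intermediate):
    """Hypothetical helper: (source, target) pairs implied by a blank list.

    first_supported / last_supported are (source, target) pairs supported
    by the first and last transformers in the pipeline.
    """
    # All source types the first step can turn into the first intermediate type.
    sources = sorted({s for s, t in first_supported if t == first_intermediate})
    # All target types the last step can produce from the last intermediate type.
    targets = sorted({t for s, t in last_supported if s == last_intermediate})
    # Combinations the first transformer already supports directly are excluded.
    direct = set(first_supported)
    return [(s, t) for s, t in product(sources, targets) if (s, t) not in direct]


# Hypothetical supported lists for a two step pipeline via text/html.
first = [("text/plain", "text/html"), ("text/plain", "application/pdf")]
last = [("text/html", "text/plain"), ("text/html", "application/pdf")]

pairs = implied_pipeline_pairs(first, last, "text/html", "text/html")
print(pairs)
```

Here text/plain to application/pdf is dropped from the result because the first transformer can already do that transform on its own.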
Failover transforms
A failover transform simply provides a list of transforms to be attempted one after another until one succeeds. For example, you may have a fast transform that is able to handle a limited set of transforms and another that is slower but handles all cases.
{
  "transformers": [
    {
      "transformerName": "imgExtractOrImgCreate",
      "transformerFailover" : [ "imgExtract", "imgCreate" ],
      "supportedSourceAndTargetList": [
        {"sourceMediaType": "application/vnd.oasis.opendocument.graphics", "priority": 150, "targetMediaType": "image/png" },
        ...
        {"sourceMediaType": "application/vnd.sun.xml.calc.template", "priority": 150, "targetMediaType": "image/png" }
      ]
    }
  ]
}
- transformerName: Try to create a unique name for the transform.
- transformerFailover: A list of transformers to try. This may include references to transformers that have not been defined yet. Generally it is better for the t-engine rather than the t-router to define failover transformers as this limits the number of places that have to be changed. Normally it is obvious which t-engine should contain the definition.
- supportedSourceAndTargetList: The supported source and target Media Types, which refer to the Media Types this failover transformer can transform from and to. Additionally you can set the priority and the maxSourceSizeBytes. Unlike pipelines, it must not be blank.
- transformOptions: A list of references to options required by the failover transformer.
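The failover behaviour, trying each transformer in turn until one succeeds, can be sketched like this. The run_failover function and the two stand-in transformers are hypothetical and only mimic the idea of a fast but limited transform followed by a slower general one.

```python
def run_failover(transformers, source):
    """Hypothetical helper: try each (name, function) pair in order."""
    errors = []
    for name, transform in transformers:
        try:
            return name, transform(source)
        except Exception as exc:  # any failure moves on to the next transformer
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all failover transformers failed: " + "; ".join(errors))


def img_extract(source):
    # Stand-in for a fast transform that only works if an image is embedded.
    if "thumbnail" not in source:
        raise ValueError("no embedded image")
    return source["thumbnail"]


def img_create(source):
    # Stand-in for a slower transform that handles every case.
    return f"rendered:{source['name']}"


failover = [("imgExtract", img_extract), ("imgCreate", img_create)]

print(run_failover(failover, {"name": "a.odg", "thumbnail": "thumb.png"}))
print(run_failover(failover, {"name": "b.odg"}))
```

The first call succeeds with imgExtract, while the second falls through to imgCreate because the first transformer raised an error.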
Overriding transforms
It is possible to override a previously defined transform definition. The
following example removes most of the supported source to target media
types from the standard "libreoffice" transform. It also changes the
max size and priority of others. This is not something you would normally
want to do.
{
  "transformers": [
    {
      "transformerName": "libreoffice",
      "supportedSourceAndTargetList": [
        {"sourceMediaType": "text/csv", "maxSourceSizeBytes": 1000, "targetMediaType": "text/html" },
        {"sourceMediaType": "text/csv", "targetMediaType": "application/vnd.oasis.opendocument.spreadsheet" },
        {"sourceMediaType": "text/csv", "targetMediaType": "application/vnd.oasis.opendocument.spreadsheet-template" },
        {"sourceMediaType": "text/csv", "targetMediaType": "text/tab-separated-values" },
        {"sourceMediaType": "text/csv", "priority": 45, "targetMediaType": "application/vnd.ms-excel" },
        {"sourceMediaType": "text/csv", "priority": 155, "targetMediaType": "application/pdf" }
      ]
    }
  ]
}
Removing a transformer
To discard a previous transformer definition, include its name in the
optional "removeTransformers" list. You might want to do this if you
have a replacement and wish to keep the overall configuration simple (so it
contains no alternatives), or you wish to temporarily remove it. The
following example removes two transformers before processing any other
configuration in the same T-Engine or pipeline file.
{
  "removeTransformers" : [
    "libreoffice",
    "Archive"
  ]
  ...
}
Overriding the supportedSourceAndTargetList
Rather than totally override an existing transform definition, it is
generally simpler to modify the "supportedSourceAndTargetList" by adding
elements to the optional "addSupported", "removeSupported" and
"overrideSupported" lists. You will need to specify the
"transformerName" but you will not need to repeat all the other
"supportedSourceAndTargetList" values, which means if there are changes
in the original, the same change is not needed in a second place. The
following example adds one transform, removes two others and changes
the "priority" and "maxSourceSizeBytes" of another. This is done before
processing any other configuration in the same T-Engine or pipeline file.
{
  "addSupported": [
    {
      "transformerName": "Archive",
      "sourceMediaType": "application/zip",
      "targetMediaType": "text/csv",
      "priority": 60,
      "maxSourceSizeBytes": 18874368
    }
  ],
  "removeSupported": [
    {
      "transformerName": "Archive",
      "sourceMediaType": "application/zip",
      "targetMediaType": "text/xml"
    },
    {
      "transformerName": "Archive",
      "sourceMediaType": "application/zip",
      "targetMediaType": "text/plain"
    }
  ],
  "overrideSupported": [
    {
      "transformerName": "Archive",
      "sourceMediaType": "application/zip",
      "targetMediaType": "text/html",
      "priority": 60,
      "maxSourceSizeBytes": 18874368
    }
  ]
  ...
}
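The effect of the three lists on one transformer's supportedSourceAndTargetList can be sketched as below. The function apply_supported_changes, the starting list for "Archive" and the order in which the three lists are applied are all assumptions made for illustration.

```python
def apply_supported_changes(supported, changes, transformer_name):
    """Hypothetical helper: apply remove/override/add lists to one transformer."""

    def pair(entry):
        return entry["sourceMediaType"], entry["targetMediaType"]

    def values(entry):
        # Everything except the transformerName belongs in the supported entry.
        return {k: v for k, v in entry.items() if k != "transformerName"}

    by_pair = {pair(e): dict(e) for e in supported}
    for e in changes.get("removeSupported", []):
        if e["transformerName"] == transformer_name:
            by_pair.pop(pair(e), None)
    for e in changes.get("overrideSupported", []):
        if e["transformerName"] == transformer_name and pair(e) in by_pair:
            by_pair[pair(e)].update(values(e))
    for e in changes.get("addSupported", []):
        if e["transformerName"] == transformer_name:
            by_pair.setdefault(pair(e), values(e))
    return list(by_pair.values())


# Hypothetical original list for the "Archive" transformer.
archive = [
    {"sourceMediaType": "application/zip", "targetMediaType": "text/xml"},
    {"sourceMediaType": "application/zip", "targetMediaType": "text/plain"},
    {"sourceMediaType": "application/zip", "targetMediaType": "text/html"},
]

# The same changes as in the example above.
changes = {
    "addSupported": [
        {"transformerName": "Archive", "sourceMediaType": "application/zip",
         "targetMediaType": "text/csv", "priority": 60, "maxSourceSizeBytes": 18874368}
    ],
    "removeSupported": [
        {"transformerName": "Archive", "sourceMediaType": "application/zip",
         "targetMediaType": "text/xml"},
        {"transformerName": "Archive", "sourceMediaType": "application/zip",
         "targetMediaType": "text/plain"}
    ],
    "overrideSupported": [
        {"transformerName": "Archive", "sourceMediaType": "application/zip",
         "targetMediaType": "text/html", "priority": 60, "maxSourceSizeBytes": 18874368}
    ],
}

result = apply_supported_changes(archive, changes, "Archive")
for entry in result:
    print(entry)
```

After the changes only the text/html and text/csv targets remain, both with a priority of 60 and an 18874368 byte limit.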
Default maxSourceSizeBytes and priority values
When defining "supportedSourceAndTargetList" elements the "priority"
and "maxSourceSizeBytes" are optional and normally have the default
values of 50 and -1 (no limit). It is possible to change those defaults.
In precedence order from most specific to most general these are defined
by combinations of "transformerName" and "sourceMediaType".
- transformer and source media type default: both are specified
- transformer default: only the transformer name is specified
- source media type default: only the source media type is specified
- system wide default: neither is specified.
Both "priority" and "maxSourceSizeBytes" may be specified in an element,
but if only one is specified it is only that value that is being defaulted.
Being able to change the defaults is particularly useful once a T-Engine
has been developed as it allows a system administrator to handle
limitations that are only found later. The system wide defaults are
generally not used but are included for completeness. The following
example says that the "Office" transformer by default should only handle
zip files up to 18 MB and by default the maximum size of a .doc file to be
transformed is 4 MB. The third element defaults the priority, possibly
allowing another transformer that has specified a priority of say 50 to
be used in preference.
Default values are only applied after all t-config has been read.
{
  "supportedDefaults": [
    {
      "transformerName": "Office",             // default for a source type within a transformer
      "sourceMediaType": "application/zip",
      "maxSourceSizeBytes": 18874368
    },
    {
      "sourceMediaType": "application/msword", // defaults for a source type
      "maxSourceSizeBytes": 4194304,
      "priority": 45
    },
    {
      "priority": 60                           // system wide default
    },
    {
      "maxSourceSizeBytes": -1                 // system wide default
    }
  ]
  ...
}
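The precedence order described above can be sketched as a lookup from most specific to most general. The function resolve_default is hypothetical and uses the same four elements as the example; it illustrates the ordering only and is not the actual implementation.

```python
# The same elements as in the example above.
SUPPORTED_DEFAULTS = [
    {"transformerName": "Office", "sourceMediaType": "application/zip",
     "maxSourceSizeBytes": 18874368},
    {"sourceMediaType": "application/msword",
     "maxSourceSizeBytes": 4194304, "priority": 45},
    {"priority": 60},
    {"maxSourceSizeBytes": -1},
]

# Values used when no default has been configured at all.
BUILT_IN = {"priority": 50, "maxSourceSizeBytes": -1}


def resolve_default(field, transformer_name, source_media_type,
                    defaults=SUPPORTED_DEFAULTS):
    """Hypothetical helper: find the default for one field of one entry."""
    # Most specific combination first, system wide last.
    candidates = [
        (transformer_name, source_media_type),
        (transformer_name, None),
        (None, source_media_type),
        (None, None),
    ]
    for name, source in candidates:
        for element in defaults:
            if (element.get("transformerName") == name
                    and element.get("sourceMediaType") == source
                    and field in element):
                return element[field]
    return BUILT_IN[field]


print(resolve_default("maxSourceSizeBytes", "Office", "application/zip"))
print(resolve_default("priority", "Office", "application/zip"))
print(resolve_default("priority", "Office", "application/msword"))
```

The second call shows that an element only defaults the values it contains: the "Office" zip element has no priority, so the lookup falls through to the system wide priority.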