alfresco-transform-core/docs/transformer-selection.md
Alan Davis babe26b0ba
HXENG-64 refactor ATS (#657)
Refactor to clean up packages in the t-model and to introduce a simpler to implement t-engine base.

The new t-engines (tika, imagemagick, libreoffice, pdfrenderer, misc, aio, aspose) and t-router may be used in combination with older components as the API between the content Repo and between components has not changed. As far as possible the same artifacts are created (the -boot projects no longer exist). They may be used with older ACS repo versions.

The main changes to look for are:
* The introduction of TransformEngine and CustomTransformer interfaces to be implemented.
* The removal in t-engines and t-router of the Controller, Application, test template page, Controller tests and application config, as this is all now done by the t-engine base package.
* The t-router now extends the t-engine base, which also reduced the amount of duplicate code.
* The t-engine base provides the test page, which includes drop downs of known transform options. The t-router is able to use pipeline and failover transformers. This was not possible to do previously as the router had no test UI.
* Resources including licenses are automatically included in the all-in-one t-engine, from the individual t-engines. They just need to be added as dependencies in the pom. 
* The ugly code in the all-in-one t-engine and misc t-engine to pick transformers has gone, as they are now just selected by the transformRegistry.
* The way t-engines respond to http or message queue transform requests has been combined (eliminates the similar but different code that existed before).
* The t-engine base now uses InputStream and OutputStream rather than Files by default. As a result it will be simpler to avoid writing content to a temporary location.
* A number of the Tika and Misc CustomTransforms no longer use Files.
* The original t-engine base still exists so customers can continue to create custom t-engines the way they have done previously. the project has just been moved into a folder called deprecated.
* The folder structure has changed. The long "alfresco-transform-..." names have given way to shorter easier to read and type names.
* The t-engine project structure now has a single project rather than two. 
* The previous config values still exist, but there are now a new set for config values for in files with names that don't misleadingly imply they only contain pipeline of routing information. 
* The concept of 'routing' has much less emphasis in class names as the code just uses the transformRegistry. 
* TransformerConfig may now be read as json or yaml. The restrictions about what could be specified in yaml has gone.
* T-engines and t-router may use transform config from files. Previously it was just the t-router.
* The POC code to do with graphs of possible routes has been removed.
* All master branch changes have been merged in.
* The concept of a single transform request which results in multiple responses (e.g. images from a video) has been added to the core processing of requests in the t-engine base.
* Many SonarCloud linter fixes.
2022-09-14 13:40:19 +01:00

1.3 KiB

Transformer selection strategy

The TransformRegistry uses t-config to choose which Transformer will be used. A transformer definition contains a supported list of source and target Media Types. This is used for the most basic selection. It is further refined by checking that the definition also supports transform options (the parameters) that have been supplied in a transform request.

Transformer 1 defines options: Op1, Op2
Transformer 2 defines options: Op1, Op2, Op3, Op4

Transform request provides values for options: Op2, Op3

If we assume both transformers support the required source and target Media Types, Transformer 2 will be selected because it knows about all the supplied options. The definition may also specify that some options are required or grouped. If any members of an optional group are supplied, all required members of that group become required.

The configuration may impose a source file size limit resulting in the selection of a different transformer. Size limits are normally added to avoid the transforms consuming too many resources.

The configuration may also specify a priority which will be used in Transformer selection if there are a number of possible transformers. The highest priority is the one with the lowest number.