From ade4b95a499c0250b6ef508485e84b15cdd58fc0 Mon Sep 17 00:00:00 2001 From: alandavis Date: Wed, 17 Aug 2022 13:33:16 +0100 Subject: [PATCH] Doc changes only [skip ci] --- docs/engine_config.md | 161 --------- docs/t-engines.md | 60 ++++ docs/transform-config.md | 308 ++++++++++++++++++ docs/transform-specific-code.md | 140 ++++++++ docs/transformer-selection.md | 27 ++ docs/transformerDebug.md | 46 +++ .../base/registry/TransformConfigFiles.java | 4 +- 7 files changed, 583 insertions(+), 163 deletions(-) delete mode 100644 docs/engine_config.md create mode 100644 docs/t-engines.md create mode 100644 docs/transform-config.md create mode 100644 docs/transform-specific-code.md create mode 100644 docs/transformer-selection.md create mode 100644 docs/transformerDebug.md diff --git a/docs/engine_config.md b/docs/engine_config.md deleted file mode 100644 index 9d49ccde..00000000 --- a/docs/engine_config.md +++ /dev/null @@ -1,161 +0,0 @@ -## T-Engine configuration - -T-Engines provide a */transform/config* end point for clients (e.g. Transform-Router or -Repository) that indicate what is supported. T-Engines store this -configuration as a JSON resource file named *engine_config.json*. - -The config can be found under `alfresco-transform-core/engines//src/main/resources -/_engine_config.json`; current configuration files are: -* [Pdf-Renderer T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/pdfrenderer/src/main/resources/pdfrenderer_engine_config.json). -* [ImageMagick T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/imagemagick/src/main/resources/imagemagick_engine_config.json). -* [Libreoffice T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/libreoffice/src/main/resources/libreoffice_engine_config.json). 
-* [Tika T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/tika/src/main/resources/tika_engine_config.json). -* [Misc T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/misc/src/main/resources/misc_engine_config.json). - -*Snippet from Tika T-engine configuration:* -```json -{ - "transformOptions": { - "tikaOptions": [ - {"value": {"name": "targetEncoding"}} - ], - "pdfboxOptions": [ - {"value": {"name": "notExtractBookmarksText"}}, - {"value": {"name": "targetEncoding"}} - ] - }, - "transformers": [ - { - "transformerName": "PdfBox", - "supportedSourceAndTargetList": [ - {"sourceMediaType": "application/pdf", "targetMediaType": "text/html"}, - {"sourceMediaType": "application/pdf", "maxSourceSizeBytes": 26214400, "targetMediaType": "text/plain"} - ], - "transformOptions": [ - "pdfboxOptions" - ] - }, - { - "transformerName": "TikaAuto", - "supportedSourceAndTargetList": [ - {"sourceMediaType": "application/msword", "priority": 55, "targetMediaType": "text/xml"} - ], - "transformOptions": [ - "tikaOptions" - ] - }, - { - "transformerName": "TextMining", - "supportedSourceAndTargetList": [ - {"sourceMediaType": "application/msword", "targetMediaType": "text/xml"} - ], - "transformOptions": [ - "tikaOptions" - ] - } - ] -} -``` - -### Transform Options -* **transformOptions** provides a list of transform options that may be - referenced for use in different transformers. This way common options - don't need to be repeated for each transformer, they can be shared between - T-Engines. In this example there are two groups of options called **tikaOptions** - and **pdfboxOptions** which has a group of options **targetEncoding** and - **notExtractBookmarksText**. Unless an option has a **"required": true** field it is - considered to be optional. 
- - *Snippet from ImageMagick T-engine configuration:* -```json - "transformOptions": { - "imageMagickOptions": [ - {"value": {"name": "alphaRemove"}}, - {"group": {"transformOptions": [ - {"value": {"name": "cropGravity"}}, - {"value": {"name": "cropWidth"}}, - {"value": {"name": "cropHeight"}}, - {"value": {"name": "cropPercentage"}}, - {"value": {"name": "cropXOffset"}}, - {"value": {"name": "cropYOffset"}} - ]}}, - ] - }, -``` -* There are two types of transformOptions, *transformOptionsValue* and *transformOptionsGroup*: - * _TransformOptionsValue_ is used to represent a single transformation option, it is defined - by a **name** and an optional **required** field. - * _TransformOptionGroup_ represents a group of one or more options, it is used to group - options that define a - characteristic. In the above snippet all the options for crop are defined under a group, it is recommended to - use this approach as it is easier to read. A transformOptionsGroup can contain one or more transformOptionsValue - and transformOptionsGroup. - -### Transformers -* **transformers** - A list of transformer definitions. - Each transformer definition should have a unique **transformerName**, - specify a **supportedSourceAndTargetList** and indicate which - options it supports. As it is shown in the Tika snippet, an *engine_config* - can describe one or more transformers, as a T-engine can have - multiple transformers (e.g. Tika, Misc). A transformer configuration may - specify references to 0 or more transformOptions. - -### Supported Source and Target List -* **supportedSourceAndTargetList** is simply a list of source and target - Media Types that may be transformed, optionally specifying a - **maxSourceSizeBytes** and a **priority** value. -* *maxSourceSizeBytes* is used to set the upper size limit of a transformation. - * If not specified, the default value for maxSourceSizeBytes is **unlimited**. 
-* *priority* it is used by clients to determine which transfomer to call or by T-engines - with multiple transformers to determine which one to use. In the above Tika snippet, - both *TikaAuto* and *TextMining* have the capability to transform *"application/msword"* - into *"text/xml"*, the transformer containing the source-target media type with higher priority will be chosen by the - T-engine as the one to execute the transformation, in this case it will be *TextMining*, because: - * If not specified, the default value for priority is **50**. - * Note: priority values are like the order in a queue, the **lower** the number the **higher the - priority** is. - -## Transformer selection strategy -The T-Engine configuration is used to choose which T-Engine will perform a transform. -A transformer definition contains a supported list of source and target Media Types. This is used for the -most basic selection. This is further refined by checking that the definition also supports transform options -(parameters) that have been supplied in a transform request or a Rendition Definition used in a rendition request. -Order for selection is: -1. Source->Target Media Types -2. transformOptions -3. maxSourceSizeBytes -4. priority - -#### Case 1: -``` -Transformer 1 defines options: Op1, Op2 -Transformer 2 defines options: Op1, Op2, Op3, Op4 -``` -``` -Rendition provides values for options: Op2, Op3 -``` -If we assume both transformers support the required source and target Media Types, Transformer 2 will be selected -because it knows about all the supplied options. The definition may also specify that some options are required or grouped. 
- -#### Case 2: -``` -Transformer 1 defines options: Op1, Op2, maxSize -Transformer 2 defines options: Op1, Op2, Op3 -``` -``` -Rendition provides values for options: Op1, Op2 -``` -If we assume both transformers support the required source and target Media Types, and file size is greater than *maxSize* -,Transformer 2 will be selected because if can handle *maxSourceSizeBytes* for this transformation. - -#### Case 3: -``` -Transformer 1 defines options: Op1, Op2, priorty1 -Transformer 2 defines options: Op1, Op2, Op3, priority2 -``` -``` -Rendition provides values for options: Op1, Op2 -``` -If we assume both transformers support the required source and target Media Types and - *priority1* < *priority2*, Transformer 1 will be selected because its priority is higher. - \ No newline at end of file diff --git a/docs/t-engines.md b/docs/t-engines.md new file mode 100644 index 00000000..e06e0fe3 --- /dev/null +++ b/docs/t-engines.md @@ -0,0 +1,60 @@ +## T-Engines + +The t-engines provide the basic transform operations. The Transform Service +provides a common base for the communication with other components. It is +this base that is described in this section. The base is a Spring Boot +application to which transform specific code is added and then wrapped +in a Docker image with any programs that the transforms need. The base +does not need to be used as long as there appears to be a process responding +to endpoints and messages. + +A t-engine groups together one or more Transformers. Each Transformer +(provided by transform specific code) knows how to perform a set of +transformations from one MIME Type to another with a common set of +t-options. + +~~~ +0010 my-t-engine + Transformer 1 + mimetype A -> mimetype B + mimetype A -> mimetype C + mimetype B -> mimetype C + option1 + option2 + Transformer 2 + mimetype A -> mimetype B + mimetype D -> mimetype C + option2 + option3 +0020 another-t-engine + ... +0030 yet-another-t-engine + ...
+~~~ + +### Endpoints + +* `POST /transform` to perform a transform. There are two forms: + * For asynchronous transforms: Performs a transform using a + `TransformRequest` received from the t-router via a message queue. The + `TransformReply` is sent back via the queue. + * For synchronous transforms: Performs a transform on content uploaded as + a Multipart File and provides the resulting content as a download. + Transform options are extracted from the request properties. The + following are not added as transform options, but are used to select the + transformer: `sourceMimetype` & `targetMimetype`. +* `GET /transform/config` to obtain t-config about what the t-engine supports. + It has a parameter `configVersion` to allow a caller and the t-engine to + negotiate down to a common format. The value is an integer which indicates + which elements may be added to the config. These elements reflect + functionality supported by the base (such as pre-signed URLs). The + `CoreVersionDecorator` adds to the Config returned by the transform + specific code. +* `GET /` provides an HTML test page to upload a source file, enter transform + options and issue a synchronous transform request. Useful in testing. +* `GET /log` provides a page with basic log information. Useful in testing. +* `GET /error` provides an error page when testing. +* `GET /version` provides a String message to be included in client debug + messages. +* `GET /ready` used by Kubernetes as a readiness probe. +* `GET /live` used by Kubernetes as a liveness probe. \ No newline at end of file diff --git a/docs/transform-config.md b/docs/transform-config.md new file mode 100644 index 00000000..0227bfbd --- /dev/null +++ b/docs/transform-config.md @@ -0,0 +1,308 @@ +## T-Engine configuration + +Each t-engine provides an endpoint that returns t-config that defines what +it supports. The t-router and t-engines may also have external t-config files. +These are combined in name order.
As sorting is alphanumeric, you may wish to +consider using a fixed length numeric prefix in filenames and t-engine names. As will be seen +t-config may reference elements from other components or modify elements +from earlier t-config. + +Current configuration files are: +* [Pdf-Renderer T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/pdfrenderer/src/main/resources/pdfrenderer_engine_config.json). +* [ImageMagick T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/imagemagick/src/main/resources/imagemagick_engine_config.json). +* [Libreoffice T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/libreoffice/src/main/resources/libreoffice_engine_config.json). +* [Tika T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/tika/src/main/resources/tika_engine_config.json). +* [Misc T-Engine configuration](https://github.com/Alfresco/alfresco-transform-core/blob/master/engines/misc/src/main/resources/misc_engine_config.json). + +Additional config files (which may be resources on the classpath or external +files) are specified in Spring Boot properties or such as +`transform.config.file.` or environment variables like +`TRANSFORM_CONFIG_FILE_`. + +The following is a simple t-config file from an example Hello World +t-engine. + +~~~ +{ + "transformOptions": + { + "helloWorldOptions": + [ + {"value": {"name": "language"}} + ] + }, + "transformers": + [ + { + "transformerName": "helloWorld", + "supportedSourceAndTargetList": + [ + {"sourceMediaType": "text/plain", "maxSourceSizeBytes": 50, "targetMediaType": "text/html" } + ], + "transformOptions": + [ + "helloWorldOptions" + ] + } + ] +} +~~~ + +* **transformOptions** provides a list of transform options (each with its own + name) that may be referenced for use in different transformers. 
This way + common options don't need to be repeated for each transformer. They can + even be shared between T-Engines. In this example there is only one group + of options called `helloWorldOptions`, which has just one option, + `language`. Unless an option has a `"required": true` field it is considered + to be optional. You don't need to specify _sourceMimetype, sourceExtension, + sourceEncoding, targetMimetype, targetExtension_ or _timeout_ as options as + these are available to all transformers. +* **transformers** is a list of transformer definitions. Each transformer + definition should have a unique `transformerName`, specify a + `supportedSourceAndTargetList` and indicate which options it supports. + In this case there is only one transformer called `helloWorld` and it + accepts `helloWorldOptions`. A transformer may specify references to 0 + or more transformOptions. +* **supportedSourceAndTargetList** is simply a list of source and target + Media Types that may be transformed, optionally specifying + `maxSourceSizeBytes` and `priority` values. In this case there is only one, + from text to HTML, and we have limited the source file size to avoid + transforming files that clearly don't contain names. + +### Transform pipelines + +Transforms may be combined in a pipeline to form a new transformer, where +the output from one becomes the input to the next and so on. The t-config +defines the sequence of transform steps and intermediate Media Types. Like +any other transformer, it specifies a list of supported source and target +Media Types. If you don't supply any, all possible combinations are assumed +to be available. The definition may reuse the `transformOptions` of +transformers in the pipeline, but typically will define its own subset +of these. + +The following example begins with the `helloWorld` Transformer, which takes a +text file containing a name and produces an HTML file with a `Hello ` +message in the body.
This is then transformed back into a text file. This +example contains just one pipeline transformer, but many may be defined +in the same file. + +~~~ +{ + "transformers": [ + { + "transformerName": "helloWorldText", + "transformerPipeline" : [ + {"transformerName": "helloWorld", "targetMediaType": "text/html"}, + {"transformerName": "html"} + ], + "supportedSourceAndTargetList": [ + {"sourceMediaType": "text/plain", "priority": 45, "targetMediaType": "text/plain" } + ], + "transformOptions": [ + "helloWorldOptions" + ] + } + ] +} +~~~ + +* **transformerName** Try to create a unique name for the transform. +* **transformerPipeline** A list of transformers in the pipeline. The + `targetMediaType` specifies the intermediate Media Types between + transformers. There is no final `targetMediaType` as this comes from the + `supportedSourceAndTargetList`. The `transformerName` may reference a + transformer that has not been defined yet. A warning is issued if + it remains undefined after all t-config has been combined. Generally + it is better for a t-engine rather than the t-router to define pipeline + transformers as this limits the number of places that have to be changed. + Normally it is obvious which t-engine should contain the definition. +* **supportedSourceAndTargetList** The supported source and target Media + Types, which refer to the Media Types this pipeline transformer can + transform from and to, additionally you can set the `priority` and the + `maxSourceSizeBytes`. If blank, this indicates that all possible + combinations are supported. This is the cartesian product of all source + types to the first intermediate type and all target types from the last + intermediate type. Any combinations supported by the first transformer + are excluded. They will also have the priority from the first transform. +* **transformOptions** A list of references to options required by the + pipeline transformer. 
+ +### Failover transforms + +A failover transform simply provides a list of transforms to be attempted +one after another until one succeeds. For example, you may have a fast +transform that is able to handle a limited set of transforms and another +that is slower but handles all cases. + +~~~ +{ + "transformers": [ + { + "transformerName": "imgExtractOrImgCreate", + "transformerFailover" : [ "imgExtract", "imgCreate" ], + "supportedSourceAndTargetList": [ + {"sourceMediaType": "application/vnd.oasis.opendocument.graphics", "priority": 150, "targetMediaType": "image/png" }, + ... + {"sourceMediaType": "application/vnd.sun.xml.calc.template", "priority": 150, "targetMediaType": "image/png" } + ] + } + ] +} +~~~ + +* **transformerName** Try to create a unique name for the transform. +* **transformerFailover** A list of transformers to try. This may include + references to transformers that have not been defined yet. Generally it + is better for the t-engine rather than the t-router to define failover + transformers as this limits the number of places that have to be changed. + Normally it is obvious which t-engine should contain the definition. +* **supportedSourceAndTargetList** The supported source and target Media + Types, which refer to the Media Types this failover transformer can + transform from and to. Additionally you can set the `priority` and the + `maxSourceSizeBytes`. Unlike pipelines, it must not be blank. +* **transformOptions** A list of references to options required by the + failover transformer. + +### Overriding transforms + +It is possible to override a previously defined transform definition. The +following example removes most of the supported source to target media +types from the standard `"libreoffice"` transform. It also changes the +max size and priority of others. This is not something you would normally +want to do.
+~~~ +{ + "transformers": [ + { + "transformerName": "libreoffice", + "supportedSourceAndTargetList": [ + {"sourceMediaType": "text/csv", "maxSourceSizeBytes": 1000, "targetMediaType": "text/html" }, + {"sourceMediaType": "text/csv", "targetMediaType": "application/vnd.oasis.opendocument.spreadsheet" }, + {"sourceMediaType": "text/csv", "targetMediaType": "application/vnd.oasis.opendocument.spreadsheet-template" }, + {"sourceMediaType": "text/csv", "targetMediaType": "text/tab-separated-values" }, + {"sourceMediaType": "text/csv", "priority": 45, "targetMediaType": "application/vnd.ms-excel" }, + {"sourceMediaType": "text/csv", "priority": 155, "targetMediaType": "application/pdf" } + ] + } + ] +} +~~~ + +### Removing a transformer + +To discard a previous transformer definition, include its name in the +optional `"removeTransformers"` list. You might want to do this if you +have a replacement and wish to keep the overall configuration simple (so it +contains no alternatives), or you wish to temporarily remove it. The +following example removes two transformers before processing any other +configuration in the same T-Engine or pipeline file. + +~~~ +{ + "removeTransformers" : [ + "libreoffice", + "Archive" + ] + ... +} +~~~ + +### Overriding the supportedSourceAndTargetList + +Rather than totally override an existing transform definition, it is +generally simpler to modify the `"supportedSourceAndTargetList"` by adding +elements to the optional `"addSupported"`, `"removeSupported"` and +`"overrideSupported"` lists. You will need to specify the +`"transformerName"` but you will not need to repeat all the other +`"supportedSourceAndTargetList"` values, which means if there are changes +in the original, the same change is not needed in a second place. The +following example adds one transform, removes two others and changes +the `"priority"` and `"maxSourceSizeBytes"` of another.
This is done before +processing any other configuration in the same T-Engine or pipeline file. +~~~ +{ + "addSupported": [ + { + "transformerName": "Archive", + "sourceMediaType": "application/zip", + "targetMediaType": "text/csv", + "priority": 60, + "maxSourceSizeBytes": 18874368 + } + ], + "removeSupported": [ + { + "transformerName": "Archive", + "sourceMediaType": "application/zip", + "targetMediaType": "text/xml" + }, + { + "transformerName": "Archive", + "sourceMediaType": "application/zip", + "targetMediaType": "text/plain" + } + ], + "overrideSupported": [ + { + "transformerName": "Archive", + "sourceMediaType": "application/zip", + "targetMediaType": "text/html", + "priority": 60, + "maxSourceSizeBytes": 18874368 + } + ] + ... +} +~~~ + +### Default maxSourceSizeBytes and priority values + +When defining `"supportedSourceAndTargetList"` elements the `"priority"` +and `"maxSourceSizeBytes"` are optional and normally have the default +values of 50 and -1 (no limit). It is possible to change those defaults. +In precedence order from most specific to most general these are defined +by combinations of `"transformerName"` and `"sourceMediaType"`. + +* **transformer and source media type default** both specified +* **transformer default** only the transformer name is specified +* **source media type default** only the source media type is specified +* **system wide default** neither is specified. + +Both `"priority"` and `"maxSourceSizeBytes"` may be specified in an element, +but if only one is specified it is only that value that is being defaulted. + +Being able to change the defaults is particularly useful once a T-Engine +has been developed as it allows a system administrator to handle +limitations that are only found later. The `system wide defaults` are +generally not used but are included for completeness.
The following +example says that the `"Office"` transformer by default should only handle +zip files up to 18 MB and by default the maximum size of a `.doc` file to be +transformed is 4 MB. The third example defaults the priority, possibly +allowing another transformer that has specified a priority of say `50` to +be used in preference. + +Default values are only applied after all t-config has been read. + +~~~ +{ + "supportedDefaults": [ + { + "transformerName": "Office", // default for a source type within a transformer + "sourceMediaType": "application/zip", + "maxSourceSizeBytes": 18874368 + }, + { + "sourceMediaType": "application/msword", // defaults for a source type + "maxSourceSizeBytes": 4194304, + "priority": 45 + }, + { + "priority": 60 // system wide default + }, + { + "maxSourceSizeBytes": -1 // system wide default + } + ] + ... +} +~~~ \ No newline at end of file diff --git a/docs/transform-specific-code.md b/docs/transform-specific-code.md new file mode 100644 index 00000000..dfbaa369 --- /dev/null +++ b/docs/transform-specific-code.md @@ -0,0 +1,140 @@ +## Transform specific code + +To create a new t-engine an author uses a base t-engine (a Spring Boot +application) and implements the following interfaces. An implementation of +the `CustomTransformer` provides the actual transformation code and the +implementation of the `TransformEngine` says what it is capable of +transforming. The `TransformConfig` is normally read from a JSON file on the +classpath. Multiple `CustomTransformer` implementations may be in a single +t-engine. As a result the author can concentrate on the code that transforms +one format to another without really worrying about all the plumbing. +Typically, the transform specific code uses a 3rd party library or an +external executable which needs to be added to the Docker image.
+ +~~~ +package org.alfresco.transform; + +import org.alfresco.transform.config.TransformConfig; +import org.alfresco.transformer.probes.ProbeTestTransform; + +import java.util.Set; + +/** + * Interface to be implemented by transform specific code. Provides information + * about the t-engine as a whole. So that it is automatically picked up, it must + * exist in a package under {@code org.alfresco.transform} and have the Spring + * {@code @Component} annotation. + */ +public interface TransformEngine +{ + /** + * @return the name of the t-engine. The t-router reads config from t-engines + * in name order. + */ + String getTransformEngineName(); + + /** + * @return a definition of what the t-engine supports. Normally read from a json + * Resource on the classpath. + */ + TransformConfig getTransformConfig(); + + /** + * @return a ProbeTestTransform (will do a quick transform) for k8 liveness and + * readiness probes. + */ + ProbeTransform getProbeTransform(); +} +~~~ + +implementations of the following interface provide the actual transform code. + +~~~ +package org.alfresco.transform; + +import java.io.InputStream; +import java.io.OutputStream; +import java.util.Map; + +/** + * Interface to be implemented by transform specific code. The + * {@code transformerName} should match the transformerName in the + * {@link TransformConfig} returned by the {@link TransformEngine}. So that it is + * automatically picked up, it must exist in a package under + * {@code org.alfresco.transform} and have the Spring {@code @Component} annotation. + * + * Implementations may also use the {@link TransformManager} if they wish to + * interact with the base t-engine. 
+ */ +public interface CustomTransformer +{ + String getTransformerName(); + + void transform(String sourceMimetype, InputStream inputStream, + String targetMimetype, OutputStream outputStream, + Map transformOptions, + TransformManager transformManager) throws Exception; +} +~~~ + +The implementation of the following interface is provided by the t-base, +allows the `CustomTransformer` to interact with the base t-engine. The +creation of Files is discouraged as it is better not to leave files on disk. + +~~~ +package org.alfresco.transform.base; + +import java.io.File; +import java.io.InputStream; +import java.io.OutputStream; +import java.util.Map; + +/** + * Allows {@link CustomTransformer} implementations to interact with the base + * t-engine. + */ +public interface TransformManager +{ + /** + * Allows a CustomTransformer to use a local source File rather than the + * supplied InputStream. To avoid creating extra files, if a File has already + * been created by the base t-engine, it is returned. + */ + File createSourceFile(); + + /** + * Allows a CustomTransformer to use a local target File rather than the + * supplied OutputStream. To avoid creating extra files, if a File has already + * been created by the base t-engine, it is returned. + */ + File createTargetFile(); + + /** + * Allows a single transform request to have multiple transform responses. For + * example, images from a video at different time offsets or different pages of + * a document. Following a call to this method a transform response is made with + * the data sent to the current {@code OutputStream}. If this method has been + * called, there will not be another response when {@link CustomTransformer# + * transform(String, InputStream, String, OutputStream, Map, TransformManager)} + * returns and any data written to the final {@code OutputStream} will be + * ignored. + * @param index returned with the response, so that the fragment may be + * distinguished from other responses. 
Renditions use the index + * as an offset into elements. A {@code null} value indicates + * that there is no more output and any data sent to the current + * {@code outputStream} will be ignored. + * @param finished indicates this is the final fragment. {@code False} indicates + * that it is expected there will be more fragments. There need + * not be a call with this parameter set to {@code true}. + * @return a new {@code OutputStream} for the next fragment. A {@code null} will + * be returned if {@code index} was {@code null} or {@code + * finished} was {@code true}. + * @throws TransformException if a synchronous (http) request has been made as + * this only works with requests on queues, or the first call to + * this method indicated there was no output, or another call is + * made after it has been indicated that there should be no more + * fragments. + * @throws IOException if there was a problem sending the response. */ + OutputStream respondWithFragment(Integer index); +} +~~~ \ No newline at end of file diff --git a/docs/transformer-selection.md b/docs/transformer-selection.md new file mode 100644 index 00000000..de132447 --- /dev/null +++ b/docs/transformer-selection.md @@ -0,0 +1,27 @@ +## Transformer selection strategy + +The TransformRegistry uses t-config to choose which Transformer will be +used. A transformer definition contains a supported list of source and +target Media Types. This is used for the most basic selection. It is further +refined by checking that the definition also supports transform options (the +parameters) that have been supplied in a transform request. + +~~~ +Transformer 1 defines options: Op1, Op2 +Transformer 2 defines options: Op1, Op2, Op3, Op4 + +Transform request provides values for options: Op2, Op3 +~~~ +If we assume both transformers support the required source and target Media +Types, Transformer 2 will be selected because it knows about all the supplied +options.
The definition may also specify that some options are required or +grouped. If any members of an optional group are supplied, all required +members of that group become required. + +The configuration may impose a source file size limit resulting in the +selection of a different transformer. Size limits are normally added to avoid +the transforms consuming too many resources. + +The configuration may also specify a priority which will be used in +Transformer selection if there are a number of possible transformers. The +highest priority is the one with the lowest number. \ No newline at end of file diff --git a/docs/transformerDebug.md b/docs/transformerDebug.md new file mode 100644 index 00000000..7456df34 --- /dev/null +++ b/docs/transformerDebug.md @@ -0,0 +1,46 @@ +## TransformerDebug + +In addition to any normal logging, the t-engines, t-router and t-client also +use the `TransformerDebug` class to provide request based logging. The +following is an example from Alfresco after the upload of a `docx` file. + +~~~ +163 docx json AGM 2016 - Masters report.docx 14.8 KB -- metadataExtract -- TransformService +163 workspace://SpacesStore/0db3a665-328d-4437-85ed-56b753cf19c8 1563306426 +163 docx json 14.8 KB -- metadataExtract -- PoiMetadataExtractor +163 cm:title= +163 cm:author=James Dobinson +163 Finished in 664 ms +... 
+164 docx png AGM 2016 - Masters report.docx 14.8 KB -- doclib -- TransformService +164 workspace://SpacesStore/0db3a665-328d-4437-85ed-56b753cf19c8 1563306426 +164 docx png 14.8 KB -- doclib -- officeToImageViaPdf +164.1 docx pdf libreoffice +164.2 pdf png pdfToImageViaPng +164.2.1 pdf png pdfrenderer +164.2.2 png png imagemagick +164.2.2 endPage="0" +164.2.2 resizeHeight="100" +164.2.2 thumbnail="true" +164.2.2 startPage="0" +164.2.2 resizeWidth="100" +164.2.2 autoOrient="true" +164.2.2 allowEnlargement="false" +164.2.2 maintainAspectRatio="true" +164 Finished in 725 ms +~~~ + +This log happens to be from the t-client, but similar log lines exist in the +t-router and individual t-engines. + +All lines start with a reference, which starts with the client’s request +number (`163`, `164` if known) and then a nested pipeline or failover +structure. The first request extracts metadata and the second creates a +thumbnail rendition (called `doclib`). The second request is handled by a +pipeline called `officeToImageViaPdf` which uses `libreoffice` to transform +to `pdf` and then another pipeline to convert to `png`. The last step +(`164.2.2`) in the process resizes the `png` using a number of transform +options. + +If requested, log information is passed back in the TransformReply's +clientData. \ No newline at end of file diff --git a/engines/base/src/main/java/org/alfresco/transform/base/registry/TransformConfigFiles.java b/engines/base/src/main/java/org/alfresco/transform/base/registry/TransformConfigFiles.java index fd7e63a1..eb8d3b68 100644 --- a/engines/base/src/main/java/org/alfresco/transform/base/registry/TransformConfigFiles.java +++ b/engines/base/src/main/java/org/alfresco/transform/base/registry/TransformConfigFiles.java @@ -38,8 +38,8 @@ import java.util.Map; @ConfigurationProperties(prefix = "transform.config") public class TransformConfigFiles { - // Populated from Spring Boot properties or such as transform.config.file. 
or environment variables like - // TRANSFORM_CONFIG_FILE_. + // Populated from Spring Boot properties or such as transform.config.file. or environment variables like + // TRANSFORM_CONFIG_FILE_. private final Map files = new HashMap<>(); public Map getFile()