alfresco-transform-core/docs/transformerDebug.md
Alan Davis babe26b0ba
HXENG-64 refactor ATS (#657)
Refactor to clean up packages in the t-model and to introduce a simpler to implement t-engine base.

The new t-engines (tika, imagemagick, libreoffice, pdfrenderer, misc, aio, aspose) and t-router may be used in combination with older components, as the API between the content repository and the components, and between the components themselves, has not changed. As far as possible the same artifacts are created (the -boot projects no longer exist). They may be used with older ACS repo versions.

The main changes to look for are:
* The introduction of TransformEngine and CustomTransformer interfaces to be implemented.
* The removal in t-engines and t-router of the Controller, Application, test template page, Controller tests and application config, as this is all now done by the t-engine base package.
* The t-router now extends the t-engine base, which also reduced the amount of duplicate code.
* The t-engine base provides the test page, which includes drop downs of known transform options. The t-router is able to use pipeline and failover transformers. This was not possible to do previously as the router had no test UI.
* Resources including licenses are automatically included in the all-in-one t-engine, from the individual t-engines. They just need to be added as dependencies in the pom. 
* The ugly code in the all-in-one t-engine and misc t-engine to pick transformers has gone, as they are now just selected by the transformRegistry.
* The way t-engines respond to http or message queue transform requests has been combined (eliminates the similar but different code that existed before).
* The t-engine base now uses InputStream and OutputStream rather than Files by default. As a result it will be simpler to avoid writing content to a temporary location.
* A number of the Tika and Misc CustomTransforms no longer use Files.
* The original t-engine base still exists so customers can continue to create custom t-engines the way they have done previously. The project has just been moved into a folder called deprecated.
* The folder structure has changed. The long "alfresco-transform-..." names have given way to shorter easier to read and type names.
* The t-engine project structure now has a single project rather than two. 
* The previous config values still exist, but there is now a new set of config values in files with names that don't misleadingly imply they only contain pipeline or routing information.
* The concept of 'routing' has much less emphasis in class names as the code just uses the transformRegistry. 
* TransformerConfig may now be read as json or yaml. The restrictions on what could be specified in yaml have gone.
* T-engines and t-router may use transform config from files. Previously it was just the t-router.
* The POC code to do with graphs of possible routes has been removed.
* All master branch changes have been merged in.
* The concept of a single transform request which results in multiple responses (e.g. images from a video) has been added to the core processing of requests in the t-engine base.
* Many SonarCloud linter fixes.
2022-09-14 13:40:19 +01:00


TransformerDebug

In addition to any normal logging, the t-engines, t-router and t-client also use the TransformerDebug class to provide request-based logging. The following is an example from Alfresco after the upload of a docx file.

163               docx json AGM 2016 - Masters report.docx 14.8 KB -- metadataExtract --  TransformService
163               workspace://SpacesStore/0db3a665-328d-4437-85ed-56b753cf19c8 1563306426
163               docx json  14.8 KB -- metadataExtract -- PoiMetadataExtractor
163                 cm:title=
163                 cm:author=James Dobinson
163               Finished in 664 ms
...
164               docx png  AGM 2016 - Masters report.docx 14.8 KB -- doclib --  TransformService
164               workspace://SpacesStore/0db3a665-328d-4437-85ed-56b753cf19c8 1563306426
164               docx png   14.8 KB -- doclib -- officeToImageViaPdf
164.1             docx pdf   libreoffice
164.2             pdf  png   pdfToImageViaPng
164.2.1           pdf  png   pdfrenderer
164.2.2           png  png   imagemagick
164.2.2             endPage="0"
164.2.2             resizeHeight="100"
164.2.2             thumbnail="true"
164.2.2             startPage="0"
164.2.2             resizeWidth="100"
164.2.2             autoOrient="true"
164.2.2             allowEnlargement="false"
164.2.2             maintainAspectRatio="true"
164               Finished in 725 ms

This log happens to be from the t-client, but similar log lines exist in the t-router and individual t-engines.

Every line starts with a reference. The reference begins with the client's request number (163 and 164 here), if known, followed by numbers that reflect the nested pipeline or failover structure. The first request extracts metadata and the second creates a thumbnail rendition (called doclib). The second request is handled by a pipeline called officeToImageViaPdf, which uses libreoffice to transform to pdf and then another pipeline (pdfToImageViaPng) to convert to png. The last step (164.2.2) resizes the png using a number of transform options.
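When reading these logs with a script, it can help to split each reference into its request number and its nested step indices. The following is a minimal Python sketch, not part of the Alfresco code base. The names `Reference`, `parse_line` and `options_by_reference` are hypothetical, and the line layout (a dotted number, whitespace, then the message) is assumed from the example above. It also assumes the request number is present; lines where it is not known are not handled.

```python
import re
from typing import NamedTuple, Optional


class Reference(NamedTuple):
    """Hypothetical model of a TransformerDebug reference such as "164.2.2"."""
    request: int   # client's request number, e.g. 164
    steps: tuple   # nested step indices, e.g. (2, 2) for "164.2.2"

    @property
    def depth(self) -> int:
        # 0 for the top-level request, 1 for its direct steps, and so on.
        return len(self.steps)

    def parent(self) -> Optional["Reference"]:
        # The enclosing pipeline or failover step, or None at the top level.
        if not self.steps:
            return None
        return Reference(self.request, self.steps[:-1])


# Assumed layout: dotted digits at the start of the line, then the message.
_REF = re.compile(r"^(\d+(?:\.\d+)*)\s+(.*)$")
# Assumed layout of a transform option line: name="value".
_OPTION = re.compile(r'^(\w+)="(.*)"$')


def parse_line(line: str):
    """Return (Reference, message), or None if the line has no reference."""
    match = _REF.match(line)
    if match is None:
        return None
    numbers = [int(n) for n in match.group(1).split(".")]
    return Reference(numbers[0], tuple(numbers[1:])), match.group(2).strip()


def options_by_reference(lines):
    """Collect name="value" transform options, grouped by reference."""
    result = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ref, message = parsed
        option = _OPTION.match(message)
        if option:
            result.setdefault(ref, {})[option.group(1)] = option.group(2)
    return result
```

Calling `parse_line` on the line for step 164.2.2 gives a reference whose `parent()` is 164.2, the inner pipeline, which in turn has the top-level request 164 as its parent. Lines such as `...` that carry no reference are skipped.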

If requested, log information is passed back in the TransformReply's clientData.