Common code for Docker based ACS transformers
This project contains code that is common to all the ACS transformers that run within their own Docker containers. It performs common actions such as logging, throttling requests and handling the streaming of content to and from the container. It also provides structure and hook points so that a specific transformer only needs to check its request parameters and perform the transformation, using either files or a pair of InputStream and OutputStream.
A transformer project is expected to provide the following files:
src/main/resources/templates/transformForm.html
src/main/java/org/alfresco/transformer/<XXX>Controller.java
src/main/java/org/alfresco/transformer/Application.java
- transformForm.html - A simple test page using Thymeleaf that gathers request parameters so they may be used to test the transformer.
<html xmlns:th="http://www.thymeleaf.org">
<body>
  <div>
    <h2>Test Transformation</h2>
    <form method="POST" enctype="multipart/form-data" action="/transform">
      <table>
        <tr><td><div style="text-align:right">file *</div></td><td><input type="file" name="file" /></td></tr>
        <tr><td><div style="text-align:right">targetFilename *</div></td><td><input type="text" name="targetFilename" value="" /></td></tr>
        <tr><td><div style="text-align:right">width</div></td><td><input type="text" name="width" value="" /></td></tr>
        <tr><td><div style="text-align:right">height</div></td><td><input type="text" name="height" value="" /></td></tr>
        <tr><td><div style="text-align:right">allowPdfEnlargement</div></td><td><input type="checkbox" name="allowPdfEnlargement" value="true" /></td></tr>
        <tr><td><div style="text-align:right">maintainPdfAspectRatio</div></td><td><input type="checkbox" name="maintainPdfAspectRatio" value="true" /></td></tr>
        <tr><td><div style="text-align:right">page</div></td><td><input type="text" name="page" value="" /></td></tr>
        <tr><td><div style="text-align:right">timeout</div></td><td><input type="text" name="timeout" value="" /></td></tr>
        <tr><td></td><td><input type="submit" value="Transform" /></td></tr>
      </table>
    </form>
  </div>
  <div>
    <a href="/log">Log entries</a>
  </div>
</body>
</html>
- TransformerNameController.java - A Spring Boot Controller that extends AbstractTransformerController to handle a POST request to "/transform".
...
@Controller
public class AlfrescoPdfRendererController extends AbstractTransformerController
{
    ...
    @PostMapping("/transform")
    public ResponseEntity<Resource> transform(HttpServletRequest request,
                                              @RequestParam("file") MultipartFile sourceMultipartFile,
                                              @RequestParam("targetFilename") String targetFilename,
                                              @RequestParam(value = "width", required = false) Integer width,
                                              @RequestParam(value = "height", required = false) Integer height,
                                              @RequestParam(value = "allowPdfEnlargement", required = false) Boolean allowPdfEnlargement,
                                              @RequestParam(value = "maintainPdfAspectRatio", required = false) Boolean maintainPdfAspectRatio,
                                              @RequestParam(value = "page", required = false) Integer page,
                                              @RequestParam(value = "timeout", required = false) Long timeout)
    {
        try
        {
            File sourceFile = createSourceFile(request, sourceMultipartFile);
            File targetFile = createTargetFile(request, targetFilename);
            // Both files are deleted by TransformInterceptor.afterCompletion

            // Build the command line options from the optional request parameters
            StringJoiner args = new StringJoiner(" ");
            if (width != null)
            {
                args.add("--width=" + width);
            }
            if (height != null)
            {
                args.add("--height=" + height);
            }
            if (allowPdfEnlargement != null && allowPdfEnlargement)
            {
                args.add("--allow-enlargement");
            }
            if (maintainPdfAspectRatio != null && maintainPdfAspectRatio)
            {
                args.add("--maintain-aspect-ratio");
            }
            if (page != null)
            {
                args.add("--page=" + page);
            }
            String options = args.toString();
            LogEntry.setOptions(options);

            Map<String, String> properties = new HashMap<>();
            properties.put("options", options);
            properties.put("source", sourceFile.getAbsolutePath());
            properties.put("target", targetFile.getAbsolutePath());

            executeTransformCommand(properties, targetFile, timeout);

            return createAttachment(targetFilename, targetFile);
        }
        catch (UnsupportedEncodingException e)
        {
            throw new TransformException(500, "Filename encoding error", e);
        }
    }
}
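The option-building step in the controller above can be isolated and checked without Spring. The following is a minimal sketch rather than project code: the class name PdfRendererOptions and its build method are hypothetical, and it only mirrors the null checks and flag names shown in the excerpt.

```java
import java.util.StringJoiner;

// Hypothetical helper (not part of alfresco-transformer-base): reproduces the
// option-building logic of the controller excerpt as a plain static method.
final class PdfRendererOptions
{
    private PdfRendererOptions()
    {
    }

    // Each parameter is optional; a null value means "not supplied in the request".
    static String build(Integer width, Integer height, Boolean allowPdfEnlargement,
                        Boolean maintainPdfAspectRatio, Integer page)
    {
        StringJoiner args = new StringJoiner(" ");
        if (width != null)
        {
            args.add("--width=" + width);
        }
        if (height != null)
        {
            args.add("--height=" + height);
        }
        // Boolean.TRUE.equals(...) is false for both null and false
        if (Boolean.TRUE.equals(allowPdfEnlargement))
        {
            args.add("--allow-enlargement");
        }
        if (Boolean.TRUE.equals(maintainPdfAspectRatio))
        {
            args.add("--maintain-aspect-ratio");
        }
        if (page != null)
        {
            args.add("--page=" + page);
        }
        return args.toString();
    }

    public static void main(String[] ignored)
    {
        // prints "--width=100 --allow-enlargement --page=2"
        System.out.println(build(100, null, true, null, 2));
    }
}
```

Parameters that are absent contribute nothing to the result, so a request with no optional parameters yields an empty options string.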
- TransformerNameController#processTransform(File sourceFile, File targetFile, Map<String, String> transformOptions, Long timeout)
/transform (Consumes: application/json, Produces: application/json)
The new consumes and produces arguments have been specified in order to differentiate this endpoint from the previous one (which consumes multipart/form-data).
The endpoint should always receive a TransformationRequest and should always respond with a TransformationReply.
As specific transformers require specific arguments (e.g. transform for the Tika transformer), the request body should include them in its transformRequestOptions field, which is a Map<String,String>.
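Because transformRequestOptions carries every value as a String, a transformer has to convert the entries it cares about. The following is a minimal sketch of one way to do that, under stated assumptions: the class TransformOptionsReader and its methods are hypothetical and not part of the project, and the option names used are taken from the examples in this document.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper (not part of alfresco-transformer-base): reads typed
// values out of a Map<String,String> such as transformRequestOptions.
final class TransformOptionsReader
{
    private TransformOptionsReader()
    {
    }

    // Returns the trimmed value, or empty if the option is missing or blank.
    static Optional<String> string(Map<String, String> options, String name)
    {
        String value = options.get(name);
        if (value == null || value.trim().isEmpty())
        {
            return Optional.empty();
        }
        return Optional.of(value.trim());
    }

    // Returns the value parsed as an Integer; throws NumberFormatException
    // if the option is present but is not a valid integer.
    static Optional<Integer> integer(Map<String, String> options, String name)
    {
        return string(options, name).map(Integer::valueOf);
    }

    public static void main(String[] ignored)
    {
        Map<String, String> options = Map.of("targetEncoding", "UTF-8", "page", "3");
        System.out.println(string(options, "targetEncoding").orElse("(none)"));
        System.out.println(integer(options, "page").orElse(-1));
        System.out.println(integer(options, "width").isPresent());
    }
}
```

Returning Optional keeps the "option not supplied" case explicit, which matches the optional request parameters of the multipart endpoint.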
Example request body
var transformRequest = {
    "requestId": "1",
    "sourceReference": "2f9ed237-c734-4366-8c8b-6001819169a4",
    "sourceMediaType": "pdf",
    "sourceSize": 123456,
    "sourceExtension": "pdf",
    "targetMediaType": "txt",
    "targetExtension": "txt",
    "clientType": "ACS",
    "clientData": "Yo No Soy Marinero, Soy Capitan, Soy Capitan!",
    "schema": 1,
    "transformRequestOptions": {
        "targetMimetype": "text/plain",
        "targetEncoding": "UTF-8",
        "transform": "PdfBox"
    }
}
Example response body
var transformReply = {
    "requestId": "1",
    "status": 201,
    "errorDetails": null,
    "sourceReference": "2f9ed237-c734-4366-8c8b-6001819169a4",
    "targetReference": "34d69ff0-7eaa-4741-8a9f-e1915e6995bf",
    "clientType": "ACS",
    "clientData": "Yo No Soy Marinero, Soy Capitan, Soy Capitan!",
    "schema": 1
}
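A client can select the JSON endpoint rather than the multipart one by setting the content type. The following is a minimal sketch using the JDK's java.net.http API (Java 11 or later). The class name TransformRequestSketch is hypothetical, and the base URL is an assumption that must be replaced with the address of the running transformer.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical client-side sketch (not part of alfresco-transformer-base):
// builds a POST to /transform that consumes and produces application/json.
final class TransformRequestSketch
{
    private TransformRequestSketch()
    {
    }

    // baseUrl is an assumption, e.g. "http://localhost:8090"; jsonBody is a
    // serialized request such as the example request body above.
    static HttpRequest build(String baseUrl, String jsonBody)
    {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/transform"))
                .header("Content-Type", "application/json")
                .header("Accept", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] ignored)
    {
        String body = "{\"requestId\": \"1\", \"schema\": 1}";
        HttpRequest request = build("http://localhost:8090", body);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

The request can then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), and the returned body parsed as the reply.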
processTransform method
public abstract class AbstractTransformerController
{
    // Implemented by each specific controller to perform the transformation
    abstract void processTransform(File sourceFile, File targetFile, Map<String, String> transformOptions, Long timeout);
}
The abstract method is declared in the AbstractTransformerController and must be implemented by the specific controllers.
This method is called by the AbstractTransformerController directly in the new /transform endpoint which consumes application/json and produces application/json.
The method is responsible for performing the transformation. Upon a successful transformation it updates the targetFile parameter.
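To illustrate the contract of processTransform, the following is a minimal standalone sketch that re-encodes a text file. It is not project code: the class EncodingTransformSketch is hypothetical and does not extend AbstractTransformerController, it assumes the source file is UTF-8 text, and it reads the targetEncoding option name used in the example request body. A real controller would report failures through the project's own exception handling instead of UncheckedIOException.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Map;

// Hypothetical sketch with the same parameter list as processTransform.
final class EncodingTransformSketch
{
    void processTransform(File sourceFile, File targetFile,
                          Map<String, String> transformOptions, Long timeout)
    {
        // Fall back to UTF-8 when no targetEncoding option is supplied
        String encoding = transformOptions.getOrDefault("targetEncoding", "UTF-8");
        Charset targetCharset = Charset.forName(encoding);
        try
        {
            // Assumption: the source is UTF-8 text. The timeout is ignored here.
            byte[] sourceBytes = Files.readAllBytes(sourceFile.toPath());
            String text = new String(sourceBytes, StandardCharsets.UTF_8);

            // The result is written to the file passed in as targetFile
            Files.write(targetFile.toPath(), text.getBytes(targetCharset));
        }
        catch (IOException e)
        {
            throw new UncheckedIOException("Transformation failed", e);
        }
    }
}
```

The method returns nothing; the caller picks up the result from targetFile once the method completes.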
- Application.java - Spring Boot expects to find an Application in a project's source files. The following may be used:
package org.alfresco.transformer;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class Application
{
    public static void main(String[] args)
    {
        SpringApplication.run(Application.class, args);
    }
}
Building and testing
The project can be built by running the Maven command:
mvn clean install
Artifacts
The artifacts can be obtained by:
- downloading from the Alfresco repository
- adding a Maven dependency to your pom file:
<dependency>
    <groupId>org.alfresco</groupId>
    <artifactId>alfresco-transformer-base</artifactId>
    <version>1.0</version>
</dependency>
and the Alfresco Maven repository:
<repository>
    <id>alfresco-maven-repo</id>
    <url>https://artifacts.alfresco.com/nexus/content/groups/public</url>
</repository>
The build plan is available in TravisCI.
Contributing guide
Please use this guide to make a contribution to the project.