32 Commits

Author SHA1 Message Date
Alan Davis
f05b54dea3
ACS-2497 Rework httpTransformRequestUsingDirectAccessUrlTest so it can be used in AI and Aspose (#535) 2022-02-18 14:08:18 +00:00
Kacper Magdziarz
11e3cb4b20
ACS-2497 T-Core: Accept DAU requests (#525)
* ACS-2497 Add implementation of Direct Access Url usage for transformation.
Add the possibility to pass a Direct Access Url in a Transform request instead of sending a file.
2022-02-14 11:30:48 +01:00
Alan Davis
a89e161004
ACS-2498 Switch to using a configVersion parameter on the /transform/config endpoint (#530)
* Fixed the config returned by the AIO as it did not include the coreVersion even though the individual ones did.
2022-02-10 23:50:19 +00:00
Alan Davis
df519cfd6f
ACS-2498 Add coreVersion to T-Engine config (#526)
The bulk of the changes in this PR are to do with adding a coreVersion element to the transform element in the T-Engine config. For more detail see the class header of CoreVersionDecorator.

* Support the use of coreVersion so that it is possible to upgrade pods in any order
* Moved the majority of the RequestParamMap static finals to alfresco-transform-model and added a new one: "includeCoreVersion" parameter.
2022-02-09 22:39:40 +00:00
Alan Davis
59325bc38a
Repeat Bump dependency.tika.version from 2.1.0 to 2.2.1 (#516)
* Repeat Bump dependency.tika.version from 2.1.0 to 2.2.1

Original PR https://github.com/Alfresco/alfresco-transform-core/pull/506 was merged to master where it failed. There had been no build of the PR before the merge, which is why this branch has been created.

* Use non deprecated TikaCoreProperties.SUBJECT with tika 2.2.1.

The deprecated OfficeOpenXMLCore.SUBJECT value worked in 2.2.0 but not 2.2.1

* With the upgrade of Tika from 2.2.0 to 2.2.1, the deprecated OfficeOpenXMLCore.SUBJECT metadata value became null and the replacement TikaCoreProperties.SUBJECT became a multi value in a few of our test cases. For backward compatibility with very old versions of Alfresco, we have historically added a number of extra values including "subject" and "description" back into the raw metadata, before mapping them onto Alfresco properties. These values existed in the original version of Tika used by Alfresco, so it is possible there are custom mappings out there that are using them.

To complicate matters a little, our standard mappings for some types put the raw "subject" value into the cm:description property. What makes it interesting is that the extra "description" value is not used but has the value originally in our expected metadata extract data. That is why the quick_*_json files have been modified.
2022-01-13 17:25:56 +00:00
Tom Page
6e6c8c12c2
ACS-2382 Remove old license properties files. 2022-01-06 10:32:57 +00:00
alandavis
1cd673de63
Restore ATS-969 Tika upgrade 1.x -> 2.x (#493)
This reverts commit 9776577a452444dad634117d349635604fa9a9a8.

It was not possible to perform the release of 2.5.5-A1 with this upgrade of Tika.
Possibly related to it forcing a change in the following files, which were then deleted in the build:
D	alfresco-transform-core-aio/alfresco-transform-core-aio-boot/src/license/THIRD-PARTY.properties
D	alfresco-transform-core-aio/alfresco-transform-core-aio/src/license/THIRD-PARTY.properties
D	alfresco-transform-tika/alfresco-transform-tika-boot/src/license/THIRD-PARTY.properties
D	alfresco-transform-tika/alfresco-transform-tika/src/license/THIRD-PARTY.properties
2022-01-05 21:56:41 +00:00
Alan Davis
9776577a45
[trigger release] 2.5.5-A1 (#511)
Revert ATS-969 Tika upgrade 1.x -> 2.x (#493)

As the build is deleting the following, resulting in the release job failure
D	alfresco-transform-core-aio/alfresco-transform-core-aio-boot/src/license/THIRD-PARTY.properties
D	alfresco-transform-core-aio/alfresco-transform-core-aio/src/license/THIRD-PARTY.properties
D	alfresco-transform-tika/alfresco-transform-tika-boot/src/license/THIRD-PARTY.properties
D	alfresco-transform-tika/alfresco-transform-tika/src/license/THIRD-PARTY.properties
2022-01-05 20:26:04 +00:00
Alan Davis
a98f937b4a
ACS-2002 Enhance T-Router debug (#507)
[trigger release] 2.5.5-A1

By default T-Engines now provide the more readable TransformerDebug DEBUG messages, rather than the original detailed request and reply messages, which are still available as TRACE.
2022-01-05 12:17:17 +00:00
Piotr Żurek
d7f7520c45
ATS-969 Tika upgrade 1.x -> 2.x (#493) 2021-12-08 10:27:40 +01:00
David Edwards
f831c46672
Refactor Tika Controller (#415)
Allows for object creation to be done on instantiation, instead of on the first transform call.
2021-06-11 21:07:11 +01:00
Ayman Harake
20d9a1f1b6
ATS-913: Tech debt: add more T-Core Unit tests (part 2) for IPTC metadata extract (#414) 2021-06-02 15:00:03 +01:00
Ayman Harake
49bb9afa5b
ATS-913: Added tests for GIF and PNG and amended properties file (#416). 2021-06-02 11:19:39 +01:00
David Edwards
e11cbd5180
ATS-892 Convert ExifTool separated strings into collections for ACS consumption (#397)
ATS-911 Add regex pattern matching for date replacement
2021-05-06 08:58:42 +01:00
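The ATS-892 change above can be pictured with a minimal sketch, assuming ExifTool returns a multi-valued field as one string joined by a separator. The class name, method name and the "|||" separator used in the usage note are hypothetical and are not taken from the commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not the T-Engine implementation.
final class SeparatedValues {
    private SeparatedValues() {}

    // Splits a separator-joined value into trimmed, non-empty entries.
    static List<String> toCollection(String raw, String separator) {
        List<String> values = new ArrayList<>();
        if (raw == null || raw.isEmpty()) {
            return values;
        }
        // Pattern.quote makes the separator literal, so "|||" is not read as regex alternation.
        for (String part : raw.split(Pattern.quote(separator))) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed);
            }
        }
        return values;
    }
}
```

For example, SeparatedValues.toCollection("a|||b ||| c", "|||") yields a list holding "a", "b" and "c", which a consumer can treat as a multi-valued property rather than as one string.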
Ayman Harake
d25e3c365a
ATS-907: Adding tests for raw files that are already used by Alfresco (IPTC files work copied from https://github.com/Alfresco/media-management) (#398) 2021-05-04 14:11:46 +01:00
David Edwards
4ab8a1120f
ATS-905 Enable IPTCMetadataExtractor in Tika and AIO engine config (#394) 2021-04-29 16:39:03 +01:00
David Edwards
804c745004
ATS-893 Add Exiftool to docker containers & add licences (#384)
* ATS-893 Add Exiftool to docker containers & add licences
2021-04-28 14:44:31 +01:00
David Edwards
03d08d0c9e
MNT-22082 transformation of pdf to text hang (#367)
A new constructor has been added to the TikaController to provide
the new spring config.
The creation of the TikaExecutor has been moved to "singleton pattern" as
the injection of the @Value happens after the instantiation of the
TikaJavaExecutor and does not pass the value correctly. The
instantiation is now done once, on the first transform request.
Param has been added to the AIO beans.
2021-04-13 09:59:42 +01:00
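The timing problem described in MNT-22082 above (a value injected after construction is not yet set when the executor is built in the constructor) can be sketched in plain Java. This is a minimal illustration under stated assumptions: the class and field names are hypothetical, a setter stands in for Spring's @Value injection, and the real TikaController code may differ.

```java
// Hypothetical stand-ins for illustration; not the T-Engine classes.
final class LazyExecutorHolder {
    // Simulates a value that a framework injects after the constructor has run.
    private String configuredValue;
    private volatile Executor executor;

    static final class Executor {
        final String config;

        Executor(String config) {
            this.config = config;
        }
    }

    void setConfiguredValue(String value) {
        this.configuredValue = value;
    }

    // Creates the executor once, on first use, when the injected value is available.
    Executor getExecutor() {
        Executor result = executor;
        if (result == null) {
            synchronized (this) {
                result = executor;
                if (result == null) {
                    result = new Executor(configuredValue);
                    executor = result;
                }
            }
        }
        return result;
    }
}
```

Because the executor is only built inside getExecutor(), it sees the value set after construction, and repeated calls return the same instance.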
David Edwards
ef21365e00
ACS-930 Security update to spring boot 2.4.1 (#321)
* ACS-930 Upgrade to Junit5
2021-01-15 10:31:25 +00:00
Alan Davis
2fd11d5aed
REPO-5191 Bug: T-Engine should provide mapping rather than the repo. (#316)
Bug found while reviewing documents on how to create a custom metadata extractor. The original refactor had left the repo doing the mapping. It should have been passing the fully qualified repo properties to the T-Engine to do the mapping.

Linked to:
    Alfresco/alfresco-community-repo#227
    Alfresco/acs-packaging#1826
2021-01-06 22:25:40 +00:00
Alan Davis
00fbb6405a
ATS-829 Release T-Engines 2.3.6 (#307)
ATS-829: Release T-Core (T-Engines) 2.3.6 [trigger release]

Linked to REPO-5219 Allow AGS AMP to specify metadata extract mapping

Added an extractMapping transform option to all metadata extractors to override the default one.

3rd party libraries to get a green build.
* Upgrade cxf-rt-transports-http and woodstox-core to avoid issues
* Upgrade to org.springframework.boot:spring-boot-starter-parent:2.3.5.RELEASE to avoid problem in org.springframework:spring-web
* Upgrade to activemq 5.15.13 to avoid problem in activemq-broker 5.15.12
2020-11-19 18:35:22 +00:00
eknizat
0273fd5c07
ATS-816: Fix tika apple keynote (#285)
* ATS-816: Fix tika apple keynote
The application/vnd.apple.keynote -> text/plain transformation has been found to fail after switching the version of tika in ATS-801
The previous version of tika would use the org.apache.tika.parser.pkg.PackageParser but the new version uses an empty parser producing empty target file.

* Re enable test for application/vnd.apple.keynote to text
2020-08-06 12:30:20 +01:00
eknizat
522c793970
ACS-373: Fix tika problem UnsupportedOperationException (#264) 2020-06-23 16:52:49 +01:00
Ayman Harake
14e70b9785
ATS-762: T-Core Legacy Part 2 - Legacy Pipeline additions (was: review failing legacy transforms) (#262)
* ATS-762: Add Tika unit test for pdf to csv

* ATS-762: Fix indentation

* ATS-762: Added 3 tests for simple pipeline. msg > txt, txt > doc, txt > odt, txt > rtf

* ATS-762: Added tests for libreofficeToPdf pipeline

* ATS-762: Addressed Jan's comment about not using asterisk when importing modules

* ATS-762: Changed comment to pdf-->csv to address Jan's comment on the PR

* task/ATS-762_T: noticed the txt mime type was wrong so fixed it

Co-authored-by: kristian <kristian.dimitrov@alfresco.com>
2020-06-19 18:03:56 +01:00
Alan Davis
d495459b9b
ATS-777 / REPO-4334: Move metadata extraction into T-Engines (#256)
* REPO-4334 Move metadata extraction into T-Engines
- new "transformImpl" required (processTransform deprecated)
- JavaDoc

Co-authored-by: Jan Vonka <jan.vonka@alfresco.com>
2020-06-16 14:41:49 +01:00
Alan Davis
06109dee75
REPO-4334 Move metadata extraction into T-Engines (#247)
* Metadata extract code added to T-Engines
* Required a refactor of duplicate code to avoid 3x more duplication:
        - try catches used to return exit codes
        - calls to java libraries or commands to external processes
        - building of transform options in controllers, adaptors
* integration tests based on current extracts performed in the repo
* included extract code for libreoffice, and embed code even though not used out of the box any more. There may well be custom extracts using them that move to T-Engines
* removal of unused imports
* minor autoOrient / allowEnlargement bug fixes that were not included in Paddington on the T-Engine side.
2020-06-11 20:20:22 +01:00
Ayman Harake
9931bdc678
ATS-763: Update T-Core for Legacy: Add test files & tests for newly added transforms (in ATS-731) (#252)
* ATS-763: Added missing tests in Tika

* ATS-763: Added the missing transform tests for Libre Office and replaced quick files in Tika

* ATS-763: Replaced newly added quick.xml and quick.msg with preexisting files.

* ATS-763: Added targets to tests in Libre Office -see Jan's comment in PR

* ATS-763: Added test files to Image Magick, and uncommented the PSD source file

* ATS-763: put back a comment in Image Magick how it was before my previous commit

* ATS-763: Resolved Jan's comment about separating out mimetypes into their correct section such as SPREADSHEET or PRESENTATION

* ATS-763: Fixed failing test (ppsm and ppsx)

* ATS-763: Removed unnecessary source files in Image Magick

* ATS-763: Fix failing LibreOffice unit tests

* ATS-763: Fix indentation in LibreOfficeTransformationIT

* ATS-763: fixed failing image magick tests and removed failing transform from config

* ATS-763: Added missing priority for pages -> txt

Co-authored-by: kristian <kristian.dimitrov@alfresco.com>
2020-06-09 16:57:23 +01:00
David Edwards
cd16637143
ATS-702 Add AIO tests from Tika (#232)
* ATS-702 - Implement Tika controller tests on AIO

* ATS-702 - Add Tika IT through AIOController
2020-04-22 15:50:20 +01:00
Kristian Dimitrov
a1b6283a4c
ATS-669: Parameterize T-Engines transformer execution locations (#203)
* ATS-669: Implement cmd line arguments for ImageMagick, PdfRenderer and LibreOffice

* ATS-669: Remove unnecessary test ImageMagick line

* ATS-669: Implement Spring boot properties via application.yaml

* ATS-669: Implement Spring config binds and utilize new functionality in pdfRender

* ATS-669: Wire externalProps for ImageMagick

* ATS-669: Wire externalProps for LibreOffice

* ATS-669: Fix failing tests

* ATS-669: Implement parameterized execution for All-In-One transform module

* ATS-669: Use string values instead of GlobalProperties class

* ATS-669: Change pdfrenderer property format

* ATS-669: Add validation to executor constructors

* ATS-669: Fix failing LibreOffice tests

* ATS-669: Add missing license

* ATS-669: Update LibreOffice version

* ATS-669: Remove unnecessary annotation

* ATS-669: Standardise properties

* ATS-669: Change field variable names

* ATS-669: Change field variable values

* ATS-669: Add unit tests for passing system properties

* ATS-669: Standardise yaml properties

* ATS-669: Remove unnecessary super() calls

* ATS-669: Change CRLF to LF

* ATS-669: Change LF to CRLF

* ATS-669: Fix yaml indentation

* ATS-669: Update tika and misc yaml file with new sub-property

* ATS-669: Remove unused import

* ATS-669: Update TransformRegistryImpl property location
2020-04-16 16:32:01 +01:00
montgolfiere
7952c40ee5
ATS-706: Transform AIO - fix license log messages to be consistent on startup (#219)
- see also ATS-711
2020-04-14 19:25:14 +01:00
eknizat
af77d429e7
ATS-675: Add All-In-One transformer (#200)
* ATS-695/ATS-675 Add aio boot project

- Added the bare bones of a spring boot project to be used by aio. Currently based loosely on transform-misc.

* ATS-674/ATS-695 Add forms for each transformer.

* ATS-675/ATS-695 add empty test to pass build during dev

* ATS-695 remove maven profile to fix build

* ATS-675 Define interface and the aio transformer

* Fix formatting and rename the module as per review comments

* ATS-675/ATS-695 Add ProbeTestTransformation

Currently uses MiscController implementation.

* ATS-675/ATS-695 Add logger method,

This will be code repeated in the local transform method and the processTransform method

* ATS-675/ATS-695 Implement local transform method

Minimum implementation for transform method.

* ATS-675/ATS-695  Implement processTransform

* ATS-675/ATS-695 Rename project to alfresco-transform-core-aio-boot

Add alfresco-transform-core-aio dependencies

* ATS-675/ATS-695 Fix build

Update project location
Update imports and variable declarations in TODOs
Add error handling.
Formatting.

* ATS-693: Update transform-misc Dockerfile with newly reserved uid

* Revert "ATS-691: Combine the win/linux pathToFile logic"

This reverts commit 61fe4820

* ATS-693: Update transform-misc Dockerfile with newly reserved uid

* "ATS-693: Add Dockerfile to aio-boot module"

* ATS-675/ATS-695 Add resource required for ProbeTestTransform

* ATS-675/ATS-695 Remove test resources, to be added in test implementation

* ATS-693: Fix path to jar resources

* ATS-675/ATS-703 Moved Options builder to non boot jar.

* ATS-675/ATS-703 Rename OptionsBuilder to PdfRendererOptionsBuilder

This is to avoid conflict with OptionsBuilders in other T-engines.

* ATS-675/ATS-695 Added PdfRendererAdapter.java

Added dependency to pom.xml
Required transformation of String to Long, method added to Util.java

* ATS-675/ ATS-704

Implemented LibreOfficeAdapter

* ATS-675 Parity with base aio naming convention

* ATS-675/ATS-705 Implemented ImageMagickAdapter

Moved and renamed OptionsBuilder. Moved to alfresco-transform-imagemagick, renamed ImageMagickOptionsBuilder.
Added dependencies to pom.xml

* ATS-693: Implement maven docker build

* Initial tests
* Add initial tests for config aggregation
* Update AbstractTransformerControllerTest to use the new engine config names

* Fix up controller

* Fix travis tests  (#205)

* Fix engine specific properties for engine config location
* Temporarily add engine configs to test resources for the boot modules.  Will need to fix this properly

* Resolve some review comments

* ATS-675 - Move static strings to util class

* Refactor classes for simpler design (#210)

* ATS-702 Fix error handling

(cherry picked from commit e30cb5fda6ba2ae09c91ef61e69cba4689bcc8d9)

* ATS-675 Rename test class (fixes typo)

* ATS-675: Add aio transformer to static scan
2020-04-08 17:40:34 +01:00
eknizat
3bed6930bf
ATS-671: Split engines into fat & skinny modules (ATS-674) (#192)
Each transform engine project has been separated into 2 modules so that an executable and non-executable jar can be created. 
Modules have been renamed such that *docker* has been removed from the artifactIds and project names.

Co-authored-by: Erik Knizat <erik.knizat@alfresco.com>
Co-authored-by: David Edwards <david.edwards@alfresco.com>
2020-03-27 13:45:15 +00:00