Added an optional extractMapping transform option to all metadata extractors, which overrides the default mapping in the T-Engine.
The AGS AMP, for example, extends the RFC822MetadataExtracter with its own class to specify a different set of document-to-system mappings. The class in the repo no longer does extractions itself. Instead it is used by the AsynchronousExtractor, which offloads extractions to T-Engines: if the class has been extended, the AsynchronousExtractor obtains the mappings from it and passes them to the T-Engine.
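As a rough illustration of the override behaviour described above, the sketch below picks an override mapping when one is supplied and otherwise falls back to a default. All names here (ExtractMappingSketch, defaultMapping, effectiveMapping, applyMapping, the custom:sender property) are hypothetical and are not Alfresco or T-Engine APIs. It also assumes that an override replaces the default mapping entirely rather than merging with it.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch only: none of these names are Alfresco or T-Engine APIs.
class ExtractMappingSketch {

    // Hypothetical default mapping held by the T-Engine:
    // document property name -> system property names.
    static Map<String, Set<String>> defaultMapping() {
        Map<String, Set<String>> mapping = new TreeMap<>();
        mapping.put("messageFrom", new TreeSet<>(Set.of("cm:originator")));
        mapping.put("messageSubject", new TreeSet<>(Set.of("cm:title")));
        return mapping;
    }

    // Assumption: a supplied override replaces the default mapping entirely.
    static Map<String, Set<String>> effectiveMapping(Map<String, Set<String>> defaults,
                                                     Map<String, Set<String>> override) {
        return override != null ? override : defaults;
    }

    // Copy each raw document value to every system property it is mapped to.
    // Document properties without a mapping are dropped.
    static Map<String, String> applyMapping(Map<String, String> raw,
                                            Map<String, Set<String>> mapping) {
        Map<String, String> system = new TreeMap<>();
        for (Map.Entry<String, String> entry : raw.entrySet()) {
            Set<String> targets = mapping.get(entry.getKey());
            if (targets == null) {
                continue;
            }
            for (String target : targets) {
                system.put(target, entry.getValue());
            }
        }
        return system;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new TreeMap<>();
        raw.put("messageFrom", "sender@example.com");
        raw.put("messageSubject", "Hello");

        // Hypothetical override, as a subclass in an AMP might supply.
        Map<String, Set<String>> override = new TreeMap<>();
        override.put("messageFrom", new TreeSet<>(Set.of("custom:sender")));

        System.out.println(applyMapping(raw, effectiveMapping(defaultMapping(), null)));
        System.out.println(applyMapping(raw, effectiveMapping(defaultMapping(), override)));
    }
}
```

With no override, both raw values land on the default system properties. With the override, only messageFrom is mapped, and it goes to the hypothetical custom:sender property.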
Removed all the Extractors that now exist in the T-Engines:
JodConverterMetadataExtracter
TikaPoweredMetadataExtracter – the abstract base class used by other extractors
-- MailMetadataExtracter
-- PoiMetadataExtracter
-- TikaAutoMetadataExtracter
-- MP3MetadataExtracter
-- TikaSpringConfiguredMetadataExtracter - removed as it required Spring config and would run in process
-- PdfBoxMetadataExtracter
-- OpenDocumentMetadataExtracter
-- OfficeMetadataExtracter
-- DWGMetadataExtracter
HtmlMetadataExtracter
RFC822MetadataExtracter
XmlMetadataExtracter and XPathMetadataExtracter still exist but don't provide any extraction out of the box. The reason they still exist is to support custom transforms (in AMPs) to extract from XML. There are no XML extractors in the T-Engines at the moment, but that is where the custom transformer code really should be moved.
There are new tests to ensure the async transforms take place as expected.
Many of the existing tests (those not related to a specific extractor) have also been kept. Some of these have been modified to reflect that the extract is now async, and they no longer check that the modified value is unchanged (it is now expected to change).
There are also a number of new metadata extract smoke tests that ensure that a selected subset of extracts is supported by the OOTB T-Engines.
* REPO-5208 Addition of extra async metadata extract tests for overriding policy, tag extraction and carryAspectProperties
Main author: Adina Ababei <adina.ababei@ness.com>
Testing of tagging modified by Alan Davis to pass when Solr is not running.