Merged HEAD-BUG-FIX (5.0/Cloud) to HEAD (5.0/Cloud)

84058: Merged V4.2-BUG-FIX (4.2.4) to HEAD-BUG-FIX (5.0/Cloud)
      83799: MNT-12238: Merged DEV 4.2-BUG-FIX (4.2.4) to V4.2-BUG-FIX (4.2.4)
         MNT-12238: Merged 4.1-BUG-FIX (4.1.10) to V4.2-BUG-FIX (4.2.4)
            80291: Merged V4.1.6 (4.1.6.21) to V4.1-BUG-FIX (4.1.10)
               77378: Merged DEV PATCHES/V4.1.6 (19) to PATCHES/V4.1.6 (20)
                  76649: MNT-11823: Upload of PPTX causes very high memory usage leading to system instability
                      - The patch from MNT-577 has been combined with new changes to avoid hangs while analyzing complicated PPTX documents. The fix simply disables reading the entire contents of such documents. The POI metadata extractor may be switched back to standard behavior or reconfigured using the following new properties: content.transformer.Poi.poiFootnotesLimit, content.transformer.Poi.poiExtractPropertiesOnly and content-services-context.xml/extracter.Poi/poiAllowableXslfRelationshipTypes
                  77379: MNT-11823: Upload of PPTX causes very high memory usage leading to system instability
                      The test and test data for MNT-577 have been added, along with a test for MNT-11823; the latter is commented out because the test data (an appropriate PPTX document) is not currently available. Getters for POI-specific properties have been added to 'PoiMetadataExtracter' for tests, and the 'afterPropertiesSet()' logic has been adjusted slightly to allow setting 'poiExtractPropertiesOnly' to 'false'
                  77561: MNT-11823: Upload of PPTX causes very high memory usage leading to system instability
                      Fix for the https://bamboo.alfresco.com/bamboo/browse/HF-PATCH416-126 build failure. POI extractor and transformer properties of 'AlfrescoPoiPatchUtils' have been isolated from each other using contexts: each extractor or transformer now has its own context or uses the default context. The default context's properties allow parsing the entire contents of XSLF documents, with a footnotes limit of 50. Property names have not changed, but 'content-services-context.xml/extracter.Poi/poiAllowableXslfRelationshipTypes=null' no longer implies 'content.transformer.Poi.poiExtractPropertiesOnly=false'; that is, the list may be empty. 'PoiMetadataExtracterTest' has been updated in line with these changes. 'poi-OOXML-3.9-beta1-20121109.jar' has been renamed to 'poi-OOXML-3.9-beta1-20121109-patched.jar'
                  79180: MNT-12043: CLONE - Upload of PPTX causes very high memory usage leading to system instability
                      A timeout mechanism has been added to content transformers, along with timeout configuration options. A mechanism to close streams after a 'TimeoutException' has been added to transformers and metadata extractors, and the timeout mechanism for input streams has been enabled in 'TikaPoweredContentTransformer'
                  79268: MNT-12043: CLONE - Upload of PPTX causes very high memory usage leading to system instability
                      Fix for the https://bamboo.alfresco.com/bamboo/browse/HF-PATCH416-133 build failure and for the review comments at https://fisheye.alfresco.com/cru/CR-100#CFR-1184. A new test, 'PoiOOXMLContentTransformerTest.testMnt12043()', has been added to exercise the newly added timeout mechanism
                  79290: MNT-12043: CLONE - Upload of PPTX causes very high memory usage leading to system instability
                     - Removed methods and properties that are no longer needed
                  79327: MNT-12043: CLONE - Upload of PPTX causes very high memory usage leading to system instability
                     - Increased ADDITIONAL_PROCESSING_TIME to 1500ms to try and avoid a new intermittent test failure.
      83885: MNT-12238 Bring Maven POM file in sync with latest patched version of poi-ooxml


git-svn-id: https://svn.alfresco.com/repos/alfresco-enterprise/alfresco/HEAD/root@84627 c4b6b30b-aa2e-2d43-bbcb-ca4b014f7261
This commit is contained in:
Alan Davis
2014-09-18 17:23:49 +00:00
parent fa4d74fa5b
commit 862e07f3e2
16 changed files with 1148 additions and 43 deletions


@@ -319,7 +319,17 @@
<bean id="extracter.PDFBox" class="org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter" parent="baseMetadataExtracter">
<property name="documentSelector" ref="pdfBoxEmbededDocumentSelector" />
</bean>
<bean id="extracter.Poi" class="org.alfresco.repo.content.metadata.PoiMetadataExtracter" parent="baseMetadataExtracter" />
<bean id="extracter.Poi" class="org.alfresco.repo.content.metadata.PoiMetadataExtracter" parent="baseMetadataExtracter">
<property name="poiFootnotesLimit" value="${content.transformer.Poi.poiFootnotesLimit}" />
<property name="poiExtractPropertiesOnly" value="${content.transformer.Poi.poiExtractPropertiesOnly}" />
<property name="poiAllowableXslfRelationshipTypes">
<list>
<!-- These values are valid for Office 2007, 2010 and 2013 -->
<value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/presProps</value>
<value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/viewProps</value>
</list>
</property>
</bean>
<bean id="extracter.Office" class="org.alfresco.repo.content.metadata.OfficeMetadataExtracter" parent="baseMetadataExtracter" />
<bean id="extracter.Mail" class="org.alfresco.repo.content.metadata.MailMetadataExtracter" parent="baseMetadataExtracter" />
<bean id="extracter.Html" class="org.alfresco.repo.content.metadata.HtmlMetadataExtracter" parent="baseMetadataExtracter" />


@@ -670,6 +670,12 @@ system.thumbnail.quietPeriod=604800
system.thumbnail.quietPeriodRetriesEnabled=true
system.thumbnail.redeployStaticDefsOnStartup=true
# MNT-11823: Limit for read-only footnotes list size and an indication to
# extract only the properties from XSLF documents without reading the
# entire contents of the document
content.transformer.Poi.poiFootnotesLimit=50
content.transformer.Poi.poiExtractPropertiesOnly=true
# The default timeout for metadata mapping extracters
content.metadataExtracter.default.timeoutMs=20000
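
These defaults can be tuned per installation. A hypothetical alfresco-global.properties override (values are illustrative only, not recommendations):

```properties
# Raise the XWPF footnotes limit, allow full-content XSLF parsing again,
# and give metadata extraction a longer timeout (illustrative values)
content.transformer.Poi.poiFootnotesLimit=100
content.transformer.Poi.poiExtractPropertiesOnly=false
content.metadataExtracter.default.timeoutMs=60000
```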


@@ -504,7 +504,7 @@
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>${dependency.poi.version}</version>
<version>3.10-FINAL-20140910-alfresco-patched</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>


@@ -0,0 +1,62 @@
/*
* Copyright (C) 2005-2014 Alfresco Software Limited.
*
* This file is part of Alfresco
*
* Alfresco is free software: you can redistribute it and/or modify
* it under the terms of the GNU Lesser General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Alfresco is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public License
* along with Alfresco. If not, see <http://www.gnu.org/licenses/>.
*/
package org.alfresco.repo.content;
import java.io.Closeable;
import java.io.IOException;
/**
* Base class for stream aware proxies
*
* @author Dmitry Velichkevich
*/
public abstract class AbstractStreamAwareProxy
{
/**
* @return {@link Closeable} instance which represents the channel, or a stream which uses the channel
*/
protected abstract Closeable getStream();
/**
* @return {@link Boolean} value which determines whether the stream can (<code>true</code>) or cannot (<code>false</code>) be closed
*/
protected abstract boolean canBeClosed();
/**
* Encapsulates the logic of releasing the captured stream or channel. It is expected that all resource objects share the same channel
*/
public void release()
{
Closeable stream = getStream();
if ((null == stream) || !canBeClosed())
{
return;
}
try
{
stream.close();
}
catch (IOException e)
{
throw new RuntimeException("Failed to close stream!", e);
}
}
}
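
The guarded-close contract above can be exercised standalone. The following is a simplified re-sketch for illustration (class names here are hypothetical; the real base class lives in org.alfresco.repo.content):

```java
import java.io.Closeable;
import java.io.IOException;

// Simplified re-sketch of the AbstractStreamAwareProxy contract: release()
// closes the captured stream only when one was captured and the subclass
// reports it is safe to close.
abstract class StreamAwareProxySketch
{
    protected abstract Closeable getStream();

    protected abstract boolean canBeClosed();

    public void release()
    {
        Closeable stream = getStream();
        if ((null == stream) || !canBeClosed())
        {
            return;
        }
        try
        {
            stream.close();
        }
        catch (IOException e)
        {
            throw new RuntimeException("Failed to close stream!", e);
        }
    }
}

public class ProxySketchDemo extends StreamAwareProxySketch
{
    boolean closed = false;

    // Records the close call instead of holding a real channel
    private final Closeable resource = () -> closed = true;

    @Override
    protected Closeable getStream()
    {
        return resource;
    }

    @Override
    protected boolean canBeClosed()
    {
        return !closed; // "channel open" until released once
    }
}
```

A second release() call is a no-op because canBeClosed() reports false after the first close, mirroring how the real proxies consult isChannelOpen().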


@@ -0,0 +1,216 @@
/*
* Copyright (C) 2005-2014 Alfresco Software Limited.
*
* This file is part of Alfresco
*
* Alfresco is free software: you can redistribute it and/or modify
* it under the terms of the GNU Lesser General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Alfresco is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public License
* along with Alfresco. If not, see <http://www.gnu.org/licenses/>.
*/
package org.alfresco.repo.content;
import java.io.Closeable;
import java.io.File;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.util.Locale;
import org.alfresco.service.cmr.repository.ContentData;
import org.alfresco.service.cmr.repository.ContentIOException;
import org.alfresco.service.cmr.repository.ContentReader;
import org.alfresco.service.cmr.repository.ContentStreamListener;
/**
* Proxy for {@link ContentReader} which captures the {@link InputStream} or {@link ReadableByteChannel} so that the captured resource can be released
*
* @author Dmitry Velichkevich
* @see ContentReader
* @see AbstractStreamAwareProxy
*/
public class StreamAwareContentReaderProxy extends AbstractStreamAwareProxy implements ContentReader
{
private ContentReader delegatee;
private Closeable releaseableResource;
public StreamAwareContentReaderProxy(ContentReader delegator)
{
this.delegatee = delegator;
}
@Override
public boolean exists()
{
return delegatee.exists();
}
@Override
public void getContent(OutputStream os) throws ContentIOException
{
delegatee.getContent(os);
}
@Override
public void getContent(File file) throws ContentIOException
{
delegatee.getContent(file);
}
@Override
public InputStream getContentInputStream() throws ContentIOException
{
InputStream result = delegatee.getContentInputStream();
if (null == releaseableResource)
{
releaseableResource = result;
}
return result;
}
@Override
public String getContentString() throws ContentIOException
{
return delegatee.getContentString();
}
@Override
public String getContentString(int length) throws ContentIOException
{
return delegatee.getContentString(length);
}
@Override
public FileChannel getFileChannel() throws ContentIOException
{
FileChannel result = delegatee.getFileChannel();
if (null == releaseableResource)
{
releaseableResource = result;
}
return result;
}
@Override
public long getLastModified()
{
return delegatee.getLastModified();
}
@Override
public ReadableByteChannel getReadableChannel() throws ContentIOException
{
ReadableByteChannel result = delegatee.getReadableChannel();
if (null == releaseableResource)
{
releaseableResource = result;
}
return result;
}
@Override
public ContentReader getReader() throws ContentIOException
{
return delegatee.getReader();
}
@Override
public boolean isClosed()
{
return delegatee.isClosed();
}
@Override
public void addListener(ContentStreamListener listener)
{
delegatee.addListener(listener);
}
@Override
public ContentData getContentData()
{
return delegatee.getContentData();
}
@Override
public String getContentUrl()
{
return delegatee.getContentUrl();
}
@Override
public String getEncoding()
{
return delegatee.getEncoding();
}
@Override
public Locale getLocale()
{
return delegatee.getLocale();
}
@Override
public String getMimetype()
{
return delegatee.getMimetype();
}
@Override
public long getSize()
{
return delegatee.getSize();
}
@Override
public boolean isChannelOpen()
{
return delegatee.isChannelOpen();
}
@Override
public void setEncoding(String encoding)
{
delegatee.setEncoding(encoding);
}
@Override
public void setLocale(Locale locale)
{
delegatee.setLocale(locale);
}
@Override
public void setMimetype(String mimetype)
{
delegatee.setMimetype(mimetype);
}
@Override
public boolean canBeClosed()
{
return delegatee.isChannelOpen();
}
@Override
public Closeable getStream()
{
return releaseableResource;
}
}


@@ -0,0 +1,217 @@
/*
* Copyright (C) 2005-2014 Alfresco Software Limited.
*
* This file is part of Alfresco
*
* Alfresco is free software: you can redistribute it and/or modify
* it under the terms of the GNU Lesser General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Alfresco is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public License
* along with Alfresco. If not, see <http://www.gnu.org/licenses/>.
*/
package org.alfresco.repo.content;
import java.io.Closeable;
import java.io.File;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.util.Locale;
import org.alfresco.service.cmr.repository.ContentData;
import org.alfresco.service.cmr.repository.ContentIOException;
import org.alfresco.service.cmr.repository.ContentReader;
import org.alfresco.service.cmr.repository.ContentStreamListener;
import org.alfresco.service.cmr.repository.ContentWriter;
/**
* Proxy for {@link ContentWriter} which captures the {@link OutputStream} or {@link WritableByteChannel} so that the captured resource can be released
*
* @author Dmitry Velichkevich
* @see ContentWriter
* @see AbstractStreamAwareProxy
*/
public class StreamAwareContentWriterProxy extends AbstractStreamAwareProxy implements ContentWriter
{
private ContentWriter delegatee;
private Closeable releaseableResource;
public StreamAwareContentWriterProxy(ContentWriter delegator)
{
this.delegatee = delegator;
}
@Override
public OutputStream getContentOutputStream() throws ContentIOException
{
OutputStream result = delegatee.getContentOutputStream();
if (null == releaseableResource)
{
releaseableResource = result;
}
return result;
}
@Override
public FileChannel getFileChannel(boolean truncate) throws ContentIOException
{
FileChannel result = delegatee.getFileChannel(truncate);
if (null == releaseableResource)
{
releaseableResource = result;
}
return result;
}
@Override
public ContentReader getReader() throws ContentIOException
{
return delegatee.getReader();
}
@Override
public WritableByteChannel getWritableChannel() throws ContentIOException
{
WritableByteChannel result = delegatee.getWritableChannel();
if (null == releaseableResource)
{
releaseableResource = result;
}
return result;
}
@Override
public void guessEncoding()
{
delegatee.guessEncoding();
}
@Override
public void guessMimetype(String filename)
{
delegatee.guessMimetype(filename);
}
@Override
public boolean isClosed()
{
return delegatee.isClosed();
}
@Override
public void putContent(ContentReader reader) throws ContentIOException
{
delegatee.putContent(reader);
}
@Override
public void putContent(InputStream is) throws ContentIOException
{
delegatee.putContent(is);
}
@Override
public void putContent(File file) throws ContentIOException
{
delegatee.putContent(file);
}
@Override
public void putContent(String content) throws ContentIOException
{
delegatee.putContent(content);
}
@Override
public void addListener(ContentStreamListener listener)
{
delegatee.addListener(listener);
}
@Override
public ContentData getContentData()
{
return delegatee.getContentData();
}
@Override
public String getContentUrl()
{
return delegatee.getContentUrl();
}
@Override
public String getEncoding()
{
return delegatee.getEncoding();
}
@Override
public Locale getLocale()
{
return delegatee.getLocale();
}
@Override
public String getMimetype()
{
return delegatee.getMimetype();
}
@Override
public long getSize()
{
return delegatee.getSize();
}
@Override
public boolean isChannelOpen()
{
return delegatee.isChannelOpen();
}
@Override
public void setEncoding(String encoding)
{
delegatee.setEncoding(encoding);
}
@Override
public void setLocale(Locale locale)
{
delegatee.setLocale(locale);
}
@Override
public void setMimetype(String mimetype)
{
delegatee.setMimetype(mimetype);
}
@Override
public boolean canBeClosed()
{
return delegatee.isChannelOpen();
}
@Override
public Closeable getStream()
{
return releaseableResource;
}
}


@@ -1,5 +1,5 @@
/*
* Copyright (C) 2005-2013 Alfresco Software Limited.
* Copyright (C) 2005-2014 Alfresco Software Limited.
*
* This file is part of Alfresco
*
@@ -44,6 +44,7 @@ import java.util.concurrent.TimeoutException;
import org.alfresco.error.AlfrescoRuntimeException;
import org.alfresco.model.ContentModel;
import org.alfresco.repo.content.StreamAwareContentReaderProxy;
import org.alfresco.service.cmr.dictionary.DataTypeDefinition;
import org.alfresco.service.cmr.dictionary.DictionaryService;
import org.alfresco.service.cmr.dictionary.PropertyDefinition;
@@ -2051,18 +2052,20 @@ abstract public class AbstractMappingMetadataExtracter implements MetadataExtrac
return extractRaw(reader);
}
FutureTask<Map<String, Serializable>> task = null;
StreamAwareContentReaderProxy proxiedReader = null;
try
{
task = new FutureTask<Map<String,Serializable>>(new ExtractRawCallable(reader));
proxiedReader = new StreamAwareContentReaderProxy(reader);
task = new FutureTask<Map<String,Serializable>>(new ExtractRawCallable(proxiedReader));
getExecutorService().execute(task);
return task.get(limits.getTimeoutMs(), TimeUnit.MILLISECONDS);
}
catch (TimeoutException e)
{
task.cancel(true);
if (reader.isChannelOpen())
if (null != proxiedReader)
{
reader.getReadableChannel().close();
proxiedReader.release();
}
throw e;
}
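
The extractRaw timeout wiring above can be sketched with plain java.util.concurrent types. This is an illustrative standalone version, not the Alfresco API; the class and method names are invented, and a sleeping Callable stands in for a slow parse:

```java
import java.io.Closeable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of the MNT-12043 pattern: run the work in a FutureTask, bound it
// with get(timeout), and on timeout cancel the task and release the captured
// stream so the worker cannot keep the resource pinned.
public class TimeoutExtractionSketch
{
    public static boolean extractWithTimeout(Closeable capturedStream, long timeoutMs)
            throws Exception
    {
        ExecutorService executor = Executors.newCachedThreadPool();
        FutureTask<Boolean> task = new FutureTask<>(() -> {
            Thread.sleep(10_000); // stands in for a slow, memory-hungry parse
            return true;
        });
        try
        {
            executor.execute(task);
            return task.get(timeoutMs, TimeUnit.MILLISECONDS);
        }
        catch (TimeoutException e)
        {
            task.cancel(true);       // interrupt the worker thread
            capturedStream.close();  // release the captured stream, as release() does
            return false;
        }
        finally
        {
            executor.shutdownNow();
        }
    }
}
```

Closing the proxied stream after cancellation is the key point of the diff above: cancel(true) alone only interrupts the thread, while releasing the stream forces any blocking read to fail fast.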


@@ -1,5 +1,5 @@
/*
* Copyright (C) 2005-2010 Alfresco Software Limited.
* Copyright (C) 2005-2014 Alfresco Software Limited.
*
* This file is part of Alfresco
*
@@ -19,12 +19,15 @@
package org.alfresco.repo.content.metadata;
import java.util.ArrayList;
import java.util.Set;
import org.alfresco.repo.content.MimetypeMap;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.poi.patch.AlfrescoPoiPatchUtils;
import org.apache.tika.parser.Parser;
import org.apache.tika.parser.microsoft.ooxml.OOXMLParser;
import org.springframework.beans.factory.InitializingBean;
/**
* POI-based metadata extractor for Office 07 documents.
@@ -37,12 +40,19 @@ import org.apache.tika.parser.microsoft.ooxml.OOXMLParser;
* <b>Any custom property:</b> -- [not mapped]
* </pre>
*
* Uses Apache Tika
* Uses Apache Tika<br />
* <br />
* Configures {@link AlfrescoPoiPatchUtils} to resolve the following issues:
* <ul>
* <li><a href="https://issues.alfresco.com/jira/browse/MNT-577">MNT-577</a></li>
* <li><a href="https://issues.alfresco.com/jira/browse/MNT-11823">MNT-11823</a></li>
* </ul>
*
* @author Nick Burch
* @author Neil McErlean
* @author Dmitry Velichkevich
*/
public class PoiMetadataExtracter extends TikaPoweredMetadataExtracter
public class PoiMetadataExtracter extends TikaPoweredMetadataExtracter implements InitializingBean
{
protected static Log logger = LogFactory.getLog(PoiMetadataExtracter.class);
@@ -53,9 +63,15 @@ public class PoiMetadataExtracter extends TikaPoweredMetadataExtracter
new OOXMLParser()
);
private Integer poiFootnotesLimit;
private Boolean poiExtractPropertiesOnly = false;
private Set<String> poiAllowableXslfRelationshipTypes;
public PoiMetadataExtracter()
{
super(SUPPORTED_MIMETYPES);
super(PoiMetadataExtracter.class.getName(), SUPPORTED_MIMETYPES);
}
@Override
@@ -63,4 +79,73 @@ public class PoiMetadataExtracter extends TikaPoweredMetadataExtracter
{
return new OOXMLParser();
}
/**
* MNT-577: Alfresco is running 100% CPU for over 10 minutes while extracting metadata for Word office document <br />
* <br />
*
* @param poiFootnotesLimit - {@link Integer} value which specifies the maximum number of footnotes to read from XWPF documents
*/
public void setPoiFootnotesLimit(Integer poiFootnotesLimit)
{
this.poiFootnotesLimit = poiFootnotesLimit;
}
/**
* MNT-11823: Upload of PPTX causes very high memory usage leading to system instability<br />
* <br />
*
* @param poiExtractPropertiesOnly - {@link Boolean} value which indicates that the POI extractor must avoid building the full document-part hierarchy and reading the
* content of the parts
*/
public void setPoiExtractPropertiesOnly(Boolean poiExtractPropertiesOnly)
{
this.poiExtractPropertiesOnly = poiExtractPropertiesOnly;
}
public Boolean isPoiExtractPropertiesOnly()
{
return (poiExtractPropertiesOnly == null) ? (false) : (poiExtractPropertiesOnly);
}
/**
* MNT-11823: Upload of PPTX causes very high memory usage leading to system instability<br />
* <br />
*
* @param poiAllowableXslfRelationshipTypes - {@link Set}&lt;{@link String}&gt; instance which determines the set of allowable relationship types to traverse when
* analyzing XSLF documents
*/
public void setPoiAllowableXslfRelationshipTypes(Set<String> poiAllowableXslfRelationshipTypes)
{
this.poiAllowableXslfRelationshipTypes = poiAllowableXslfRelationshipTypes;
}
public Set<String> getPoiAllowableXslfRelationshipTypes()
{
return poiAllowableXslfRelationshipTypes;
}
/**
* MNT-11823: Upload of PPTX causes very high memory usage leading to system instability<br />
* <br />
* Initializes {@link AlfrescoPoiPatchUtils} properties for the {@link PoiMetadataExtracter#getExtractorContext()} context
*/
@Override
public void afterPropertiesSet() throws Exception
{
if (null == poiExtractPropertiesOnly)
{
poiExtractPropertiesOnly = false;
}
String context = getExtractorContext();
if (null != poiFootnotesLimit)
{
AlfrescoPoiPatchUtils.setPoiFootnotesLimit(context, poiFootnotesLimit);
}
AlfrescoPoiPatchUtils.setPoiExtractPropertiesOnly(context, poiExtractPropertiesOnly);
AlfrescoPoiPatchUtils.setPoiAllowableXslfRelationshipTypes(context, poiAllowableXslfRelationshipTypes);
}
}
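
AlfrescoPoiPatchUtils itself ships in the patched POI jar, so its source is not part of this change. Its per-context property store can be imagined roughly as follows (a hypothetical sketch, not the actual class):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a per-context settings registry in the style of
// AlfrescoPoiPatchUtils: each extractor registers its own limits under a
// context key, so one bean's settings cannot leak into another's.
public class ContextSettingsSketch
{
    private static final Map<String, Integer> FOOTNOTES_LIMIT = new HashMap<>();
    private static final int DEFAULT_FOOTNOTES_LIMIT = 50; // matches the shipped default

    public static synchronized void setPoiFootnotesLimit(String context, int limit)
    {
        FOOTNOTES_LIMIT.put(context, limit);
    }

    public static synchronized int getPoiFootnotesLimit(String context)
    {
        // Fall back to the default when nothing is registered for this context
        return FOOTNOTES_LIMIT.getOrDefault(context, DEFAULT_FOOTNOTES_LIMIT);
    }
}
```

This is why 'PoiMetadataExtracter' now passes its class name to the superclass constructor: that name becomes its context key, isolating its settings from other extractors and transformers.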


@@ -1,5 +1,5 @@
/*
* Copyright (C) 2005-2010 Alfresco Software Limited.
* Copyright (C) 2005-2014 Alfresco Software Limited.
*
* This file is part of Alfresco
*
@@ -38,6 +38,7 @@ import org.alfresco.service.cmr.repository.datatype.DefaultTypeConverter;
import org.alfresco.service.cmr.repository.datatype.TypeConversionException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.poi.patch.AlfrescoPoiPatchUtils;
import org.apache.tika.embedder.Embedder;
import org.apache.tika.extractor.DocumentSelector;
import org.apache.tika.io.TemporaryResources;
@@ -97,6 +98,8 @@ public abstract class TikaPoweredMetadataExtracter
private DateTimeFormatter tikaDateFormater;
protected DocumentSelector documentSelector;
private String extractorContext = null;
/**
* Builds up a list of supported mime types by merging
* an explicit list with any that Tika also claims to support
@@ -128,22 +131,37 @@ public abstract class TikaPoweredMetadataExtracter
return types;
}
public TikaPoweredMetadataExtracter(String extractorContext, ArrayList<String> supportedMimeTypes)
{
this(extractorContext, new HashSet<String>(supportedMimeTypes), null);
}
public TikaPoweredMetadataExtracter(ArrayList<String> supportedMimeTypes)
{
this(new HashSet<String>(supportedMimeTypes), null);
this(null, new HashSet<String>(supportedMimeTypes), null);
}
public TikaPoweredMetadataExtracter(ArrayList<String> supportedMimeTypes, ArrayList<String> supportedEmbedMimeTypes)
{
this(new HashSet<String>(supportedMimeTypes), new HashSet<String>(supportedEmbedMimeTypes));
this(null, new HashSet<String>(supportedMimeTypes), new HashSet<String>(supportedEmbedMimeTypes));
}
public TikaPoweredMetadataExtracter(HashSet<String> supportedMimeTypes)
{
this(supportedMimeTypes, null);
this(null, supportedMimeTypes, null);
}
public TikaPoweredMetadataExtracter(HashSet<String> supportedMimeTypes, HashSet<String> supportedEmbedMimeTypes)
{
this(null, supportedMimeTypes, supportedEmbedMimeTypes);
}
public TikaPoweredMetadataExtracter(String extractorContext, HashSet<String> supportedMimeTypes, HashSet<String> supportedEmbedMimeTypes)
{
super(supportedMimeTypes, supportedEmbedMimeTypes);
this.extractorContext = extractorContext;
// TODO Once TIKA-451 is fixed this list will get nicer
DateTimeParser[] parsersUTC = {
DateTimeFormat.forPattern("yyyy-MM-dd'T'HH:mm:ss'Z'").getParser(),
@@ -161,6 +179,16 @@ public abstract class TikaPoweredMetadataExtracter
this.tikaDateFormater = new DateTimeFormatterBuilder().append(null, parsers).toFormatter();
}
/**
* Gets the context for the current implementation
*
* @return {@link String} value which identifies the current context
*/
protected String getExtractorContext()
{
return extractorContext;
}
/**
* Version which also tries the ISO-8601 formats (in order..),
* and similar formats, which Tika makes use of
@@ -316,6 +344,9 @@ public abstract class TikaPoweredMetadataExtracter
Map<String, Serializable> rawProperties = newRawMap();
InputStream is = null;
// Parse using the properties of the current implementation's context
boolean contextPresented = null != extractorContext;
try
{
is = getInputStream(reader);
@@ -340,6 +371,12 @@ public abstract class TikaPoweredMetadataExtracter
handler = new NullContentHandler();
}
// Set POI properties context if available...
if (contextPresented)
{
AlfrescoPoiPatchUtils.setContext(extractorContext);
}
parser.parse(is, handler, metadata, context);
// First up, copy all the Tika metadata over
@@ -399,6 +436,12 @@ public abstract class TikaPoweredMetadataExtracter
}
finally
{
// Reset POI properties context
if (contextPresented)
{
AlfrescoPoiPatchUtils.setContext(null);
}
if (is != null)
{
try { is.close(); } catch (IOException e) {}
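
The set-context / reset-in-finally discipline wrapped around parser.parse(...) above can be shown generically. In this illustrative sketch a ThreadLocal stands in for the patched POI context holder; the class and method names are invented:

```java
// Sketch of the set/reset discipline used around parser.parse(...): the
// context is installed before parsing and always cleared in finally, so a
// failed parse cannot leave a stale context on the thread.
public class ParseContextSketch
{
    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    public static void setContext(String context)
    {
        CONTEXT.set(context);
    }

    public static String getContext()
    {
        return CONTEXT.get();
    }

    public static void parseWithContext(String extractorContext, Runnable parse)
    {
        boolean contextPresented = null != extractorContext;
        if (contextPresented)
        {
            setContext(extractorContext);
        }
        try
        {
            parse.run(); // stands in for parser.parse(is, handler, metadata, context)
        }
        finally
        {
            if (contextPresented)
            {
                setContext(null); // mirrors the finally block in extractRaw
            }
        }
    }
}
```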


@@ -1,5 +1,5 @@
/*
* Copyright (C) 2005-2013 Alfresco Software Limited.
* Copyright (C) 2005-2014 Alfresco Software Limited.
*
* This file is part of Alfresco
*
@@ -19,12 +19,24 @@
package org.alfresco.repo.content.transform;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import org.alfresco.error.AlfrescoRuntimeException;
import org.alfresco.repo.content.AbstractStreamAwareProxy;
import org.alfresco.repo.content.StreamAwareContentReaderProxy;
import org.alfresco.repo.content.StreamAwareContentWriterProxy;
import org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter;
import org.alfresco.service.cmr.repository.ContentIOException;
import org.alfresco.service.cmr.repository.ContentReader;
import org.alfresco.service.cmr.repository.ContentServiceTransientException;
import org.alfresco.service.cmr.repository.ContentWriter;
import org.alfresco.service.cmr.repository.TransformationOptionLimits;
import org.alfresco.service.cmr.repository.TransformationOptions;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
@@ -43,10 +55,26 @@ public abstract class AbstractContentTransformer2 extends AbstractContentTransfo
{
private static final Log logger = LogFactory.getLog(AbstractContentTransformer2.class);
private ExecutorService executorService;
private ContentTransformerRegistry registry;
private boolean registerTransformer;
private boolean retryTransformOnDifferentMimeType;
/**
* A flag that indicates that the transformer should be started in its own Thread so
* that it may be interrupted, rather than relying on the timeout in the Reader.
* Need only be set for transformers that read their source data quickly but then
* take a long time to process the data (such as {@link PoiOOXMLContentTransformer}).
*/
private Boolean useTimeoutThread = false;
/**
* Extra time added to the timeout when using a Thread for the transformation so that
* a timeout from the Reader has a chance to happen first.
*/
private long additionalThreadTimout = 2000;
private static ThreadLocal<Integer> depth = new ThreadLocal<Integer>()
{
@Override
@@ -209,7 +237,48 @@ public abstract class AbstractContentTransformer2 extends AbstractContentTransfo
setReaderLimits(reader, writer, options);
// Transform
// MNT-12238: CLONE - CLONE - Upload of PPTX causes very high memory usage leading to system instability
// Limiting transformation up to configured amount of milliseconds to avoid very high RAM consumption
// and OOM during transforming problematic documents
TransformationOptionLimits limits = getLimits(reader.getMimetype(), writer.getMimetype(), options);
long timeoutMs = (null == limits) ? -1 : limits.getTimeoutMs();
if (!useTimeoutThread || (-1 == timeoutMs))
{
transformInternal(reader, writer, options);
}
else
{
Future<?> submittedTask = null;
StreamAwareContentReaderProxy proxiedReader = new StreamAwareContentReaderProxy(reader);
StreamAwareContentWriterProxy proxiedWriter = new StreamAwareContentWriterProxy(writer);
try
{
submittedTask = getExecutorService().submit(new TransformInternalCallable(proxiedReader, proxiedWriter, options));
submittedTask.get(timeoutMs + additionalThreadTimout, TimeUnit.MILLISECONDS);
}
catch (TimeoutException e)
{
releaseResources(submittedTask, proxiedReader, proxiedWriter);
throw new TimeoutException("Transformation failed due to timeout limit");
}
catch (InterruptedException e)
{
releaseResources(submittedTask, proxiedReader, proxiedWriter);
throw new InterruptedException("Transformation failed, because the thread of the transformation was interrupted");
}
catch (ExecutionException e)
{
Throwable cause = e.getCause();
if (cause instanceof TransformInternalCallableException)
{
cause = ((TransformInternalCallableException) cause).getCause();
}
throw cause;
}
}
// record time
long after = System.currentTimeMillis();
@@ -345,6 +414,31 @@ public abstract class AbstractContentTransformer2 extends AbstractContentTransfo
}
}
/**
* Cancels <code>task</code> and closes content accessors
*
* @param task - {@link Future} task instance which specifies a transformation action
* @param proxiedReader - {@link AbstractStreamAwareProxy} instance which represents channel closing mechanism for content reader
* @param proxiedWriter - {@link AbstractStreamAwareProxy} instance which represents channel closing mechanism for content writer
*/
private void releaseResources(Future<?> task, AbstractStreamAwareProxy proxiedReader, AbstractStreamAwareProxy proxiedWriter)
{
if (null != task)
{
task.cancel(true);
}
if (null != proxiedReader)
{
proxiedReader.release();
}
if (null != proxiedWriter)
{
proxiedWriter.release();
}
}
public final void transform(
ContentReader reader,
ContentWriter writer,
@@ -400,6 +494,103 @@ public abstract class AbstractContentTransformer2 extends AbstractContentTransfo
}
}
/**
* Gets the <code>ExecutorService</code> to be used for timeout-aware transformation.
* <p>
* If no <code>ExecutorService</code> has been defined, a default of <code>Executors.newCachedThreadPool()</code> is created lazily on first use.
*
* @return the defined or default <code>ExecutorService</code>
*/
protected ExecutorService getExecutorService()
{
if (null == executorService)
{
executorService = Executors.newCachedThreadPool();
}
return executorService;
}
/**
* Sets the <code>ExecutorService</code> to be used for timeout-aware transformation.
*
* @param executorService - {@link ExecutorService} instance for timeouts
*/
public void setExecutorService(ExecutorService executorService)
{
this.executorService = executorService;
}
/**
* {@link Callable} wrapper for the {@link AbstractContentTransformer2#transformInternal(ContentReader, ContentWriter, TransformationOptions)} method to handle timeouts.
*/
private class TransformInternalCallable implements Callable<Void>
{
private ContentReader reader;
private ContentWriter writer;
private TransformationOptions options;
public TransformInternalCallable(ContentReader reader, ContentWriter writer, TransformationOptions options)
{
this.reader = reader;
this.writer = writer;
this.options = options;
}
@Override
public Void call() throws Exception
{
try
{
transformInternal(reader, writer, options);
return null;
}
catch (Throwable e)
{
throw new TransformInternalCallableException(e);
}
}
}
/**
* Exception wrapper to handle any {@link Throwable} from {@link AbstractContentTransformer2#transformInternal(ContentReader, ContentWriter, TransformationOptions)}
*/
private class TransformInternalCallableException extends Exception
{
private static final long serialVersionUID = 7740560508772740658L;
public TransformInternalCallableException(Throwable cause)
{
super(cause);
}
}
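When the callable above fails, the caller sees an `ExecutionException` whose cause is the `TransformInternalCallableException`, whose own cause is the original `Throwable` from `transformInternal`. That two-level unwrapping can be sketched standalone (the class and method names below are hypothetical):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CauseChainSketch
{
    /** Stand-in for TransformInternalCallableException: a checked wrapper around any Throwable. */
    public static class Wrapped extends Exception
    {
        public Wrapped(Throwable cause) { super(cause); }
    }

    /** Submits the body and digs the original Throwable out of the cause chain. */
    public static Throwable originalFailure(Callable<Void> body) throws Exception
    {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try
        {
            pool.submit(body).get();
            return null;
        }
        catch (ExecutionException e)
        {
            // ExecutionException -> Wrapped -> original Throwable
            return e.getCause().getCause();
        }
        finally
        {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception
    {
        Throwable original = originalFailure(() -> {
            try
            {
                throw new OutOfMemoryError("simulated"); // an Error, not an Exception
            }
            catch (Throwable t)
            {
                throw new Wrapped(t); // same move as TransformInternalCallable.call()
            }
        });
        System.out.println(original.getClass().getSimpleName()); // prints "OutOfMemoryError"
    }
}
```

The point of the catch-and-wrap is that even non-`Exception` failures such as `OutOfMemoryError` arrive at the waiting thread through one uniform, checked wrapper that it can recognise and unwrap.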
/**
* @param useTimeoutThread - {@link Boolean} value which specifies whether transformations should run in a separate, timeout-limited thread; a <code>null</code> value defaults to <code>true</code>
* @see AbstractContentTransformer2#useTimeoutThread
*/
public void setUseTimeoutThread(Boolean useTimeoutThread)
{
if (null == useTimeoutThread)
{
useTimeoutThread = true;
}
this.useTimeoutThread = useTimeoutThread;
}
/**
* @param additionalThreadTimout - additional time in milliseconds added to the configured transformation time limit when waiting for the transformation thread
*/
public void setAdditionalThreadTimout(long additionalThreadTimout)
{
this.additionalThreadTimout = additionalThreadTimout;
}
public Boolean isTransformationLimitedInternally()
{
return useTimeoutThread;
}
/**
* Records an error and updates the average time as if the transformation took a
* long time, so that it is less likely to be called again.

View File

@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2005-2010 Alfresco Software Limited.
+ * Copyright (C) 2005-2014 Alfresco Software Limited.
*
* This file is part of Alfresco
*
@@ -48,6 +48,7 @@ public class PoiOOXMLContentTransformer extends TikaPoweredContentTransformer
public PoiOOXMLContentTransformer() {
super(SUPPORTED_MIMETYPES);
setUseTimeoutThread(true);
}
@Override

View File

@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2005-2010 Alfresco Software Limited.
+ * Copyright (C) 2005-2014 Alfresco Software Limited.
*
* This file is part of Alfresco
*
@@ -108,6 +108,7 @@ public class TikaAutoContentTransformer extends TikaPoweredContentTransformer
public TikaAutoContentTransformer(TikaConfig tikaConfig)
{
super( buildMimeTypes(tikaConfig) );
setUseTimeoutThread(true);
}
/**

View File

@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2005-2012 Alfresco Software Limited.
+ * Copyright (C) 2005-2014 Alfresco Software Limited.
*
* This file is part of Alfresco
*
@@ -32,14 +32,12 @@ import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.alfresco.repo.content.MimetypeMap;
- import org.alfresco.repo.content.filestore.FileContentReader;
import org.alfresco.service.cmr.repository.ContentReader;
import org.alfresco.service.cmr.repository.ContentWriter;
import org.alfresco.service.cmr.repository.TransformationOptions;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.tika.extractor.DocumentSelector;
- import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
@@ -69,6 +67,14 @@ public abstract class TikaPoweredContentTransformer extends AbstractContentTrans
MimetypeMap.MIMETYPE_XHTML,
MimetypeMap.MIMETYPE_XML});
private static final double MEGABYTES = 1024.0 * 1024.0;
private static final String USAGE_PATTERN = "Content transformation has completed:\n" +
" Transformer: %s\n" +
" Content Reader: %s\n" +
" Memory (MB): Used/Total/Maximum - %f/%f/%f\n" +
" Time Spent: %d ms";
protected List<String> sourceMimeTypes;
protected DocumentSelector documentSelector;
@@ -225,22 +231,24 @@ public abstract class TikaPoweredContentTransformer extends AbstractContentTrans
);
}
- // Prefer the File if available - it takes less memory to process
- InputStream is;
- if(reader instanceof FileContentReader)
- {
-     is = TikaInputStream.get( ((FileContentReader)reader).getFile(), metadata );
- }
- else
- {
-     is = reader.getContentInputStream();
- }
+ InputStream is = reader.getContentInputStream();
+ long startTime = 0;
try {
+     if (logger.isDebugEnabled())
+     {
+         startTime = System.currentTimeMillis();
+     }
parser.parse(is, handler, metadata, context);
}
finally
{
+     if(logger.isDebugEnabled())
+     {
+         logger.debug(calculateMemoryAndTimeUsage(reader, startTime));
+     }
if (is != null)
{
try { is.close(); } catch (Throwable e) {}
@@ -255,4 +263,13 @@ public abstract class TikaPoweredContentTransformer extends AbstractContentTrans
}
}
}
private String calculateMemoryAndTimeUsage(ContentReader reader, long startTime)
{
long endTime = System.currentTimeMillis();
Runtime runtime = Runtime.getRuntime();
long totalMemory = runtime.totalMemory();
return String.format(USAGE_PATTERN, this.getClass().getName(), reader,
        (totalMemory - runtime.freeMemory()) / MEGABYTES, totalMemory / MEGABYTES,
        runtime.maxMemory() / MEGABYTES, (endTime - startTime));
}
}
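The megabyte arithmetic in this helper can be checked in isolation. A hypothetical standalone version of the same calculation follows; `Locale.ROOT` is added here only to make the `%f` output deterministic, it is not in the original:

```java
import java.util.Locale;

public class MemoryUsageSketch
{
    private static final double MEGABYTES = 1024.0 * 1024.0;

    /** Formats used/total/maximum heap sizes in MB, as calculateMemoryAndTimeUsage() does. */
    public static String usageLine(long usedBytes, long totalBytes, long maxBytes)
    {
        return String.format(Locale.ROOT, "Memory (MB): Used/Total/Maximum - %f/%f/%f",
                usedBytes / MEGABYTES, totalBytes / MEGABYTES, maxBytes / MEGABYTES);
    }

    public static void main(String[] args)
    {
        Runtime runtime = Runtime.getRuntime();
        long total = runtime.totalMemory();
        // Used memory is the total heap minus the free portion of it
        System.out.println(usageLine(total - runtime.freeMemory(), total, runtime.maxMemory()));
    }
}
```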

View File

@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2005-2010 Alfresco Software Limited.
+ * Copyright (C) 2005-2014 Alfresco Software Limited.
*
* This file is part of Alfresco
*
@@ -35,9 +35,30 @@ import org.alfresco.service.namespace.QName;
* @see org.alfresco.repo.content.metadata.PoiMetadataExtracter
*
* @author Neil McErlean
* @author Dmitry Velichkevich
*/
public class PoiMetadataExtracterTest extends AbstractMetadataExtracterTest
{
private static final int MINIMAL_EXPECTED_PROPERTIES_AMOUNT = 3;
private static final int IGNORABLE_TIMEOUT = -1;
// private static final int TIMEOUT_FOR_QUICK_EXTRACTION = 2000;
private static final int DEFAULT_FOOTNOTES_LIMIT = 50;
private static final int LARGE_FOOTNOTES_LIMIT = 25000;
private static final String ALL_MIMETYPES_FILTER = "*";
private static final String PROBLEM_FOOTNOTES_DOCUMENT_NAME = "problemFootnotes2.docx";
// private static final String PROBLEM_SLIDE_SHOW_DOCUMENT_NAME = "problemSlideShow.pptx";
private static final String EXTRACTOR_POI_BEAN_NAME = "extracter.Poi";
private PoiMetadataExtracter extracter;
@Override
@@ -46,9 +67,31 @@ public class PoiMetadataExtracterTest extends AbstractMetadataExtracterTest
super.setUp();
extracter = new PoiMetadataExtracter();
extracter.setDictionaryService(dictionaryService);
resetPoiConfigurationToDefault();
extracter.register();
}
@Override
protected void tearDown() throws Exception
{
resetPoiConfigurationToDefault();
super.tearDown();
}
/**
* Resets the POI library configuration to defaults: enables extract-properties-only mode, restores the default footnotes limit and sets the allowable XSLF relationship types as per the 'extracter.Poi' bean configuration
*
* @throws Exception
*/
private void resetPoiConfigurationToDefault() throws Exception
{
PoiMetadataExtracter configuredExtractor = (PoiMetadataExtracter) ctx.getBean(EXTRACTOR_POI_BEAN_NAME);
extracter.setPoiExtractPropertiesOnly(true);
extracter.setPoiFootnotesLimit(DEFAULT_FOOTNOTES_LIMIT);
extracter.setPoiAllowableXslfRelationshipTypes(configuredExtractor.getPoiAllowableXslfRelationshipTypes());
extracter.afterPropertiesSet();
}
@Override
protected MetadataExtracter getExtracter()
{
@@ -123,7 +166,7 @@ public class PoiMetadataExtracterTest extends AbstractMetadataExtracterTest
limits.setTimeoutMs(timeoutMs);
HashMap<String, MetadataExtracterLimits> mimetypeLimits =
new HashMap<String, MetadataExtracterLimits>(1);
- mimetypeLimits.put("*", limits);
+ mimetypeLimits.put(ALL_MIMETYPES_FILTER, limits);
((PoiMetadataExtracter) getExtracter()).setMimetypeLimits(mimetypeLimits);
File sourceFile = AbstractContentTransformerTest.loadNamedQuickTestFile("problemFootnotes.docx");
@@ -144,4 +187,100 @@ public class PoiMetadataExtracterTest extends AbstractMetadataExtracterTest
extractionTime < (timeoutMs + 100)); // bit of wiggle room for logging, cleanup, etc.
assertFalse("Reader was not closed", sourceReader.isChannelOpen());
}
// /**
// * Test for MNT-11823: Upload of PPTX causes very high memory usage leading to system instability
// *
// * @throws Exception
// */
// public void testProblemSlideShow() throws Exception
// {
// PoiMetadataExtracter extractor = (PoiMetadataExtracter) getExtracter();
// configureExtractorLimits(extractor, ALL_MIMETYPES_FILTER, TIMEOUT_FOR_QUICK_EXTRACTION);
//
// File problemSlideShowFile = AbstractContentTransformerTest.loadNamedQuickTestFile(PROBLEM_SLIDE_SHOW_DOCUMENT_NAME);
// ContentReader sourceReader = new FileContentReader(problemSlideShowFile);
// sourceReader.setMimetype(MimetypeMap.MIMETYPE_OPENXML_PRESENTATION);
//
// Map<QName, Serializable> properties = new HashMap<QName, Serializable>();
// extractor.extract(sourceReader, properties);
//
// assertExtractedProperties(properties);
// assertFalse("Reader was not closed", sourceReader.isChannelOpen());
//
// extractor.setPoiExtractPropertiesOnly(false);
// extractor.afterPropertiesSet();
// properties = new HashMap<QName, Serializable>();
// extractor.extract(sourceReader, properties);
//
// assertFalse("Reader was not closed", sourceReader.isChannelOpen());
// assertTrue(("Extraction completed successfully but failure is expected! Invalid properties are: " + properties), (null == properties) || properties.isEmpty());
// }
/**
* Configures timeout for given <code>extractor</code> and <code>mimetypeFilter</code>
*
* @param extractor - {@link PoiMetadataExtracter} instance
* @param mimetypeFilter - {@link String} value which specifies mimetype filter for which timeout should be applied
* @param timeout - {@link Long} value which specifies timeout for <code>mimetypeFilter</code>
*/
private void configureExtractorLimits(PoiMetadataExtracter extractor, String mimetypeFilter, long timeout)
{
MetadataExtracterLimits limits = new MetadataExtracterLimits();
limits.setTimeoutMs(timeout);
HashMap<String, MetadataExtracterLimits> mimetypeLimits = new HashMap<String, MetadataExtracterLimits>(1);
mimetypeLimits.put(mimetypeFilter, limits);
extractor.setMimetypeLimits(mimetypeLimits);
}
/**
* Test for MNT-577: Alfresco is running 100% CPU for over 10 minutes while extracting metadata for Word office document
*
* @throws Exception
*/
public void testFootnotesLimitParameterUsing() throws Exception
{
PoiMetadataExtracter extractor = (PoiMetadataExtracter) getExtracter();
File sourceFile = AbstractContentTransformerTest.loadNamedQuickTestFile(PROBLEM_FOOTNOTES_DOCUMENT_NAME);
ContentReader sourceReader = new FileContentReader(sourceFile);
sourceReader.setMimetype(MimetypeMap.MIMETYPE_OPENXML_WORDPROCESSING);
Map<QName, Serializable> properties = new HashMap<QName, Serializable>();
long startTime = System.currentTimeMillis();
extractor.extract(sourceReader, properties);
long extractionTimeWithDefaultFootnotesLimit = System.currentTimeMillis() - startTime;
assertExtractedProperties(properties);
assertFalse("Reader was not closed", sourceReader.isChannelOpen());
// Just let the extractor do the job...
configureExtractorLimits(extractor, ALL_MIMETYPES_FILTER, IGNORABLE_TIMEOUT);
extractor.setPoiFootnotesLimit(LARGE_FOOTNOTES_LIMIT);
extractor.afterPropertiesSet();
properties = new HashMap<QName, Serializable>();
startTime = System.currentTimeMillis();
extractor.extract(sourceReader, properties);
long extractionTimeWithLargeFootnotesLimit = System.currentTimeMillis() - startTime;
assertExtractedProperties(properties);
assertTrue("The second metadata extraction operation must be longer!", extractionTimeWithLargeFootnotesLimit > extractionTimeWithDefaultFootnotesLimit);
assertFalse("Reader was not closed", sourceReader.isChannelOpen());
}
/**
* Asserts extracted <code>properties</code>. At least {@link PoiMetadataExtracterTest#MINIMAL_EXPECTED_PROPERTIES_AMOUNT} properties are expected:
* {@link ContentModel#PROP_TITLE}, {@link ContentModel#PROP_AUTHOR} and {@link ContentModel#PROP_CREATED}
*
* @param properties - {@link Map}&lt;{@link QName}, {@link Serializable}&gt; instance which contains all extracted properties
*/
private void assertExtractedProperties(Map<QName, Serializable> properties)
{
assertNotNull("Properties were not extracted at all!", properties);
assertFalse("Extracted properties are empty!", properties.isEmpty());
assertTrue(("Expected at least " + MINIMAL_EXPECTED_PROPERTIES_AMOUNT + " extracted properties but only " + properties.size() + " have been extracted!"), properties.size() >= MINIMAL_EXPECTED_PROPERTIES_AMOUNT);
assertTrue(("'" + ContentModel.PROP_TITLE + "' property is missing!"), properties.containsKey(ContentModel.PROP_TITLE));
assertTrue(("'" + ContentModel.PROP_AUTHOR + "' property is missing!"), properties.containsKey(ContentModel.PROP_AUTHOR));
assertTrue(("'" + ContentModel.PROP_CREATED + "' property is missing!"), properties.containsKey(ContentModel.PROP_CREATED));
}
}

View File

@@ -1,5 +1,5 @@
/*
- * Copyright (C) 2005-2011 Alfresco Software Limited.
+ * Copyright (C) 2005-2014 Alfresco Software Limited.
*
* This file is part of Alfresco
*
@@ -18,16 +18,40 @@
*/
package org.alfresco.repo.content.transform;
import java.io.File;
import java.util.concurrent.TimeoutException;
import org.alfresco.repo.content.MimetypeMap;
import org.alfresco.repo.content.filestore.FileContentReader;
import org.alfresco.repo.security.authentication.AuthenticationUtil;
import org.alfresco.repo.security.authentication.AuthenticationUtil.RunAsWork;
import org.alfresco.service.cmr.repository.ContentIOException;
import org.alfresco.service.cmr.repository.ContentReader;
import org.alfresco.service.cmr.repository.ContentService;
import org.alfresco.service.cmr.repository.ContentWriter;
import org.alfresco.service.cmr.repository.TransformationOptionLimits;
import org.alfresco.service.cmr.repository.TransformationOptions;
/**
* @see org.alfresco.repo.content.transform.PoiOOXMLContentTransformer
*
* @author Nick Burch
* @author Dmitry Velichkevich
*/
public class PoiOOXMLContentTransformerTest extends AbstractContentTransformerTest
{
private static final int SMALL_TIMEOUT = 50;
private static final int ADDITIONAL_PROCESSING_TIME = 1500;
private static final String ENCODING_UTF_8 = "UTF-8";
private static final String TEST_PPTX_FILE_NAME = "quickImg2.pptx";
private ContentService contentService;
private PoiOOXMLContentTransformer transformer;
@Override
@@ -39,6 +63,8 @@ public class PoiOOXMLContentTransformerTest extends AbstractContentTransformerTe
transformer.setMimetypeService(mimetypeService);
transformer.setTransformerDebug(transformerDebug);
transformer.setTransformerConfig(transformerConfig);
contentService = serviceRegistry.getContentService();
}
/**
@@ -66,4 +92,92 @@ public class PoiOOXMLContentTransformerTest extends AbstractContentTransformerTe
assertTrue(transformer.isTransformable(MimetypeMap.MIMETYPE_OPENXML_SPREADSHEET, -1, MimetypeMap.MIMETYPE_HTML, new TransformationOptions()));
assertTrue(transformer.isTransformable(MimetypeMap.MIMETYPE_OPENXML_SPREADSHEET, -1, MimetypeMap.MIMETYPE_XML, new TransformationOptions()));
}
/**
* MNT-12043: CLONE - Upload of PPTX causes very high memory usage leading to system instability
*
* @throws Exception
*/
public void testMnt12043() throws Exception
{
transformer.setMimetypeService(mimetypeService);
transformer.setAdditionalThreadTimout(0);
configureExtractorLimits(transformer, SMALL_TIMEOUT);
File sourceFile = AbstractContentTransformerTest.loadNamedQuickTestFile(TEST_PPTX_FILE_NAME);
ContentReader sourceReader = new FileContentReader(sourceFile)
{
@Override
public void setLimits(TransformationOptionLimits limits)
{
// Test without content reader input stream timeout limits
}
};
sourceReader.setMimetype(MimetypeMap.MIMETYPE_OPENXML_PRESENTATION);
ContentWriter tempWriter = AuthenticationUtil.runAs(new RunAsWork<ContentWriter>()
{
@Override
public ContentWriter doWork() throws Exception
{
ContentWriter result = contentService.getTempWriter();
result.setEncoding(ENCODING_UTF_8);
result.setMimetype(MimetypeMap.MIMETYPE_TEXT_PLAIN);
return result;
}
}, AuthenticationUtil.getAdminUserName());
long startTime = System.currentTimeMillis();
try
{
transformer.transform(sourceReader, tempWriter);
long transformationTime = System.currentTimeMillis() - startTime;
fail("Content transformation took " + transformationTime + " ms, but should have failed with a timeout at " + SMALL_TIMEOUT + " ms");
}
catch (ContentIOException e)
{
long transformationTime = System.currentTimeMillis() - startTime;
assertTrue((TimeoutException.class.getName() + " exception is expected as the cause of transformation failure"), e.getCause() instanceof TimeoutException);
// Not sure we can have the following assert as we may have introduced an intermittent test failure. Already seen a time of 1009ms
assertTrue(("Failed content transformation took " + transformationTime + " ms, but should have failed with a timeout at " + SMALL_TIMEOUT + " ms"),
transformationTime <= (SMALL_TIMEOUT + ADDITIONAL_PROCESSING_TIME));
}
assertFalse("Readable channel was not closed after transformation attempt!", sourceReader.isChannelOpen());
assertFalse("Writable channel was not closed after transformation attempt!", tempWriter.isChannelOpen());
}
/**
* Configures timeout for the given <code>transformer</code>
*
* @param transformer - {@link PoiOOXMLContentTransformer} instance
* @param timeout - timeout in milliseconds to apply to <code>transformer</code>
*/
private void configureExtractorLimits(PoiOOXMLContentTransformer transformer, final long timeout)
{
transformer.setTransformerConfig(new TransformerConfigImpl()
{
@Override
public TransformationOptionLimits getLimits(ContentTransformer transformer, String sourceMimetype, String targetMimetype, String use)
{
TransformationOptionLimits result = new TransformationOptionLimits();
result.setTimeoutMs(timeout);
return result;
}
@Override
public TransformerStatistics getStatistics(ContentTransformer transformer, String sourceMimetype, String targetMimetype, boolean createNew)
{
return transformerConfig.getStatistics(transformer, sourceMimetype, targetMimetype, createNew);
}
@Override
public boolean isSupportedTransformation(ContentTransformer transformer, String sourceMimetype, String targetMimetype, TransformationOptions options)
{
return transformerConfig.isSupportedTransformation(transformer, sourceMimetype, targetMimetype, options);
}
});
}
}

Binary file not shown.