mirror of https://github.com/Alfresco/alfresco-community-repo.git (synced 2025-07-31 17:39:05 +00:00)
Merged HEAD-BUG-FIX (5.0/Cloud) to HEAD (5.0/Cloud)
84058: Merged V4.2-BUG-FIX (4.2.4) to HEAD-BUG-FIX (5.0/Cloud)
83799: MNT-12238: Merged DEV 4.2-BUG-FIX (4.2.4) to V4.2-BUG-FIX (4.2.4)
MNT-12238: Merged 4.1-BUG-FIX (4.1.10) to V4.2-BUG-FIX (4.2.4)
80291: Merged V4.1.6 (4.1.6.21) to V4.1-BUG-FIX (4.1.10)
77378: Merged DEV PATCHES/V4.1.6 (19) to PATCHES/V4.1.6 (20)
76649: MNT-11823: Upload of PPTX causes very high memory usage leading to system instability
   - The patch from MNT-577 has been combined with new changes so that analysis of complicated PPTX documents no longer hangs. The fix disables reading the entire contents of a complicated document.
   - The POI metadata extractor may be switched back to standard behavior or reconfigured using the following new properties: content.transformer.Poi.poiFootnotesLimit, content.transformer.Poi.poiExtractPropertiesOnly and content-services-context.xml/extracter.Poi/poiAllowableXslfRelationshipTypes
77379: MNT-11823: Upload of PPTX causes very high memory usage leading to system instability
   - The test and the test data for MNT-577 have been added.
   - A test for MNT-11823 has also been added, but it is commented out because the test data (an appropriate PPTX document) is not currently available.
   - Getters for POI-specific properties have been added to 'PoiMetadataExtracter' for tests.
   - The 'afterPropertiesSet()' logic has been slightly modified to allow setting the 'poiExtractPropertiesOnly' parameter to 'false'.
77561: MNT-11823: Upload of PPTX causes very high memory usage leading to system instability
   - Fix for the https://bamboo.alfresco.com/bamboo/browse/HF-PATCH416-126 build failure.
   - The POI extractor and transformer properties of 'AlfrescoPoiPatchUtils' have been isolated from each other using a context. Each extractor or transformer now has its own context or uses the default context. The properties of the default context allow parsing the entire contents of XSLF documents, and its footnotes limit is 50.
   - Property names have not been changed, but currently 'content-services-context.xml/extracter.Poi/poiAllowableXslfRelationshipTypes=null' does not lead to 'content.transformer.Poi.poiExtractPropertiesOnly=false', i.e. this list may be empty.
   - The 'PoiMetadataExtracterTest' test has been modified in accordance with the introduced changes.
   - 'poi-OOXML-3.9-beta1-20121109.jar' has been renamed to 'poi-OOXML-3.9-beta1-20121109-patched.jar'
79180: MNT-12043: CLONE - Upload of PPTX causes very high memory usage leading to system instability
   - A timeout mechanism has been added to content transformers, together with timeout configuration options.
   - A mechanism to close streams after a 'TimeoutException' has been added to transformers and metadata extractors.
   - The timeout mechanism for input streams has been enabled in 'TikaPoweredContentTransformer'.
79268: MNT-12043: CLONE - Upload of PPTX causes very high memory usage leading to system instability
   - Fix for the https://bamboo.alfresco.com/bamboo/browse/HF-PATCH416-133 build failure and for the comments of the review https://fisheye.alfresco.com/cru/CR-100#CFR-1184.
   - A new test, 'PoiOOXMLContentTransformerTest.testMnt12043()', has been added to check the newly added timeout mechanism.
79290: MNT-12043: CLONE - Upload of PPTX causes very high memory usage leading to system instability
   - Removed methods and properties that are no longer needed
79327: MNT-12043: CLONE - Upload of PPTX causes very high memory usage leading to system instability
   - Increased ADDITIONAL_PROCESSING_TIME to 1500ms to try and avoid a new intermittent test failure.
83885: MNT-12238 Bring Maven POM file in sync with latest patched version of poi-ooxml

git-svn-id: https://svn.alfresco.com/repos/alfresco-enterprise/alfresco/HEAD/root@84627 c4b6b30b-aa2e-2d43-bbcb-ca4b014f7261
@@ -319,7 +319,17 @@
     <bean id="extracter.PDFBox" class="org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter" parent="baseMetadataExtracter">
         <property name="documentSelector" ref="pdfBoxEmbededDocumentSelector" />
     </bean>
-    <bean id="extracter.Poi" class="org.alfresco.repo.content.metadata.PoiMetadataExtracter" parent="baseMetadataExtracter" />
+    <bean id="extracter.Poi" class="org.alfresco.repo.content.metadata.PoiMetadataExtracter" parent="baseMetadataExtracter">
+        <property name="poiFootnotesLimit" value="${content.transformer.Poi.poiFootnotesLimit}" />
+        <property name="poiExtractPropertiesOnly" value="${content.transformer.Poi.poiExtractPropertiesOnly}" />
+        <property name="poiAllowableXslfRelationshipTypes">
+            <list>
+                <!-- These values are valid for Office 2007, 2010 and 2013 -->
+                <value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/presProps</value>
+                <value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/viewProps</value>
+            </list>
+        </property>
+    </bean>
     <bean id="extracter.Office" class="org.alfresco.repo.content.metadata.OfficeMetadataExtracter" parent="baseMetadataExtracter" />
     <bean id="extracter.Mail" class="org.alfresco.repo.content.metadata.MailMetadataExtracter" parent="baseMetadataExtracter" />
     <bean id="extracter.Html" class="org.alfresco.repo.content.metadata.HtmlMetadataExtracter" parent="baseMetadataExtracter" />
@@ -670,6 +670,12 @@ system.thumbnail.quietPeriod=604800
 system.thumbnail.quietPeriodRetriesEnabled=true
 system.thumbnail.redeployStaticDefsOnStartup=true
 
+# MNT-11823: Limit for read-only footnotes list size and an indication to
+# extract only the properties from XSLF documents without reading the
+# entire contents of the document
+content.transformer.Poi.poiFootnotesLimit=50
+content.transformer.Poi.poiExtractPropertiesOnly=true
+
 # The default timeout for metadata mapping extracters
 content.metadataExtracter.default.timeoutMs=20000
 
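The bean definition references these two keys through `${...}` placeholders, so their effective values come from whichever properties source wins at startup. As a rough illustration only, the following plain-Java sketch reads the same two keys and falls back to the defaults added in this hunk. `PoiLimits` is a hypothetical helper invented for this example; it is not part of Alfresco and does not reflect how Spring resolves the placeholders.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper (not part of Alfresco): reads the two MNT-11823 keys
// and uses the defaults from this commit when a key is absent.
class PoiLimits
{
    static final String FOOTNOTES_KEY = "content.transformer.Poi.poiFootnotesLimit";
    static final String PROPERTIES_ONLY_KEY = "content.transformer.Poi.poiExtractPropertiesOnly";

    final int footnotesLimit;
    final boolean extractPropertiesOnly;

    PoiLimits(Properties props)
    {
        this.footnotesLimit = Integer.parseInt(props.getProperty(FOOTNOTES_KEY, "50").trim());
        this.extractPropertiesOnly = Boolean.parseBoolean(props.getProperty(PROPERTIES_ONLY_KEY, "true").trim());
    }

    // Parses properties-file text; an empty string yields the defaults.
    static PoiLimits parse(String text)
    {
        Properties props = new Properties();
        try
        {
            props.load(new StringReader(text));
        }
        catch (IOException e)
        {
            throw new UncheckedIOException(e);
        }
        return new PoiLimits(props);
    }

    public static void main(String[] args)
    {
        PoiLimits limits = parse(PROPERTIES_ONLY_KEY + "=false\n");
        System.out.println(limits.footnotesLimit + " " + limits.extractPropertiesOnly);
    }
}
```

Setting `poiExtractPropertiesOnly` to `false` in an override file is the switch the commit message describes for returning to the standard (full document) behavior.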
pom.xml
@@ -504,7 +504,7 @@
         <dependency>
             <groupId>org.apache.poi</groupId>
             <artifactId>poi-ooxml</artifactId>
-            <version>${dependency.poi.version}</version>
+            <version>3.10-FINAL-20140910-alfresco-patched</version>
         </dependency>
         <dependency>
             <groupId>org.apache.poi</groupId>
@@ -0,0 +1,62 @@
+/*
+ * Copyright (C) 2005-2014 Alfresco Software Limited.
+ *
+ * This file is part of Alfresco
+ *
+ * Alfresco is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Alfresco is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with Alfresco. If not, see <http://www.gnu.org/licenses/>.
+ */
+package org.alfresco.repo.content;
+
+import java.io.Closeable;
+import java.io.IOException;
+
+/**
+ * Base class for stream aware proxies
+ *
+ * @author Dmitry Velichkevich
+ */
+public abstract class AbstractStreamAwareProxy
+{
+    /**
+     * @return {@link Closeable} instance which represents channel or stream which uses channel
+     */
+    protected abstract Closeable getStream();
+
+    /**
+     * @return {@link Boolean} value which determines whether stream can (<code>true</code>) or cannot ((<code>false</code>)) be closed
+     */
+    protected abstract boolean canBeClosed();
+
+    /**
+     * Encapsulates the logic of releasing the captured stream or channel. It is expected that each resource object shares the same channel
+     */
+    public void release()
+    {
+        Closeable stream = getStream();
+
+        if ((null == stream) || !canBeClosed())
+        {
+            return;
+        }
+
+        try
+        {
+            stream.close();
+        }
+        catch (IOException e)
+        {
+            throw new RuntimeException("Failed to close stream!", e);
+        }
+    }
+}
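The `release()` contract above is a small template method: subclasses say which resource was captured and whether it may still be closed, and the base class does the closing. A minimal, self-contained sketch of that contract follows. `StreamAwareProxy` is a condensed copy of the class in this hunk, while `TrackedStream` and `SingleStreamProxy` are hypothetical names made up for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

class ReleaseDemo
{
    // Condensed copy of the release() contract from AbstractStreamAwareProxy.
    static abstract class StreamAwareProxy
    {
        protected abstract Closeable getStream();

        protected abstract boolean canBeClosed();

        public void release()
        {
            Closeable stream = getStream();
            if (stream == null || !canBeClosed())
            {
                return;
            }
            try
            {
                stream.close();
            }
            catch (IOException e)
            {
                throw new RuntimeException("Failed to close stream!", e);
            }
        }
    }

    // Hypothetical stream that records whether close() was called.
    static class TrackedStream extends ByteArrayInputStream
    {
        boolean closed;

        TrackedStream()
        {
            super(new byte[] { 1, 2, 3 });
        }

        @Override
        public void close()
        {
            closed = true;
        }
    }

    // Hypothetical subclass: hands out a single stream and remembers it.
    static class SingleStreamProxy extends StreamAwareProxy
    {
        TrackedStream captured;

        InputStream open()
        {
            if (captured == null)
            {
                captured = new TrackedStream();
            }
            return captured;
        }

        @Override
        protected Closeable getStream()
        {
            return captured;
        }

        @Override
        protected boolean canBeClosed()
        {
            return captured != null && !captured.closed;
        }
    }

    public static void main(String[] args)
    {
        SingleStreamProxy proxy = new SingleStreamProxy();
        proxy.release();    // nothing captured yet, so this is a no-op
        proxy.open();
        proxy.release();    // closes the captured stream
        System.out.println(proxy.captured.closed);
    }
}
```

Because `release()` returns early when nothing was captured, a caller can invoke it unconditionally in an error path.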
@@ -0,0 +1,216 @@
+/*
+ * Copyright (C) 2005-2014 Alfresco Software Limited.
+ *
+ * This file is part of Alfresco
+ *
+ * Alfresco is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Alfresco is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with Alfresco. If not, see <http://www.gnu.org/licenses/>.
+ */
+package org.alfresco.repo.content;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.FileChannel;
+import java.nio.channels.ReadableByteChannel;
+import java.util.Locale;
+
+import org.alfresco.service.cmr.repository.ContentData;
+import org.alfresco.service.cmr.repository.ContentIOException;
+import org.alfresco.service.cmr.repository.ContentReader;
+import org.alfresco.service.cmr.repository.ContentStreamListener;
+
+/**
+ * Proxy for {@link ContentReader} which captures {@link InputStream} or {@link ReadableByteChannel} to introduce a possibility releasing captured resource
+ *
+ * @author Dmitry Velichkevich
+ * @see ContentReader
+ * @see AbstractStreamAwareProxy
+ */
+public class StreamAwareContentReaderProxy extends AbstractStreamAwareProxy implements ContentReader
+{
+    private ContentReader delegatee;
+
+    private Closeable releaseableResource;
+
+    public StreamAwareContentReaderProxy(ContentReader delegator)
+    {
+        this.delegatee = delegator;
+    }
+
+    @Override
+    public boolean exists()
+    {
+        return delegatee.exists();
+    }
+
+    @Override
+    public void getContent(OutputStream os) throws ContentIOException
+    {
+        delegatee.getContent(os);
+    }
+
+    @Override
+    public void getContent(File file) throws ContentIOException
+    {
+        delegatee.getContent(file);
+    }
+
+    @Override
+    public InputStream getContentInputStream() throws ContentIOException
+    {
+        InputStream result = delegatee.getContentInputStream();
+
+        if (null == releaseableResource)
+        {
+            releaseableResource = result;
+        }
+
+        return result;
+    }
+
+    @Override
+    public String getContentString() throws ContentIOException
+    {
+        return delegatee.getContentString();
+    }
+
+    @Override
+    public String getContentString(int length) throws ContentIOException
+    {
+        return delegatee.getContentString(length);
+    }
+
+    @Override
+    public FileChannel getFileChannel() throws ContentIOException
+    {
+        FileChannel result = delegatee.getFileChannel();
+
+        if (null == releaseableResource)
+        {
+            releaseableResource = result;
+        }
+
+        return result;
+    }
+
+    @Override
+    public long getLastModified()
+    {
+        return delegatee.getLastModified();
+    }
+
+    @Override
+    public ReadableByteChannel getReadableChannel() throws ContentIOException
+    {
+        ReadableByteChannel result = delegatee.getReadableChannel();
+
+        if (null == releaseableResource)
+        {
+            releaseableResource = result;
+        }
+
+        return result;
+    }
+
+    @Override
+    public ContentReader getReader() throws ContentIOException
+    {
+        return delegatee.getReader();
+    }
+
+    @Override
+    public boolean isClosed()
+    {
+        return delegatee.isClosed();
+    }
+
+    @Override
+    public void addListener(ContentStreamListener listener)
+    {
+        delegatee.addListener(listener);
+    }
+
+    @Override
+    public ContentData getContentData()
+    {
+        return delegatee.getContentData();
+    }
+
+    @Override
+    public String getContentUrl()
+    {
+        return delegatee.getContentUrl();
+    }
+
+    @Override
+    public String getEncoding()
+    {
+        return delegatee.getEncoding();
+    }
+
+    @Override
+    public Locale getLocale()
+    {
+        return delegatee.getLocale();
+    }
+
+    @Override
+    public String getMimetype()
+    {
+        return delegatee.getMimetype();
+    }
+
+    @Override
+    public long getSize()
+    {
+        return delegatee.getSize();
+    }
+
+    @Override
+    public boolean isChannelOpen()
+    {
+        return delegatee.isChannelOpen();
+    }
+
+    @Override
+    public void setEncoding(String encoding)
+    {
+        delegatee.setEncoding(encoding);
+    }
+
+    @Override
+    public void setLocale(Locale locale)
+    {
+        delegatee.setLocale(locale);
+    }
+
+    @Override
+    public void setMimetype(String mimetype)
+    {
+        delegatee.setMimetype(mimetype);
+    }
+
+    @Override
+    public boolean canBeClosed()
+    {
+        return delegatee.isChannelOpen();
+    }
+
+    @Override
+    public Closeable getStream()
+    {
+        return releaseableResource;
+    }
+}
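One detail of the reader proxy worth spelling out is the `null == releaseableResource` guard: only the first stream or channel handed out is remembered, and later ones are passed through without being captured. The sketch below isolates that behavior. `CapturingSource` is a hypothetical stand-in built on a `Supplier`; whether the real delegate can ever hand out a second stream is not visible in this diff and is not assumed here.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.InputStream;
import java.util.function.Supplier;

class CaptureDemo
{
    // Hypothetical stand-in for the reader proxy: wraps a stream supplier and
    // remembers only the first stream it hands out, mirroring the null check above.
    static class CapturingSource
    {
        private final Supplier<InputStream> delegate;
        private Closeable captured;

        CapturingSource(Supplier<InputStream> delegate)
        {
            this.delegate = delegate;
        }

        InputStream open()
        {
            InputStream result = delegate.get();
            if (captured == null)
            {
                captured = result;
            }
            return result;
        }

        Closeable captured()
        {
            return captured;
        }
    }

    public static void main(String[] args)
    {
        CapturingSource source = new CapturingSource(() -> new ByteArrayInputStream(new byte[0]));
        InputStream first = source.open();
        InputStream second = source.open();
        // The first stream is the one a later release() would close.
        System.out.println(source.captured() == first);
        System.out.println(source.captured() == second);
    }
}
```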
@@ -0,0 +1,217 @@
+/*
+ * Copyright (C) 2005-2014 Alfresco Software Limited.
+ *
+ * This file is part of Alfresco
+ *
+ * Alfresco is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published by
+ * the Free Software Foundation, either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * Alfresco is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with Alfresco. If not, see <http://www.gnu.org/licenses/>.
+ */
+package org.alfresco.repo.content;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.nio.channels.FileChannel;
+import java.nio.channels.WritableByteChannel;
+import java.util.Locale;
+
+import org.alfresco.service.cmr.repository.ContentData;
+import org.alfresco.service.cmr.repository.ContentIOException;
+import org.alfresco.service.cmr.repository.ContentReader;
+import org.alfresco.service.cmr.repository.ContentStreamListener;
+import org.alfresco.service.cmr.repository.ContentWriter;
+
+/**
+ * Proxy for {@link ContentWriter} which captures {@link OutputStream} or {@link WritableByteChannel} to introduce a possibility of releasing captured resource
+ *
+ * @author Dmitry Velichkevich
+ * @see ContentWriter
+ * @see AbstractStreamAwareProxy
+ */
+public class StreamAwareContentWriterProxy extends AbstractStreamAwareProxy implements ContentWriter
+{
+    private ContentWriter delegatee;
+
+    private Closeable releaseableResource;
+
+    public StreamAwareContentWriterProxy(ContentWriter delegator)
+    {
+        this.delegatee = delegator;
+    }
+
+    @Override
+    public OutputStream getContentOutputStream() throws ContentIOException
+    {
+        OutputStream result = delegatee.getContentOutputStream();
+
+        if (null == releaseableResource)
+        {
+            releaseableResource = result;
+        }
+
+        return result;
+    }
+
+    @Override
+    public FileChannel getFileChannel(boolean truncate) throws ContentIOException
+    {
+        FileChannel result = delegatee.getFileChannel(truncate);
+
+        if (null == releaseableResource)
+        {
+            releaseableResource = result;
+        }
+
+        return result;
+    }
+
+    @Override
+    public ContentReader getReader() throws ContentIOException
+    {
+        return delegatee.getReader();
+    }
+
+    @Override
+    public WritableByteChannel getWritableChannel() throws ContentIOException
+    {
+        WritableByteChannel result = delegatee.getWritableChannel();
+
+        if (null == releaseableResource)
+        {
+            releaseableResource = result;
+        }
+
+        return result;
+    }
+
+    @Override
+    public void guessEncoding()
+    {
+        delegatee.guessEncoding();
+    }
+
+    @Override
+    public void guessMimetype(String filename)
+    {
+        delegatee.guessMimetype(filename);
+    }
+
+    @Override
+    public boolean isClosed()
+    {
+        return delegatee.isClosed();
+    }
+
+    @Override
+    public void putContent(ContentReader reader) throws ContentIOException
+    {
+        delegatee.putContent(reader);
+    }
+
+    @Override
+    public void putContent(InputStream is) throws ContentIOException
+    {
+        delegatee.putContent(is);
+    }
+
+    @Override
+    public void putContent(File file) throws ContentIOException
+    {
+        delegatee.putContent(file);
+    }
+
+    @Override
+    public void putContent(String content) throws ContentIOException
+    {
+        delegatee.putContent(content);
+    }
+
+    @Override
+    public void addListener(ContentStreamListener listener)
+    {
+        delegatee.addListener(listener);
+    }
+
+    @Override
+    public ContentData getContentData()
+    {
+        return delegatee.getContentData();
+    }
+
+    @Override
+    public String getContentUrl()
+    {
+        return delegatee.getContentUrl();
+    }
+
+    @Override
+    public String getEncoding()
+    {
+        return delegatee.getEncoding();
+    }
+
+    @Override
+    public Locale getLocale()
+    {
+        return delegatee.getLocale();
+    }
+
+    @Override
+    public String getMimetype()
+    {
+        return delegatee.getMimetype();
+    }
+
+    @Override
+    public long getSize()
+    {
+        return delegatee.getSize();
+    }
+
+    @Override
+    public boolean isChannelOpen()
+    {
+        return delegatee.isChannelOpen();
+    }
+
+    @Override
+    public void setEncoding(String encoding)
+    {
+        delegatee.setEncoding(encoding);
+    }
+
+    @Override
+    public void setLocale(Locale locale)
+    {
+        delegatee.setLocale(locale);
+    }
+
+    @Override
+    public void setMimetype(String mimetype)
+    {
+        delegatee.setMimetype(mimetype);
+    }
+
+    @Override
+    public boolean canBeClosed()
+    {
+        return delegatee.isChannelOpen();
+    }
+
+    @Override
+    public Closeable getStream()
+    {
+        return releaseableResource;
+    }
+}
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2005-2013 Alfresco Software Limited.
+ * Copyright (C) 2005-2014 Alfresco Software Limited.
  *
  * This file is part of Alfresco
  *
@@ -44,6 +44,7 @@ import java.util.concurrent.TimeoutException;
 
 import org.alfresco.error.AlfrescoRuntimeException;
 import org.alfresco.model.ContentModel;
+import org.alfresco.repo.content.StreamAwareContentReaderProxy;
 import org.alfresco.service.cmr.dictionary.DataTypeDefinition;
 import org.alfresco.service.cmr.dictionary.DictionaryService;
 import org.alfresco.service.cmr.dictionary.PropertyDefinition;
@@ -2051,18 +2052,20 @@ abstract public class AbstractMappingMetadataExtracter implements MetadataExtrac
             return extractRaw(reader);
         }
         FutureTask<Map<String, Serializable>> task = null;
+        StreamAwareContentReaderProxy proxiedReader = null;
         try
         {
-            task = new FutureTask<Map<String,Serializable>>(new ExtractRawCallable(reader));
+            proxiedReader = new StreamAwareContentReaderProxy(reader);
+            task = new FutureTask<Map<String,Serializable>>(new ExtractRawCallable(proxiedReader));
             getExecutorService().execute(task);
             return task.get(limits.getTimeoutMs(), TimeUnit.MILLISECONDS);
         }
         catch (TimeoutException e)
         {
             task.cancel(true);
-            if (reader.isChannelOpen())
+            if (null != proxiedReader)
             {
-                reader.getReadableChannel().close();
+                proxiedReader.release();
             }
             throw e;
         }
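The hunk above runs extraction in a `FutureTask`, waits with a time limit, and on `TimeoutException` cancels the task and releases the captured stream before rethrowing. A minimal, self-contained sketch of that pattern follows, under simplifying assumptions: `TimeoutReleaseDemo` and its methods are hypothetical names, the resource is a plain `Closeable` rather than a content reader proxy, and a fresh single-thread executor is created per call instead of using the extractor's shared executor service.

```java
import java.io.Closeable;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

class TimeoutReleaseDemo
{
    // Runs the work with a time limit; on timeout the task is cancelled and
    // the resource is closed before the TimeoutException is rethrown.
    static <T> T runWithTimeout(Callable<T> work, Closeable resource, long timeoutMs) throws Exception
    {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        FutureTask<T> task = new FutureTask<T>(work);
        try
        {
            executor.execute(task);
            return task.get(timeoutMs, TimeUnit.MILLISECONDS);
        }
        catch (TimeoutException e)
        {
            task.cancel(true);   // interrupts the worker thread
            resource.close();    // frees the stream the worker may still hold
            throw e;
        }
        finally
        {
            executor.shutdownNow();
        }
    }

    // True when a slow task times out and the resource ends up closed.
    static boolean slowTaskReleasesResource()
    {
        final AtomicBoolean closed = new AtomicBoolean(false);
        Closeable resource = () -> closed.set(true);
        try
        {
            runWithTimeout(() -> { Thread.sleep(5000); return "done"; }, resource, 100);
            return false;
        }
        catch (TimeoutException e)
        {
            return closed.get();
        }
        catch (Exception e)
        {
            return false;
        }
    }

    // True when a fast task returns its value and the resource is left open.
    static boolean fastTaskKeepsResource()
    {
        final AtomicBoolean closed = new AtomicBoolean(false);
        Closeable resource = () -> closed.set(true);
        try
        {
            String result = runWithTimeout(() -> "done", resource, 2000);
            return "done".equals(result) && !closed.get();
        }
        catch (Exception e)
        {
            return false;
        }
    }

    public static void main(String[] args)
    {
        System.out.println(slowTaskReleasesResource());
        System.out.println(fastTaskKeepsResource());
    }
}
```

Closing the resource matters because `cancel(true)` only interrupts the worker; a thread blocked in stream I/O may not respond to interruption, whereas closing the underlying stream gives it a chance to fail fast.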
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2005-2010 Alfresco Software Limited.
+ * Copyright (C) 2005-2014 Alfresco Software Limited.
  *
  * This file is part of Alfresco
  *
@@ -19,12 +19,15 @@
 package org.alfresco.repo.content.metadata;
 
 import java.util.ArrayList;
+import java.util.Set;
 
 import org.alfresco.repo.content.MimetypeMap;
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
+import org.apache.poi.patch.AlfrescoPoiPatchUtils;
 import org.apache.tika.parser.Parser;
 import org.apache.tika.parser.microsoft.ooxml.OOXMLParser;
+import org.springframework.beans.factory.InitializingBean;
 
 /**
  * POI-based metadata extractor for Office 07 documents.
@@ -37,12 +40,19 @@ import org.apache.tika.parser.microsoft.ooxml.OOXMLParser;
  * <b>Any custom property:</b> -- [not mapped]
  * </pre>
  *
- * Uses Apache Tika
+ * Uses Apache Tika<br />
+ * <br />
+ * Configures {@link AlfrescoPoiPatchUtils} to resolve the following issues:
+ * <ul>
+ * <li><a href="https://issues.alfresco.com/jira/browse/MNT-577">MNT-577</a></li>
+ * <li><a href="https://issues.alfresco.com/jira/browse/MNT-11823">MNT-11823</a></li>
+ * </ul>
  *
  * @author Nick Burch
  * @author Neil McErlean
+ * @author Dmitry Velichkevich
  */
-public class PoiMetadataExtracter extends TikaPoweredMetadataExtracter
+public class PoiMetadataExtracter extends TikaPoweredMetadataExtracter implements InitializingBean
 {
     protected static Log logger = LogFactory.getLog(PoiMetadataExtracter.class);
 
@@ -53,9 +63,15 @@ public class PoiMetadataExtracter extends TikaPoweredMetadataExtracter
             new OOXMLParser()
     );
 
+    private Integer poiFootnotesLimit;
+
+    private Boolean poiExtractPropertiesOnly = false;
+
+    private Set<String> poiAllowableXslfRelationshipTypes;
+
     public PoiMetadataExtracter()
     {
-        super(SUPPORTED_MIMETYPES);
+        super(PoiMetadataExtracter.class.getName(), SUPPORTED_MIMETYPES);
     }
 
     @Override
@@ -63,4 +79,73 @@ public class PoiMetadataExtracter extends TikaPoweredMetadataExtracter
     {
         return new OOXMLParser();
     }
+
+    /**
+     * MNT-577: Alfresco is running 100% CPU for over 10 minutes while extracting metadata for Word office document <br />
+     * <br />
+     *
+     * @param poiFootnotesLimit - {@link Integer} value which specifies limit of amount of footnotes of XWPF documents
+     */
+    public void setPoiFootnotesLimit(Integer poiFootnotesLimit)
+    {
+        this.poiFootnotesLimit = poiFootnotesLimit;
+    }
+
+    /**
+     * MNT-11823: Upload of PPTX causes very high memory usage leading to system instability<br />
+     * <br />
+     *
+     * @param poiExtractPropertiesOnly - {@link Boolean} value which indicates that POI extractor must avoid building of the full document parts hierarchy and reading content of
+     *        the parts
+     */
+    public void setPoiExtractPropertiesOnly(Boolean poiExtractPropertiesOnly)
+    {
+        this.poiExtractPropertiesOnly = poiExtractPropertiesOnly;
+    }
+
+    public Boolean isPoiExtractPropertiesOnly()
+    {
+        return (poiExtractPropertiesOnly == null) ? (false) : (poiExtractPropertiesOnly);
+    }
+
+    /**
+     * MNT-11823: Upload of PPTX causes very high memory usage leading to system instability<br />
+     * <br />
+     *
+     * @param poiAllowableXslfRelationshipTypes - {@link Set}<{@link String}> instance which determines the list of allowable relationship types for traversing during
+     *        analyzing of XSLF document
+     */
+    public void setPoiAllowableXslfRelationshipTypes(Set<String> poiAllowableXslfRelationshipTypes)
+    {
+        this.poiAllowableXslfRelationshipTypes = poiAllowableXslfRelationshipTypes;
+    }
+
+    public Set<String> getPoiAllowableXslfRelationshipTypes()
+    {
+        return poiAllowableXslfRelationshipTypes;
+    }
+
+    /**
+     * MNT-11823: Upload of PPTX causes very high memory usage leading to system instability<br />
+     * <br />
+     * Initialization of {@link AlfrescoPoiPatchUtils} properties for {@link PoiMetadataExtracter#getExtractorContext()} context
+     */
+    @Override
+    public void afterPropertiesSet() throws Exception
+    {
+        if (null == poiExtractPropertiesOnly)
+        {
+            poiExtractPropertiesOnly = false;
+        }
+
+        String context = getExtractorContext();
+
+        if (null != poiFootnotesLimit)
+        {
+            AlfrescoPoiPatchUtils.setPoiFootnotesLimit(context, poiFootnotesLimit);
+        }
+
+        AlfrescoPoiPatchUtils.setPoiExtractPropertiesOnly(context, poiExtractPropertiesOnly);
+        AlfrescoPoiPatchUtils.setPoiAllowableXslfRelationshipTypes(context, poiAllowableXslfRelationshipTypes);
+    }
 }
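`afterPropertiesSet()` pushes each extractor's settings into `AlfrescoPoiPatchUtils` under a context key, and the commit message states that an extractor or transformer without its own context falls back to default settings (full parsing, footnotes limit of 50). The internals of `AlfrescoPoiPatchUtils` are not part of this diff, so the following is only a hypothetical sketch of a context-keyed settings registry with that fallback. `ContextSettings` and all of its methods are invented for illustration and are not the patched POI API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry, NOT the real AlfrescoPoiPatchUtils: settings are stored
// per context name, and unknown or null contexts fall back to the defaults.
class ContextSettings
{
    // Default quoted in the commit message for the default context.
    static final int DEFAULT_FOOTNOTES_LIMIT = 50;

    private static final Map<String, Integer> FOOTNOTES_LIMITS = new ConcurrentHashMap<String, Integer>();
    private static final Map<String, Boolean> PROPERTIES_ONLY = new ConcurrentHashMap<String, Boolean>();

    static void setFootnotesLimit(String context, int limit)
    {
        FOOTNOTES_LIMITS.put(context, limit);
    }

    static void setExtractPropertiesOnly(String context, boolean value)
    {
        PROPERTIES_ONLY.put(context, value);
    }

    static int footnotesLimit(String context)
    {
        // ConcurrentHashMap rejects null keys, so a null context skips the lookup.
        Integer value = (context == null) ? null : FOOTNOTES_LIMITS.get(context);
        return (value == null) ? DEFAULT_FOOTNOTES_LIMIT : value;
    }

    static boolean extractPropertiesOnly(String context)
    {
        Boolean value = (context == null) ? null : PROPERTIES_ONLY.get(context);
        // The default context parses the whole document.
        return value != null && value;
    }

    public static void main(String[] args)
    {
        setFootnotesLimit("extractor", 10);
        setExtractPropertiesOnly("extractor", true);
        System.out.println(footnotesLimit("extractor") + " " + extractPropertiesOnly("extractor"));
        System.out.println(footnotesLimit(null) + " " + extractPropertiesOnly(null));
    }
}
```

Keying the settings by context is what lets the metadata extractor run in properties-only mode while a transformer sharing the same patched library still reads the full document.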
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2005-2010 Alfresco Software Limited.
+ * Copyright (C) 2005-2014 Alfresco Software Limited.
  *
  * This file is part of Alfresco
  *
@@ -38,6 +38,7 @@ import org.alfresco.service.cmr.repository.datatype.DefaultTypeConverter;
 import org.alfresco.service.cmr.repository.datatype.TypeConversionException;
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
+import org.apache.poi.patch.AlfrescoPoiPatchUtils;
 import org.apache.tika.embedder.Embedder;
 import org.apache.tika.extractor.DocumentSelector;
 import org.apache.tika.io.TemporaryResources;
@@ -97,6 +98,8 @@ public abstract class TikaPoweredMetadataExtracter
     private DateTimeFormatter tikaDateFormater;
     protected DocumentSelector documentSelector;
 
+    private String extractorContext = null;
+
     /**
      * Builds up a list of supported mime types by merging
      * an explicit list with any that Tika also claims to support
@@ -128,22 +131,37 @@ public abstract class TikaPoweredMetadataExtracter
         return types;
     }
 
+    public TikaPoweredMetadataExtracter(String extractorContext, ArrayList<String> supportedMimeTypes)
+    {
+        this(extractorContext, new HashSet<String>(supportedMimeTypes), null);
+    }
+
     public TikaPoweredMetadataExtracter(ArrayList<String> supportedMimeTypes)
     {
-        this(new HashSet<String>(supportedMimeTypes), null);
+        this(null, new HashSet<String>(supportedMimeTypes), null);
|
||||||
}
|
}
|
||||||
|
|
||||||
public TikaPoweredMetadataExtracter(ArrayList<String> supportedMimeTypes, ArrayList<String> supportedEmbedMimeTypes)
|
public TikaPoweredMetadataExtracter(ArrayList<String> supportedMimeTypes, ArrayList<String> supportedEmbedMimeTypes)
|
||||||
{
|
{
|
||||||
this(new HashSet<String>(supportedMimeTypes), new HashSet<String>(supportedEmbedMimeTypes));
|
this(null, new HashSet<String>(supportedMimeTypes), new HashSet<String>(supportedEmbedMimeTypes));
|
||||||
}
|
}
|
||||||
|
|
||||||
public TikaPoweredMetadataExtracter(HashSet<String> supportedMimeTypes)
|
public TikaPoweredMetadataExtracter(HashSet<String> supportedMimeTypes)
|
||||||
{
|
{
|
||||||
this(supportedMimeTypes, null);
|
this(null, supportedMimeTypes, null);
|
||||||
}
|
}
|
||||||
|
|
||||||
public TikaPoweredMetadataExtracter(HashSet<String> supportedMimeTypes, HashSet<String> supportedEmbedMimeTypes)
|
public TikaPoweredMetadataExtracter(HashSet<String> supportedMimeTypes, HashSet<String> supportedEmbedMimeTypes)
|
||||||
|
{
|
||||||
|
this(null, supportedMimeTypes, supportedEmbedMimeTypes);
|
||||||
|
}
|
||||||
|
|
||||||
|
public TikaPoweredMetadataExtracter(String extractorContext, HashSet<String> supportedMimeTypes, HashSet<String> supportedEmbedMimeTypes)
|
||||||
{
|
{
|
||||||
super(supportedMimeTypes, supportedEmbedMimeTypes);
|
super(supportedMimeTypes, supportedEmbedMimeTypes);
|
||||||
|
|
||||||
|
this.extractorContext = extractorContext;
|
||||||
|
|
||||||
// TODO Once TIKA-451 is fixed this list will get nicer
|
// TODO Once TIKA-451 is fixed this list will get nicer
|
||||||
DateTimeParser[] parsersUTC = {
|
DateTimeParser[] parsersUTC = {
|
||||||
DateTimeFormat.forPattern("yyyy-MM-dd'T'HH:mm:ss'Z'").getParser(),
|
DateTimeFormat.forPattern("yyyy-MM-dd'T'HH:mm:ss'Z'").getParser(),
|
||||||
@@ -161,6 +179,16 @@ public abstract class TikaPoweredMetadataExtracter
|
|||||||
this.tikaDateFormater = new DateTimeFormatterBuilder().append(null, parsers).toFormatter();
|
this.tikaDateFormater = new DateTimeFormatterBuilder().append(null, parsers).toFormatter();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Gets context for the current implementation
|
||||||
|
*
|
||||||
|
* @return {@link String} value which determines current context
|
||||||
|
*/
|
||||||
|
protected String getExtractorContext()
|
||||||
|
{
|
||||||
|
return extractorContext;
|
||||||
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Version which also tries the ISO-8601 formats (in order..),
|
* Version which also tries the ISO-8601 formats (in order..),
|
||||||
* and similar formats, which Tika makes use of
|
* and similar formats, which Tika makes use of
|
||||||
@@ -316,6 +344,9 @@ public abstract class TikaPoweredMetadataExtracter
|
|||||||
Map<String, Serializable> rawProperties = newRawMap();
|
Map<String, Serializable> rawProperties = newRawMap();
|
||||||
|
|
||||||
InputStream is = null;
|
InputStream is = null;
|
||||||
|
|
||||||
|
// Parse using properties of the context of current implementation
|
||||||
|
boolean contextPresented = null != extractorContext;
|
||||||
try
|
try
|
||||||
{
|
{
|
||||||
is = getInputStream(reader);
|
is = getInputStream(reader);
|
||||||
@@ -340,6 +371,12 @@ public abstract class TikaPoweredMetadataExtracter
|
|||||||
handler = new NullContentHandler();
|
handler = new NullContentHandler();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Set POI properties context if available...
|
||||||
|
if (contextPresented)
|
||||||
|
{
|
||||||
|
AlfrescoPoiPatchUtils.setContext(extractorContext);
|
||||||
|
}
|
||||||
|
|
||||||
parser.parse(is, handler, metadata, context);
|
parser.parse(is, handler, metadata, context);
|
||||||
|
|
||||||
// First up, copy all the Tika metadata over
|
// First up, copy all the Tika metadata over
|
||||||
@@ -399,6 +436,12 @@ public abstract class TikaPoweredMetadataExtracter
|
|||||||
}
|
}
|
||||||
finally
|
finally
|
||||||
{
|
{
|
||||||
|
// Reset POI properties context
|
||||||
|
if (contextPresented)
|
||||||
|
{
|
||||||
|
AlfrescoPoiPatchUtils.setContext(null);
|
||||||
|
}
|
||||||
|
|
||||||
if (is != null)
|
if (is != null)
|
||||||
{
|
{
|
||||||
try { is.close(); } catch (IOException e) {}
|
try { is.close(); } catch (IOException e) {}
|
||||||
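Review note: the hunks above set the POI context just before `parser.parse(...)` and clear it again in `finally`. The sketch below shows that set/reset shape in isolation. `ContextScopeSketch` and its `ThreadLocal` holder are hypothetical stand-ins; the diff does not show how `AlfrescoPoiPatchUtils` stores its context, so per-thread storage is an assumption here.

```java
// Hypothetical stand-in for a per-thread context holder.
// AlfrescoPoiPatchUtils itself is not shown in this diff; this is NOT its implementation.
class ContextScopeSketch
{
    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<String>();

    static void setContext(String context)
    {
        if (context == null)
        {
            CONTEXT.remove();
        }
        else
        {
            CONTEXT.set(context);
        }
    }

    static String getContext()
    {
        return CONTEXT.get();
    }

    /**
     * Mirrors the shape of the extraction code: set the context only when one
     * is configured, do the work, and always reset the context afterwards.
     */
    static String parseWithContext(String extractorContext)
    {
        boolean contextPresented = null != extractorContext;
        try
        {
            if (contextPresented)
            {
                setContext(extractorContext);
            }

            // Stand-in for parser.parse(...): report which context is active
            String active = getContext();
            return (active == null) ? "default" : active;
        }
        finally
        {
            // Reset even if the work above throws, so the context cannot leak
            if (contextPresented)
            {
                setContext(null);
            }
        }
    }
}
```

Resetting in `finally` matters because a pooled thread that kept a stale context would apply one extractor's limits to the next document it handles.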
|
@@ -1,5 +1,5 @@
|
|||||||
/*
|
/*
|
||||||
* Copyright (C) 2005-2013 Alfresco Software Limited.
|
* Copyright (C) 2005-2014 Alfresco Software Limited.
|
||||||
*
|
*
|
||||||
* This file is part of Alfresco
|
* This file is part of Alfresco
|
||||||
*
|
*
|
||||||
@@ -19,12 +19,24 @@
|
|||||||
package org.alfresco.repo.content.transform;
|
package org.alfresco.repo.content.transform;
|
||||||
|
|
||||||
import java.util.Map;
|
import java.util.Map;
|
||||||
|
import java.util.concurrent.Callable;
|
||||||
|
import java.util.concurrent.ExecutionException;
|
||||||
|
import java.util.concurrent.ExecutorService;
|
||||||
|
import java.util.concurrent.Executors;
|
||||||
|
import java.util.concurrent.Future;
|
||||||
|
import java.util.concurrent.TimeUnit;
|
||||||
|
import java.util.concurrent.TimeoutException;
|
||||||
|
|
||||||
import org.alfresco.error.AlfrescoRuntimeException;
|
import org.alfresco.error.AlfrescoRuntimeException;
|
||||||
|
import org.alfresco.repo.content.AbstractStreamAwareProxy;
|
||||||
|
import org.alfresco.repo.content.StreamAwareContentReaderProxy;
|
||||||
|
import org.alfresco.repo.content.StreamAwareContentWriterProxy;
|
||||||
|
import org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter;
|
||||||
import org.alfresco.service.cmr.repository.ContentIOException;
|
import org.alfresco.service.cmr.repository.ContentIOException;
|
||||||
import org.alfresco.service.cmr.repository.ContentReader;
|
import org.alfresco.service.cmr.repository.ContentReader;
|
||||||
import org.alfresco.service.cmr.repository.ContentServiceTransientException;
|
import org.alfresco.service.cmr.repository.ContentServiceTransientException;
|
||||||
import org.alfresco.service.cmr.repository.ContentWriter;
|
import org.alfresco.service.cmr.repository.ContentWriter;
|
||||||
|
import org.alfresco.service.cmr.repository.TransformationOptionLimits;
|
||||||
import org.alfresco.service.cmr.repository.TransformationOptions;
|
import org.alfresco.service.cmr.repository.TransformationOptions;
|
||||||
import org.apache.commons.logging.Log;
|
import org.apache.commons.logging.Log;
|
||||||
import org.apache.commons.logging.LogFactory;
|
import org.apache.commons.logging.LogFactory;
|
||||||
@@ -43,10 +55,26 @@ public abstract class AbstractContentTransformer2 extends AbstractContentTransfo
|
|||||||
{
|
{
|
||||||
private static final Log logger = LogFactory.getLog(AbstractContentTransformer2.class);
|
private static final Log logger = LogFactory.getLog(AbstractContentTransformer2.class);
|
||||||
|
|
||||||
|
private ExecutorService executorService;
|
||||||
|
|
||||||
private ContentTransformerRegistry registry;
|
private ContentTransformerRegistry registry;
|
||||||
private boolean registerTransformer;
|
private boolean registerTransformer;
|
||||||
private boolean retryTransformOnDifferentMimeType;
|
private boolean retryTransformOnDifferentMimeType;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* A flag that indicates that the transformer should be started in its own Thread so
|
||||||
|
* that it may be interrupted rather than using the timeout in the Reader.
|
||||||
|
* Need only be set for transformers that read their source data quickly but then
|
||||||
|
* take a long time to process the data (such as {@link PoiOOXMLContentTransformer}).
|
||||||
|
*/
|
||||||
|
private Boolean useTimeoutThread = false;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Extra time added to the timeout when using a Thread for the transformation so that
|
||||||
|
* a timeout from the Reader has a chance to happen first.
|
||||||
|
*/
|
||||||
|
private long additionalThreadTimout = 2000;
|
||||||
|
|
||||||
private static ThreadLocal<Integer> depth = new ThreadLocal<Integer>()
|
private static ThreadLocal<Integer> depth = new ThreadLocal<Integer>()
|
||||||
{
|
{
|
||||||
@Override
|
@Override
|
||||||
@@ -209,7 +237,48 @@ public abstract class AbstractContentTransformer2 extends AbstractContentTransfo
|
|||||||
setReaderLimits(reader, writer, options);
|
setReaderLimits(reader, writer, options);
|
||||||
|
|
||||||
// Transform
|
// Transform
|
||||||
|
// MNT-12238: CLONE - CLONE - Upload of PPTX causes very high memory usage leading to system instability
|
||||||
|
// Limiting transformation up to configured amount of milliseconds to avoid very high RAM consumption
|
||||||
|
// and OOM during transforming problematic documents
|
||||||
|
TransformationOptionLimits limits = getLimits(reader.getMimetype(), writer.getMimetype(), options);
|
||||||
|
|
||||||
|
long timeoutMs = (null == limits) ? -1 : limits.getTimeoutMs(); // guard: limits is null-checked below
|
||||||
|
if (!useTimeoutThread || (null == limits) || (-1 == timeoutMs))
|
||||||
|
{
|
||||||
transformInternal(reader, writer, options);
|
transformInternal(reader, writer, options);
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
Future<?> submittedTask = null;
|
||||||
|
StreamAwareContentReaderProxy proxiedReader = new StreamAwareContentReaderProxy(reader);
|
||||||
|
StreamAwareContentWriterProxy proxiedWriter = new StreamAwareContentWriterProxy(writer);
|
||||||
|
|
||||||
|
try
|
||||||
|
{
|
||||||
|
submittedTask = getExecutorService().submit(new TransformInternalCallable(proxiedReader, proxiedWriter, options));
|
||||||
|
submittedTask.get(timeoutMs + additionalThreadTimout, TimeUnit.MILLISECONDS);
|
||||||
|
}
|
||||||
|
catch (TimeoutException e)
|
||||||
|
{
|
||||||
|
releaseResources(submittedTask, proxiedReader, proxiedWriter);
|
||||||
|
throw new TimeoutException("Transformation failed due to timeout limit");
|
||||||
|
}
|
||||||
|
catch (InterruptedException e)
|
||||||
|
{
|
||||||
|
releaseResources(submittedTask, proxiedReader, proxiedWriter);
|
||||||
|
throw new InterruptedException("Transformation failed, because the thread of the transformation was interrupted");
|
||||||
|
}
|
||||||
|
catch (ExecutionException e)
|
||||||
|
{
|
||||||
|
Throwable cause = e.getCause();
|
||||||
|
if (cause instanceof TransformInternalCallableException)
|
||||||
|
{
|
||||||
|
cause = ((TransformInternalCallableException) cause).getCause();
|
||||||
|
}
|
||||||
|
|
||||||
|
throw cause;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
// record time
|
// record time
|
||||||
long after = System.currentTimeMillis();
|
long after = System.currentTimeMillis();
|
||||||
@@ -345,6 +414,31 @@ public abstract class AbstractContentTransformer2 extends AbstractContentTransfo
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Cancels <code>task</code> and closes content accessors
|
||||||
|
*
|
||||||
|
* @param task - {@link Future} task instance which specifies a transformation action
|
||||||
|
* @param proxiedReader - {@link AbstractStreamAwareProxy} instance which represents channel closing mechanism for content reader
|
||||||
|
* @param proxiedWriter - {@link AbstractStreamAwareProxy} instance which represents channel closing mechanism for content writer
|
||||||
|
*/
|
||||||
|
private void releaseResources(Future<?> task, AbstractStreamAwareProxy proxiedReader, AbstractStreamAwareProxy proxiedWriter)
|
||||||
|
{
|
||||||
|
if (null != task)
|
||||||
|
{
|
||||||
|
task.cancel(true);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (null != proxiedReader)
|
||||||
|
{
|
||||||
|
proxiedReader.release();
|
||||||
|
}
|
||||||
|
|
||||||
|
if (null != proxiedWriter)
|
||||||
|
{
|
||||||
|
proxiedWriter.release();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
public final void transform(
|
public final void transform(
|
||||||
ContentReader reader,
|
ContentReader reader,
|
||||||
ContentWriter writer,
|
ContentWriter writer,
|
||||||
@@ -400,6 +494,103 @@ public abstract class AbstractContentTransformer2 extends AbstractContentTransfo
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Gets the <code>ExecutorService</code> to be used for timeout-aware transformation.
|
||||||
|
* <p>
|
||||||
|
* If no <code>ExecutorService</code> has been defined, a default of <code>Executors.newCachedThreadPool()</code> is created lazily on the first call.
|
||||||
|
*
|
||||||
|
* @return the defined or default <code>ExecutorService</code>
|
||||||
|
*/
|
||||||
|
protected ExecutorService getExecutorService()
|
||||||
|
{
|
||||||
|
if (null == executorService)
|
||||||
|
{
|
||||||
|
executorService = Executors.newCachedThreadPool();
|
||||||
|
}
|
||||||
|
|
||||||
|
return executorService;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Sets the <code>ExecutorService</code> to be used for timeout-aware transformation.
|
||||||
|
*
|
||||||
|
* @param executorService - {@link ExecutorService} instance for timeouts
|
||||||
|
*/
|
||||||
|
public void setExecutorService(ExecutorService executorService)
|
||||||
|
{
|
||||||
|
this.executorService = executorService;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* {@link Callable} wrapper for the {@link AbstractContentTransformer2#transformInternal(ContentReader, ContentWriter, TransformationOptions)} method to handle timeouts.
|
||||||
|
*/
|
||||||
|
private class TransformInternalCallable implements Callable<Void>
|
||||||
|
{
|
||||||
|
private ContentReader reader;
|
||||||
|
|
||||||
|
private ContentWriter writer;
|
||||||
|
|
||||||
|
private TransformationOptions options;
|
||||||
|
|
||||||
|
public TransformInternalCallable(ContentReader reader, ContentWriter writer, TransformationOptions options)
|
||||||
|
{
|
||||||
|
this.reader = reader;
|
||||||
|
this.writer = writer;
|
||||||
|
this.options = options;
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public Void call() throws Exception
|
||||||
|
{
|
||||||
|
try
|
||||||
|
{
|
||||||
|
transformInternal(reader, writer, options);
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
catch (Throwable e)
|
||||||
|
{
|
||||||
|
throw new TransformInternalCallableException(e);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Exception wrapper to handle any {@link Throwable} from {@link AbstractContentTransformer2#transformInternal(ContentReader, ContentWriter, TransformationOptions)}
|
||||||
|
*/
|
||||||
|
private class TransformInternalCallableException extends Exception
|
||||||
|
{
|
||||||
|
private static final long serialVersionUID = 7740560508772740658L;
|
||||||
|
|
||||||
|
public TransformInternalCallableException(Throwable cause)
|
||||||
|
{
|
||||||
|
super(cause);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* @param useTimeoutThread - {@link Boolean} value which specifies timeout limiting mechanism for the current transformer
|
||||||
|
* @see AbstractContentTransformer2#useTimeoutThread
|
||||||
|
*/
|
||||||
|
public void setUseTimeoutThread(Boolean useTimeoutThread)
|
||||||
|
{
|
||||||
|
if (null == useTimeoutThread)
|
||||||
|
{
|
||||||
|
useTimeoutThread = true;
|
||||||
|
}
|
||||||
|
|
||||||
|
this.useTimeoutThread = useTimeoutThread;
|
||||||
|
}
|
||||||
|
|
||||||
|
public void setAdditionalThreadTimout(long additionalThreadTimout)
|
||||||
|
{
|
||||||
|
this.additionalThreadTimout = additionalThreadTimout;
|
||||||
|
}
|
||||||
|
|
||||||
|
public Boolean isTransformationLimitedInternally()
|
||||||
|
{
|
||||||
|
return useTimeoutThread;
|
||||||
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Records an error and updates the average time as if the transformation took a
|
* Records an error and updates the average time as if the transformation took a
|
||||||
* long time, so that it is less likely to be called again.
|
* long time, so that it is less likely to be called again.
|
||||||
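Review note: the `AbstractContentTransformer2` changes above run the transformation on an executor thread and bound the wait with `Future.get(timeout, unit)`, cancelling the task on timeout. A minimal stand-alone sketch of that pattern follows. `TimeoutTransformSketch` and `runWithTimeout` are hypothetical names, and the sketch returns a status string instead of rethrowing and releasing reader/writer proxies as the real code does.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-alone sketch; not part of the Alfresco code base.
class TimeoutTransformSketch
{
    /**
     * Runs the task on the executor and waits at most timeoutMs + graceMs.
     * Returns "done" on success and "timeout" if the task had to be cancelled.
     */
    static String runWithTimeout(ExecutorService executor, Callable<Void> task, long timeoutMs, long graceMs)
            throws Exception
    {
        Future<Void> submitted = executor.submit(task);
        try
        {
            submitted.get(timeoutMs + graceMs, TimeUnit.MILLISECONDS);
            return "done";
        }
        catch (TimeoutException e)
        {
            // Interrupt the worker so it stops consuming memory and CPU
            submitted.cancel(true);
            return "timeout";
        }
        catch (ExecutionException e)
        {
            // Unwrap so the caller sees the original failure
            Throwable cause = e.getCause();
            if (cause instanceof Exception)
            {
                throw (Exception) cause;
            }
            throw e;
        }
    }

    public static void main(String[] args) throws Exception
    {
        ExecutorService executor = Executors.newCachedThreadPool();
        try
        {
            Callable<Void> quick = () -> null;
            Callable<Void> slow = () -> { Thread.sleep(5000); return null; };
            System.out.println(runWithTimeout(executor, quick, 200, 50));
            System.out.println(runWithTimeout(executor, slow, 100, 50));
        }
        finally
        {
            executor.shutdownNow();
        }
    }
}
```

Note that `cancel(true)` only delivers an interrupt. A task that never checks its interrupt status or blocks on interruptible I/O keeps running, which is presumably why the change above also closes the reader and writer channels through the stream-aware proxies.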
|
@@ -1,5 +1,5 @@
|
|||||||
/*
|
/*
|
||||||
* Copyright (C) 2005-2010 Alfresco Software Limited.
|
* Copyright (C) 2005-2014 Alfresco Software Limited.
|
||||||
*
|
*
|
||||||
* This file is part of Alfresco
|
* This file is part of Alfresco
|
||||||
*
|
*
|
||||||
@@ -48,6 +48,7 @@ public class PoiOOXMLContentTransformer extends TikaPoweredContentTransformer
|
|||||||
|
|
||||||
public PoiOOXMLContentTransformer() {
|
public PoiOOXMLContentTransformer() {
|
||||||
super(SUPPORTED_MIMETYPES);
|
super(SUPPORTED_MIMETYPES);
|
||||||
|
setUseTimeoutThread(true);
|
||||||
}
|
}
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
|
@@ -1,5 +1,5 @@
|
|||||||
/*
|
/*
|
||||||
* Copyright (C) 2005-2010 Alfresco Software Limited.
|
* Copyright (C) 2005-2014 Alfresco Software Limited.
|
||||||
*
|
*
|
||||||
* This file is part of Alfresco
|
* This file is part of Alfresco
|
||||||
*
|
*
|
||||||
@@ -108,6 +108,7 @@ public class TikaAutoContentTransformer extends TikaPoweredContentTransformer
|
|||||||
public TikaAutoContentTransformer(TikaConfig tikaConfig)
|
public TikaAutoContentTransformer(TikaConfig tikaConfig)
|
||||||
{
|
{
|
||||||
super( buildMimeTypes(tikaConfig) );
|
super( buildMimeTypes(tikaConfig) );
|
||||||
|
setUseTimeoutThread(true);
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
|
@@ -1,5 +1,5 @@
|
|||||||
/*
|
/*
|
||||||
* Copyright (C) 2005-2012 Alfresco Software Limited.
|
* Copyright (C) 2005-2014 Alfresco Software Limited.
|
||||||
*
|
*
|
||||||
* This file is part of Alfresco
|
* This file is part of Alfresco
|
||||||
*
|
*
|
||||||
@@ -32,14 +32,12 @@ import javax.xml.transform.sax.TransformerHandler;
|
|||||||
import javax.xml.transform.stream.StreamResult;
|
import javax.xml.transform.stream.StreamResult;
|
||||||
|
|
||||||
import org.alfresco.repo.content.MimetypeMap;
|
import org.alfresco.repo.content.MimetypeMap;
|
||||||
import org.alfresco.repo.content.filestore.FileContentReader;
|
|
||||||
import org.alfresco.service.cmr.repository.ContentReader;
|
import org.alfresco.service.cmr.repository.ContentReader;
|
||||||
import org.alfresco.service.cmr.repository.ContentWriter;
|
import org.alfresco.service.cmr.repository.ContentWriter;
|
||||||
import org.alfresco.service.cmr.repository.TransformationOptions;
|
import org.alfresco.service.cmr.repository.TransformationOptions;
|
||||||
import org.apache.commons.logging.Log;
|
import org.apache.commons.logging.Log;
|
||||||
import org.apache.commons.logging.LogFactory;
|
import org.apache.commons.logging.LogFactory;
|
||||||
import org.apache.tika.extractor.DocumentSelector;
|
import org.apache.tika.extractor.DocumentSelector;
|
||||||
import org.apache.tika.io.TikaInputStream;
|
|
||||||
import org.apache.tika.metadata.Metadata;
|
import org.apache.tika.metadata.Metadata;
|
||||||
import org.apache.tika.parser.ParseContext;
|
import org.apache.tika.parser.ParseContext;
|
||||||
import org.apache.tika.parser.Parser;
|
import org.apache.tika.parser.Parser;
|
||||||
@@ -69,6 +67,14 @@ public abstract class TikaPoweredContentTransformer extends AbstractContentTrans
|
|||||||
MimetypeMap.MIMETYPE_XHTML,
|
MimetypeMap.MIMETYPE_XHTML,
|
||||||
MimetypeMap.MIMETYPE_XML});
|
MimetypeMap.MIMETYPE_XML});
|
||||||
|
|
||||||
|
private static final double MEGABYTES = 1024.0 * 1024.0;
|
||||||
|
|
||||||
|
private static final String USAGE_PATTERN = "Content transformation has completed:\n" +
|
||||||
|
" Transformer: %s\n" +
|
||||||
|
" Content Reader: %s\n" +
|
||||||
|
" Memory (MB): Used/Total/Maximum - %f/%f/%f\n" +
|
||||||
|
" Time Spent: %d ms";
|
||||||
|
|
||||||
protected List<String> sourceMimeTypes;
|
protected List<String> sourceMimeTypes;
|
||||||
protected DocumentSelector documentSelector;
|
protected DocumentSelector documentSelector;
|
||||||
|
|
||||||
@@ -225,22 +231,24 @@ public abstract class TikaPoweredContentTransformer extends AbstractContentTrans
|
|||||||
);
|
);
|
||||||
}
|
}
|
||||||
|
|
||||||
// Prefer the File if available - it takes less memory to process
|
InputStream is = reader.getContentInputStream();
|
||||||
InputStream is;
|
|
||||||
if(reader instanceof FileContentReader)
|
long startTime = 0;
|
||||||
|
try {
|
||||||
|
if (logger.isDebugEnabled())
|
||||||
{
|
{
|
||||||
is = TikaInputStream.get( ((FileContentReader)reader).getFile(), metadata );
|
startTime = System.currentTimeMillis();
|
||||||
}
|
|
||||||
else
|
|
||||||
{
|
|
||||||
is = reader.getContentInputStream();
|
|
||||||
}
|
}
|
||||||
|
|
||||||
try {
|
|
||||||
parser.parse(is, handler, metadata, context);
|
parser.parse(is, handler, metadata, context);
|
||||||
}
|
}
|
||||||
finally
|
finally
|
||||||
{
|
{
|
||||||
|
if(logger.isDebugEnabled())
|
||||||
|
{
|
||||||
|
logger.debug(calculateMemoryAndTimeUsage(reader, startTime));
|
||||||
|
}
|
||||||
|
|
||||||
if (is != null)
|
if (is != null)
|
||||||
{
|
{
|
||||||
try { is.close(); } catch (Throwable e) {}
|
try { is.close(); } catch (Throwable e) {}
|
||||||
@@ -255,4 +263,13 @@ public abstract class TikaPoweredContentTransformer extends AbstractContentTrans
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
private String calculateMemoryAndTimeUsage(ContentReader reader, long startTime)
|
||||||
|
{
|
||||||
|
long endTime = System.currentTimeMillis();
|
||||||
|
Runtime runtime = Runtime.getRuntime();
|
||||||
|
long totalMemory = runtime.totalMemory();
|
||||||
|
return String.format(USAGE_PATTERN, this.getClass().getName(), reader, (totalMemory - runtime.freeMemory()) / MEGABYTES, totalMemory / MEGABYTES, runtime.maxMemory()
|
||||||
|
/ MEGABYTES, (endTime - startTime));
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
@@ -1,5 +1,5 @@
|
|||||||
/*
|
/*
|
||||||
* Copyright (C) 2005-2010 Alfresco Software Limited.
|
* Copyright (C) 2005-2014 Alfresco Software Limited.
|
||||||
*
|
*
|
||||||
* This file is part of Alfresco
|
* This file is part of Alfresco
|
||||||
*
|
*
|
||||||
@@ -35,9 +35,30 @@ import org.alfresco.service.namespace.QName;
|
|||||||
* @see org.alfresco.repo.content.metadata.PoiMetadataExtracter
|
* @see org.alfresco.repo.content.metadata.PoiMetadataExtracter
|
||||||
*
|
*
|
||||||
* @author Neil McErlean
|
* @author Neil McErlean
|
||||||
|
* @author Dmitry Velichkevich
|
||||||
*/
|
*/
|
||||||
public class PoiMetadataExtracterTest extends AbstractMetadataExtracterTest
|
public class PoiMetadataExtracterTest extends AbstractMetadataExtracterTest
|
||||||
{
|
{
|
||||||
|
private static final int MINIMAL_EXPECTED_PROPERTIES_AMOUNT = 3;
|
||||||
|
|
||||||
|
private static final int IGNORABLE_TIMEOUT = -1;
|
||||||
|
|
||||||
|
// private static final int TIMEOUT_FOR_QUICK_EXTRACTION = 2000;
|
||||||
|
|
||||||
|
private static final int DEFAULT_FOOTNOTES_LIMIT = 50;
|
||||||
|
|
||||||
|
private static final int LARGE_FOOTNOTES_LIMIT = 25000;
|
||||||
|
|
||||||
|
|
||||||
|
private static final String ALL_MIMETYPES_FILTER = "*";
|
||||||
|
|
||||||
|
private static final String PROBLEM_FOOTNOTES_DOCUMENT_NAME = "problemFootnotes2.docx";
|
||||||
|
|
||||||
|
// private static final String PROBLEM_SLIDE_SHOW_DOCUMENT_NAME = "problemSlideShow.pptx";
|
||||||
|
|
||||||
|
private static final String EXTRACTOR_POI_BEAN_NAME = "extracter.Poi";
|
||||||
|
|
||||||
|
|
||||||
private PoiMetadataExtracter extracter;
|
private PoiMetadataExtracter extracter;
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
@@ -46,9 +67,31 @@ public class PoiMetadataExtracterTest extends AbstractMetadataExtracterTest
|
|||||||
super.setUp();
|
super.setUp();
|
||||||
extracter = new PoiMetadataExtracter();
|
extracter = new PoiMetadataExtracter();
|
||||||
extracter.setDictionaryService(dictionaryService);
|
extracter.setDictionaryService(dictionaryService);
|
||||||
|
resetPoiConfigurationToDefault();
|
||||||
extracter.register();
|
extracter.register();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
protected void tearDown() throws Exception
|
||||||
|
{
|
||||||
|
resetPoiConfigurationToDefault();
|
||||||
|
super.tearDown();
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Resets POI library configuration to default. Sets allowable XSLF relationship types and footnotes limit as per 'extracter.Poi' bean configuration
|
||||||
|
*
|
||||||
|
* @throws Exception
|
||||||
|
*/
|
||||||
|
private void resetPoiConfigurationToDefault() throws Exception
|
||||||
|
{
|
||||||
|
PoiMetadataExtracter configuredExtractor = (PoiMetadataExtracter) ctx.getBean(EXTRACTOR_POI_BEAN_NAME);
|
||||||
|
extracter.setPoiExtractPropertiesOnly(true);
|
||||||
|
extracter.setPoiFootnotesLimit(DEFAULT_FOOTNOTES_LIMIT);
|
||||||
|
extracter.setPoiAllowableXslfRelationshipTypes(configuredExtractor.getPoiAllowableXslfRelationshipTypes());
|
||||||
|
extracter.afterPropertiesSet();
|
||||||
|
}
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
protected MetadataExtracter getExtracter()
|
protected MetadataExtracter getExtracter()
|
||||||
{
|
{
|
||||||
@@ -123,7 +166,7 @@ public class PoiMetadataExtracterTest extends AbstractMetadataExtracterTest
|
|||||||
limits.setTimeoutMs(timeoutMs);
|
limits.setTimeoutMs(timeoutMs);
|
||||||
HashMap<String, MetadataExtracterLimits> mimetypeLimits =
|
HashMap<String, MetadataExtracterLimits> mimetypeLimits =
|
||||||
new HashMap<String, MetadataExtracterLimits>(1);
|
new HashMap<String, MetadataExtracterLimits>(1);
|
||||||
mimetypeLimits.put("*", limits);
|
mimetypeLimits.put(ALL_MIMETYPES_FILTER, limits);
|
||||||
((PoiMetadataExtracter) getExtracter()).setMimetypeLimits(mimetypeLimits);
|
((PoiMetadataExtracter) getExtracter()).setMimetypeLimits(mimetypeLimits);
|
||||||
|
|
||||||
File sourceFile = AbstractContentTransformerTest.loadNamedQuickTestFile("problemFootnotes.docx");
|
File sourceFile = AbstractContentTransformerTest.loadNamedQuickTestFile("problemFootnotes.docx");
|
||||||
@@ -144,4 +187,100 @@ public class PoiMetadataExtracterTest extends AbstractMetadataExtracterTest
|
|||||||
extractionTime < (timeoutMs + 100)); // bit of wiggle room for logging, cleanup, etc.
|
extractionTime < (timeoutMs + 100)); // bit of wiggle room for logging, cleanup, etc.
|
||||||
assertFalse("Reader was not closed", sourceReader.isChannelOpen());
|
assertFalse("Reader was not closed", sourceReader.isChannelOpen());
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// /**
|
||||||
|
// * Test for MNT-11823: Upload of PPTX causes very high memory usage leading to system instability
|
||||||
|
// *
|
||||||
|
// * @throws Exception
|
||||||
|
// */
|
||||||
|
// public void testProblemSlideShow() throws Exception
|
||||||
|
// {
|
||||||
|
// PoiMetadataExtracter extractor = (PoiMetadataExtracter) getExtracter();
|
||||||
|
// configureExtractorLimits(extractor, ALL_MIMETYPES_FILTER, TIMEOUT_FOR_QUICK_EXTRACTION);
|
||||||
|
//
|
||||||
|
// File problemSlideShowFile = AbstractContentTransformerTest.loadNamedQuickTestFile(PROBLEM_SLIDE_SHOW_DOCUMENT_NAME);
|
||||||
|
// ContentReader sourceReader = new FileContentReader(problemSlideShowFile);
|
||||||
|
// sourceReader.setMimetype(MimetypeMap.MIMETYPE_OPENXML_PRESENTATION);
|
||||||
|
//
|
||||||
|
// Map<QName, Serializable> properties = new HashMap<QName, Serializable>();
|
||||||
|
// extractor.extract(sourceReader, properties);
|
||||||
|
//
|
||||||
|
// assertExtractedProperties(properties);
|
||||||
|
// assertFalse("Reader was not closed", sourceReader.isChannelOpen());
|
||||||
|
//
|
||||||
|
// extractor.setPoiExtractPropertiesOnly(false);
|
||||||
|
// extractor.afterPropertiesSet();
|
||||||
|
// properties = new HashMap<QName, Serializable>();
|
||||||
|
// extractor.extract(sourceReader, properties);
|
||||||
|
//
|
||||||
|
// assertFalse("Reader was not closed", sourceReader.isChannelOpen());
|
||||||
|
// assertTrue(("Extraction completed successfully but failure is expected! Invalid properties are: " + properties), (null == properties) || properties.isEmpty());
|
||||||
|
// }
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Configures timeout for given <code>extractor</code> and <code>mimetypeFilter</code>
|
||||||
|
*
|
||||||
|
* @param extractor - {@link PoiMetadataExtracter} instance
|
||||||
|
* @param mimetypeFilter - {@link String} value which specifies mimetype filter for which timeout should be applied
|
||||||
|
* @param timeout - {@link Long} value which specifies timeout for <code>mimetypeFilter</code>
|
||||||
|
*/
|
||||||
|
private void configureExtractorLimits(PoiMetadataExtracter extractor, String mimetypeFilter, long timeout)
|
||||||
|
{
|
||||||
|
MetadataExtracterLimits limits = new MetadataExtracterLimits();
|
||||||
|
limits.setTimeoutMs(timeout);
|
||||||
|
HashMap<String, MetadataExtracterLimits> mimetypeLimits = new HashMap<String, MetadataExtracterLimits>(1);
|
||||||
|
mimetypeLimits.put(mimetypeFilter, limits);
|
||||||
|
extractor.setMimetypeLimits(mimetypeLimits);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Test for MNT-577: Alfresco is running 100% CPU for over 10 minutes while extracting metadata for Word office document
|
||||||
|
*
|
||||||
|
* @throws Exception
|
||||||
|
*/
|
||||||
|
public void testFootnotesLimitParameterUsing() throws Exception
|
||||||
|
{
|
||||||
|
PoiMetadataExtracter extractor = (PoiMetadataExtracter) getExtracter();
|
||||||
|
|
||||||
|
File sourceFile = AbstractContentTransformerTest.loadNamedQuickTestFile(PROBLEM_FOOTNOTES_DOCUMENT_NAME);
|
||||||
|
ContentReader sourceReader = new FileContentReader(sourceFile);
|
||||||
|
sourceReader.setMimetype(MimetypeMap.MIMETYPE_OPENXML_WORDPROCESSING);
|
||||||
|
|
||||||
|
Map<QName, Serializable> properties = new HashMap<QName, Serializable>();
|
||||||
|
long startTime = System.currentTimeMillis();
|
||||||
|
extractor.extract(sourceReader, properties);
|
||||||
|
long extractionTimeWithDefaultFootnotesLimit = System.currentTimeMillis() - startTime;
|
||||||
|
|
||||||
|
assertExtractedProperties(properties);
|
||||||
|
assertFalse("Reader was not closed", sourceReader.isChannelOpen());
|
||||||
|
|
||||||
|
// Just let the extractor do the job...
|
||||||
|
configureExtractorLimits(extractor, ALL_MIMETYPES_FILTER, IGNORABLE_TIMEOUT);
|
||||||
|
extractor.setPoiFootnotesLimit(LARGE_FOOTNOTES_LIMIT);
|
||||||
|
extractor.afterPropertiesSet();
|
||||||
|
properties = new HashMap<QName, Serializable>();
|
||||||
|
startTime = System.currentTimeMillis();
|
||||||
|
extractor.extract(sourceReader, properties);
|
||||||
|
long extractionTimeWithLargeFootnotesLimit = System.currentTimeMillis() - startTime;
|
||||||
|
|
||||||
|
assertExtractedProperties(properties);
|
||||||
|
assertTrue("The second metadata extraction operation must be longer!", extractionTimeWithLargeFootnotesLimit > extractionTimeWithDefaultFootnotesLimit);
|
||||||
|
assertFalse("Reader was not closed", sourceReader.isChannelOpen());
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Asserts extracted <code>properties</code>. At least {@link PoiMetadataExtracterTest#MINIMAL_EXPECTED_PROPERTIES_AMOUNT} properties are expected:
|
||||||
|
* {@link ContentModel#PROP_TITLE}, {@link ContentModel#PROP_AUTHOR} and {@link ContentModel#PROP_CREATED}
|
||||||
|
*
|
||||||
|
* @param properties - {@link Map}<{@link QName}, {@link Serializable}> instance which contains all extracted properties
|
||||||
|
*/
|
||||||
|
private void assertExtractedProperties(Map<QName, Serializable> properties)
|
||||||
|
{
|
||||||
|
assertNotNull("Properties were not extracted at all!", properties);
|
||||||
|
assertFalse("Extracted properties are empty!", properties.isEmpty());
|
||||||
|
assertTrue(("Expected at least " + MINIMAL_EXPECTED_PROPERTIES_AMOUNT + " extracted properties but only " + properties.size() + " have been extracted!"), properties.size() >= MINIMAL_EXPECTED_PROPERTIES_AMOUNT);
|
||||||
|
assertTrue(("'" + ContentModel.PROP_TITLE + "' property is missing!"), properties.containsKey(ContentModel.PROP_TITLE));
|
||||||
|
assertTrue(("'" + ContentModel.PROP_AUTHOR + "' property is missing!"), properties.containsKey(ContentModel.PROP_AUTHOR));
|
||||||
|
assertTrue(("'" + ContentModel.PROP_CREATED + "' property is missing!"), properties.containsKey(ContentModel.PROP_CREATED));
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
@@ -1,5 +1,5 @@
|
|||||||
/*
|
/*
|
||||||
* Copyright (C) 2005-2011 Alfresco Software Limited.
|
* Copyright (C) 2005-2014 Alfresco Software Limited.
|
||||||
*
|
*
|
||||||
* This file is part of Alfresco
|
* This file is part of Alfresco
|
||||||
*
|
*
|
||||||
@@ -18,16 +18,40 @@
|
|||||||
*/
|
*/
|
||||||
package org.alfresco.repo.content.transform;
|
package org.alfresco.repo.content.transform;
|
||||||
|
|
||||||
|
import java.io.File;
|
||||||
|
import java.util.concurrent.TimeoutException;
|
||||||
|
|
||||||
import org.alfresco.repo.content.MimetypeMap;
|
import org.alfresco.repo.content.MimetypeMap;
|
||||||
|
import org.alfresco.repo.content.filestore.FileContentReader;
|
||||||
|
import org.alfresco.repo.security.authentication.AuthenticationUtil;
|
||||||
|
import org.alfresco.repo.security.authentication.AuthenticationUtil.RunAsWork;
|
||||||
|
import org.alfresco.service.cmr.repository.ContentIOException;
|
||||||
|
import org.alfresco.service.cmr.repository.ContentReader;
|
||||||
|
import org.alfresco.service.cmr.repository.ContentService;
|
||||||
|
import org.alfresco.service.cmr.repository.ContentWriter;
|
||||||
|
import org.alfresco.service.cmr.repository.TransformationOptionLimits;
|
||||||
import org.alfresco.service.cmr.repository.TransformationOptions;
|
import org.alfresco.service.cmr.repository.TransformationOptions;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* @see org.alfresco.repo.content.transform.PoiOOXMLContentTransformer
|
* @see org.alfresco.repo.content.transform.PoiOOXMLContentTransformer
|
||||||
*
|
*
|
||||||
* @author Nick Burch
|
* @author Nick Burch
|
||||||
|
* @author Dmitry Velichkevich
|
||||||
*/
|
*/
|
||||||
public class PoiOOXMLContentTransformerTest extends AbstractContentTransformerTest
|
public class PoiOOXMLContentTransformerTest extends AbstractContentTransformerTest
|
||||||
{
|
{
|
||||||
|
private static final int SMALL_TIMEOUT = 50;
|
||||||
|
|
||||||
|
private static final int ADDITIONAL_PROCESSING_TIME = 1500;
|
||||||
|
|
||||||
|
|
||||||
|
private static final String ENCODING_UTF_8 = "UTF-8";
|
||||||
|
|
||||||
|
private static final String TEST_PPTX_FILE_NAME = "quickImg2.pptx";
|
||||||
|
|
||||||
|
|
||||||
|
private ContentService contentService;
|
||||||
|
|
||||||
private PoiOOXMLContentTransformer transformer;
|
private PoiOOXMLContentTransformer transformer;
|
||||||
|
|
||||||
@Override
|
@Override
|
||||||
@@ -39,6 +63,8 @@ public class PoiOOXMLContentTransformerTest extends AbstractContentTransformerTe
|
|||||||
transformer.setMimetypeService(mimetypeService);
|
transformer.setMimetypeService(mimetypeService);
|
||||||
transformer.setTransformerDebug(transformerDebug);
|
transformer.setTransformerDebug(transformerDebug);
|
||||||
transformer.setTransformerConfig(transformerConfig);
|
transformer.setTransformerConfig(transformerConfig);
|
||||||
|
|
||||||
|
contentService = serviceRegistry.getContentService();
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
@@ -66,4 +92,92 @@ public class PoiOOXMLContentTransformerTest extends AbstractContentTransformerTe
|
|||||||
assertTrue(transformer.isTransformable(MimetypeMap.MIMETYPE_OPENXML_SPREADSHEET, -1, MimetypeMap.MIMETYPE_HTML, new TransformationOptions()));
|
assertTrue(transformer.isTransformable(MimetypeMap.MIMETYPE_OPENXML_SPREADSHEET, -1, MimetypeMap.MIMETYPE_HTML, new TransformationOptions()));
|
||||||
assertTrue(transformer.isTransformable(MimetypeMap.MIMETYPE_OPENXML_SPREADSHEET, -1, MimetypeMap.MIMETYPE_XML, new TransformationOptions()));
|
assertTrue(transformer.isTransformable(MimetypeMap.MIMETYPE_OPENXML_SPREADSHEET, -1, MimetypeMap.MIMETYPE_XML, new TransformationOptions()));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* MNT-12043: CLONE - Upload of PPTX causes very high memory usage leading to system instability
|
||||||
|
*
|
||||||
|
* @throws Exception
|
||||||
|
*/
|
||||||
|
public void testMnt12043() throws Exception
|
||||||
|
{
|
||||||
|
transformer.setMimetypeService(mimetypeService);
|
||||||
|
transformer.setAdditionalThreadTimout(0);
|
||||||
|
configureExtractorLimits(transformer, SMALL_TIMEOUT);
|
||||||
|
|
||||||
|
File sourceFile = AbstractContentTransformerTest.loadNamedQuickTestFile(TEST_PPTX_FILE_NAME);
|
||||||
|
ContentReader sourceReader = new FileContentReader(sourceFile)
|
||||||
|
{
|
||||||
|
@Override
|
||||||
|
public void setLimits(TransformationOptionLimits limits)
|
||||||
|
{
|
||||||
|
// Test without content reader input stream timeout limits
|
||||||
|
}
|
||||||
|
};
|
||||||
|
sourceReader.setMimetype(MimetypeMap.MIMETYPE_OPENXML_PRESENTATION);
|
||||||
|
|
||||||
|
ContentWriter tempWriter = AuthenticationUtil.runAs(new RunAsWork<ContentWriter>()
|
||||||
|
{
|
||||||
|
@Override
|
||||||
|
public ContentWriter doWork() throws Exception
|
||||||
|
{
|
||||||
|
ContentWriter result = contentService.getTempWriter();
|
||||||
|
result.setEncoding(ENCODING_UTF_8);
|
||||||
|
result.setMimetype(MimetypeMap.MIMETYPE_TEXT_PLAIN);
|
||||||
|
|
||||||
|
return result;
|
||||||
|
}
|
||||||
|
}, AuthenticationUtil.getAdminUserName());
|
||||||
|
|
||||||
|
long startTime = System.currentTimeMillis();
|
||||||
|
|
||||||
|
try
|
||||||
|
{
|
||||||
|
transformer.transform(sourceReader, tempWriter);
|
||||||
|
long transformationTime = System.currentTimeMillis() - startTime;
|
||||||
|
fail("Content transformation took " + transformationTime + " ms, but should have failed with a timeout at " + SMALL_TIMEOUT + " ms");
|
||||||
|
}
|
||||||
|
catch (ContentIOException e)
|
||||||
|
{
|
||||||
|
long transformationTime = System.currentTimeMillis() - startTime;
|
||||||
|
assertTrue((TimeoutException.class.getName() + " exception is expected as the cause of transformation failure"), e.getCause() instanceof TimeoutException);
|
||||||
|
// The upper-bound assert below may fail intermittently: a transformation time of 1009 ms has already been observed
|
||||||
|
assertTrue(("Failed content transformation took " + transformationTime + " ms, but should have failed with a timeout at " + SMALL_TIMEOUT + " ms"),
|
||||||
|
transformationTime <= (SMALL_TIMEOUT + ADDITIONAL_PROCESSING_TIME));
|
||||||
|
}
|
||||||
|
|
||||||
|
assertFalse("Readable channel was not closed after transformation attempt!", sourceReader.isChannelOpen());
|
||||||
|
assertFalse("Writable channel was not closed after transformation attempt!", tempWriter.isChannelOpen());
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Configures timeout for given <code>transformer</code>
|
||||||
|
*
|
||||||
|
* @param transformer - {@link PoiOOXMLContentTransformer} instance
|
||||||
|
* @param timeout - {@link Long} value which specifies timeout for <code>transformer</code>
|
||||||
|
*/
|
||||||
|
private void configureExtractorLimits(PoiOOXMLContentTransformer transformer, final long timeout)
|
||||||
|
{
|
||||||
|
transformer.setTransformerConfig(new TransformerConfigImpl()
|
||||||
|
{
|
||||||
|
@Override
|
||||||
|
public TransformationOptionLimits getLimits(ContentTransformer transformer, String sourceMimetype, String targetMimetype, String use)
|
||||||
|
{
|
||||||
|
TransformationOptionLimits result = new TransformationOptionLimits();
|
||||||
|
result.setTimeoutMs(timeout);
|
||||||
|
return result;
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public TransformerStatistics getStatistics(ContentTransformer transformer, String sourceMimetype, String targetMimetype, boolean createNew)
|
||||||
|
{
|
||||||
|
return transformerConfig.getStatistics(transformer, sourceMimetype, targetMimetype, createNew);
|
||||||
|
}
|
||||||
|
|
||||||
|
@Override
|
||||||
|
public boolean isSupportedTransformation(ContentTransformer transformer, String sourceMimetype, String targetMimetype, TransformationOptions options)
|
||||||
|
{
|
||||||
|
return transformerConfig.isSupportedTransformation(transformer, sourceMimetype, targetMimetype, options);
|
||||||
|
}
|
||||||
|
});
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
BIN
source/test-resources/quick/problemFootnotes2.docx
Normal file
Binary file not shown.