Dave Ward 2e62d4fb29 Merged DEV/ALAN/SITE_PERF to HEAD
30342: Dev branch for Site performance issues (including rework of AuthorityService.getAuthorities() to use a 'lazy' set and DM indexing rework)
   ALF-9899 Huge share site migration, add group to site and user access site related performance issue.
   ALF-9208 Performance issue, during load tests /share/page/user/user-sites is showing to be the most expensive.
   ALF-9692 Performance: General performance of Alfresco degrades when there are 1000s of sites present
   - ancestor-preloading
   - hasAuthority
   - huge site test
   30370: - Save changes to do with adding childAuthorityCache to AuthorityDAOImpl
   - Increase aspectsTransactionalCache size as it blows up
   30387: Experimental solution to 'cascading reindex' performance problem
   - Now only Lucene container documents for a single subtree are reprocessed on addition / removal of a secondary child association
   - No need to delete and re-evaluate ALL the paths to all the nodes in the subtree - just the paths within the subtree
   - Lucene deltas now store the IDs of ANCESTORs to mask out as well as documents to reindex
   - Merge handles deletion of these efficiently
   - Node service cycle checks changed from getPaths to recursive cycleCheck method
   - Adding a group to 60,000 sites might not require all paths to all sites to be re-evaluated on every change!
   30389: Missed files from last checkin
   30390: Optimizations / fixes to Alan's test!
   30393: Bug fix - wasn't adding new documents into the index!
   30397: Fixed a problem with bulk loading trying to bulk load zero parent associations
   Also tweaked reindex calls
   30399: Correction - don't cascade below containers during path cascading
   30400: Another optimization - no need to trigger node bulk loading during path cascading - pass false for the preload flag
   30404: Further optimizations
   - On creation of a secondary child association, make a decision on whether it is cheaper to cascade reindex the parent or the child, based on the number of parent associations to the child
     - Assumes that if there are more than 5 parent associations, it's cheaper to cascade reindex the parent
     - Add a new authority to a zone (containing 60,000 authorities) - cascade reindex the authority, not the zone
     - Add a group (in 60,000 sites) to a site - cascade reindex the site, not the group
   - Caching of child associations already traversed during cascade reindexing
   - Site creation time much reduced!
   30407: Logic fix: Use 'delete only nodes' behaviour on DM index filtering and merging, now we are managing container deletions separately
   30408: Small correction related to last change.
   30409: Correction to deletion reindex behaviour (no need to regenerate masked out containers)
   - Site CRUD operations now all sub-second with 60,000 sites!
   30410: Stop the heartbeat from trying to load and count all site groups
   - Too expensive, as we might have 60,000 sites, each with 4 groups
   - Now just counts the groups in the default zone (the UI visible ones)
   30411: Increased lucene parameters to allow for 'path explosion'
   - 9 million lucene documents in my index after creating 60,000 Share sites (most of them probably paths) resulting in sluggish index write performance
   - Set lucene.indexer.mergerTargetIndexCount=8 (142 documents in smallest index)
   - Increased lucene.indexer.maxDocsForInMemoryMerge, lucene.indexer.maxDocsForInMemoryIndex
   30412: Test fixes
   30413: Revert 'parent association batch loading' changes (as it was a bad idea and is no longer necessary!)
   - Retain a few caching bug fixes however
   30416: Moved UserAuthoritySet (lazy load authority set) from PermissionServiceImpl to AuthorityServiceImpl
   30418: - Remove 'new' hasAuthority from authorityService so it is back to where we started.
   - SiteServiceHugeTest minor changes
   30421: Prevent creation of a duplicate root node on updating the root
   - Use the ANCESTOR field rather than ISCONTAINER to detect a node document, as the root node is both a container and a node!
   30447: Pulled new indexing behaviour into ADMLuceneIndexerImpl and restored old behaviour to AVMLuceneIndexerImpl to restore normal AVM behaviour
   30448: - Cache in PermissionServiceImpl cleared if an authority container has an association added or removed
     Supports the generateKey method which includes the username
     Supports changes in group structures
   - Moved logic to do with ROLE_GUEST from PermissionServiceImpl to AuthorityServiceImpl 
   30465: - Tidy up tests in SiteServiceTestHuge 
   30532: - Added getContainingAuthoritiesInZone to AuthorityService
     - Dave Changed PeopleService.getContainerGroups to only return groups in the DEFAULT zone
   - Fixed RM code to use getAuthoritiesForUser method with just the username again.
   30558: Build fixes
   - Fixed cycleCheck to throw a CyclicChildRelationshipException
   - More tidy up of AVM / ADM indexer split
   - Properly control when path generation is cascaded (not required on a full reindex or a tracker transaction)
   - Support indexing of a 'fake root' parent. Ouch my head hurts!
   30588: Build fixes
   - StringIndexOutOfBoundsException in NodeMonitor
   - Corrections to 'node only' delete behaviour
   - Use the PATH field to detect non-leaf nodes (it's the only stored field with which we can recognize the root)
   - Moved DOD5015Test.testVitalRecords() to the end - the only way I could work out how to get the full TestCase to run
   30600: More build fixes
   - Broadcast ALL node deletions to indexer (even those from cascade deletion of primary associations)
     - Allows indexer to wipe out all affected documents from the delta even if some have already been flushed under different parents by an intricate DOD unit test!
   - Pause FTS in DOD5015Test to prevent intermittent test failures (FTS can temporarily leave deleted documents in the index until it catches up)
   - More tidy up of ADMLuceneIndexerImpl
     - flushPending optimized and some unnecessary member variables removed
     - correction to cascade deletion behaviour (leave behind containers of unaffected secondary references)
     - unused MOVE action removed
     - further legacy logic moved into AVMLuceneIndexerImpl
   30620: More build fixes
   - Cope with a node morphing from a 'leaf' to a container during its lifetime
   - Container documents now created lazily in index as and when necessary
   - Blank out 'nth sibling' field of synthesized paths
   - ADMLuceneTest now passes!
   - TaggingServiceImplTest also passes - more special treatment for categories
   30627: Multi tenancy fixes
   30629: Possible build fix - retrying transaction in ReplicationServiceIntegrationTest.tearDown()
   30632: Build fix - lazy container generation after a move
   30636: Build fix: authority comparisons are case sensitive, even when that authority corresponds to a user (PermissionServiceTest.testPermissionCase())
   30638: Run SiteServiceTestHuge from a command line
      set SITE_CPATH=%TOMCAT_HOME%/lib/*;%TOMCAT_HOME%/endorsed/*;%TOMCAT_HOME%/webapps/alfresco/WEB-INF/lib/*;\
                     %TOMCAT_HOME%/webapps/alfresco/WEB-INF/classes;%TOMCAT_HOME%/shared/classes;
      java -Xmx2048m -XX:MaxPermSize=512M -classpath %SITE_CPATH% org.alfresco.repo.site.SiteServiceTestHuge ...
   
      Usage: -Daction=usersOnly
             -Dfrom=<fromSiteId> -Dto=<toSiteId>
             -Dfrom=<fromSiteId> -Dto=<toSiteId> -Daction=sites  -Drestart=<restartAtSiteId>
             -Dfrom=<fromSiteId> -Dto=<toSiteId> -Daction=groups -Drestart=<restartAtSiteId>
   30639: Minor changes to commented out command line code for SiteServiceTestHuge
   30643: Round of improvements to MySites dashlet relating to huge DB testing:
    - 10,000 site database, user is a member of ~2000 sites
    - Improvements to site.lib.ftl and related SiteService methods
    - To return MySites dashlet for the user, order of magnitude improvement from 7562ms to 618ms in the profiler (now ~350ms in the browser)
   30644: Fixed performance regression - too much opening and closing of the delta reader and writer
   30661: More reader opening / closing
   30668: Performance improvements to Site Finder and My Sites in user profile page.
    - faster to bring back lists and site memberships (used by the Site Finder)
    - related further improvements to APIs used by this and My Sites on dashboard
   30713: Configuration for MySites dashlet maximum list size
   30725: Merged V3.4-BUG-FIX to DEV/ALAN/SITE_PERF
      30708: ALF-10040: Added missing ReferenceCountingReadOnlyIndexReaderFactory wrapper to IndexInfo.getMainIndexReferenceCountingReadOnlyIndexReader() to make it consistent with IndexInfo.getMainIndexReferenceCountingReadOnlyIndexReader(String, Set<String>, boolean) and allow SingleFieldSelectors to make it through from LeafScorer to the path caches! Affects ALL Lucene queries that run OUTSIDE of a transaction.
   30729: Use getAuthoritiesForUser rather than getContainingAuthorities if possible.
   SiteServiceTestHuge: command line version
   30733: Performance improvements to user dashboard relating to User Calendar
    - converted web-tier calendar dashlet to Ajax client-side rendering - faster user experience and also less load on the web-tier
    - improvements to query from Andy
    - maximum sites/list size to query now configurable (default 100 instead of previously 1000)
   30743: Restore site CRUD performance from cold caches
   - Introduced NodeService.getAllRootNodes(), returning all nodes in a store with the root aspect, backed by a transactional cache and invalidated at key points
   - Means indexing doesn't have to load all parent nodes just to check for 'fake roots'
   - Site CRUD performance now back to sub-second with 60,000 nodes
   30747: Improvement to previous checkin - prevent cross cluster invalidation of every store root when a single store drops out of the cache
   30748: User dashboard finally loading within seconds with 60,000 sites, 60 groups, 100 users (thanks mostly to Kev's UI changes)
   - post-process IBatis mapped statements with MySQL dialect to apply fetchSize=Integer.MIN_VALUE to all _Limited statements
      - Means we can stream first 10,000 site groups without the MySQL JDBC driver reading all 240,000 into memory
   - New NodeService getChildAssocs method with a maxResults argument (makes use of the above)
   - Perfected getContainingAuthoritiesInZone implementation, adding a cutoff parameter, allowing only the first 1000 site memberships to be returned quickly and caches to be warmed for ACL evaluations
   - New cache of first 10,000 groups in APP.SHARE zone
   - Cache sizes tuned for 60,000 site scenario
   - Site service warms caches on bootstrap
   - PreferencesService applies ASPECT_IGNORE_INHERITED_RULES to person node to prevent the rule service trying to crawl the group hierarchy on a preference save
   - WorkflowServiceImpl.getPooledTasks only looks in APP.DEFAULT zone (thus avoiding site group noise)
   30749: Fix compilation errors
   30761: Minor change to SiteServiceTestHuge
   30762: Derek code review: Reworked fetchSize specification for select_ChildAssocsOfParent_Limited statement for MySQL
   - Now fetchSize stated explicitly in a MySQL specific config file resolved by the HierarchicalResourceLoader
   - No need for any Java-based post processing
   30763: Build fix: don't add a user into its own authorities (until specifically asked to)
   30767: Build fix
   - IBatis / MySQL needs a streaming result statement to be run in an isolation transaction (because it doesn't release PreparedStatements until the end)
   30771: Backed out previous change which was fundamentally flawed
   - Resolved underlying problem which was that the select_ChildAssocsOfParent_Limited SQL string needs to be unique in order to not cause confusion in the prepared statement cache
   30772: Backed out previous change which was fundamentally flawed
   - Resolved underlying problem which was that the select_ChildAssocsOfParent_Limited SQL string needs to be unique in order to not cause confusion in the prepared statement cache


git-svn-id: https://svn.alfresco.com/repos/alfresco-enterprise/alfresco/HEAD/root@30797 c4b6b30b-aa2e-2d43-bbcb-ca4b014f7261
2011-09-27 12:24:57 +00:00
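
Revision 30404 in the log above describes choosing which side of a new secondary child association to cascade reindex, based on how many parent associations the child already has. The following is a minimal sketch of that decision, assuming only what the log states (a threshold of 5 parent associations). The class, enum, and method names are hypothetical and are not taken from ADMLuceneIndexerImpl.

```java
public class CascadeChoiceSketch {
    // Threshold quoted in the log: "more than 5 parent associations".
    static final int PARENT_ASSOC_THRESHOLD = 5;

    // Hypothetical names for the two sides of the new association.
    enum Target { PARENT, CHILD }

    /**
     * Picks the side to cascade reindex when a secondary child association is created.
     * A child with many parents (a group that is in 60,000 sites) is expensive to
     * reindex, so the parent is chosen; otherwise the child is chosen.
     */
    static Target chooseCascadeTarget(int parentAssocCountOfChild) {
        return parentAssocCountOfChild > PARENT_ASSOC_THRESHOLD ? Target.PARENT : Target.CHILD;
    }

    public static void main(String[] args) {
        // Group already in 60,000 sites is added to one more site: reindex the site.
        System.out.println(chooseCascadeTarget(60000)); // prints PARENT
        // New authority with a single parent is added to a large zone: reindex the authority.
        System.out.println(chooseCascadeTarget(1));     // prints CHILD
    }
}
```

The sketch only captures the comparison; the real indexer also has to obtain the parent association count and perform the reindex itself.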


/*
* Copyright (C) 2005-2010 Alfresco Software Limited.
*
* This file is part of Alfresco
*
* Alfresco is free software: you can redistribute it and/or modify
* it under the terms of the GNU Lesser General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Alfresco is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public License
* along with Alfresco. If not, see <http://www.gnu.org/licenses/>.
*/
package org.alfresco.email.server;
import java.util.Map;
import org.alfresco.email.server.handler.EmailMessageHandler;
import org.alfresco.error.AlfrescoRuntimeException;
import org.alfresco.model.ContentModel;
import org.alfresco.repo.node.integrity.IntegrityException;
import org.alfresco.repo.security.authentication.AuthenticationUtil;
import org.alfresco.repo.security.authentication.AuthenticationUtil.RunAsWork;
import org.alfresco.repo.security.permissions.AccessDeniedException;
import org.alfresco.repo.transaction.RetryingTransactionHelper;
import org.alfresco.repo.transaction.RetryingTransactionHelper.RetryingTransactionCallback;
import org.alfresco.service.cmr.email.EmailMessage;
import org.alfresco.service.cmr.email.EmailMessageException;
import org.alfresco.service.cmr.email.EmailService;
import org.alfresco.service.cmr.repository.NodeRef;
import org.alfresco.service.cmr.repository.NodeService;
import org.alfresco.service.cmr.repository.StoreRef;
import org.alfresco.service.cmr.repository.datatype.DefaultTypeConverter;
import org.alfresco.service.cmr.search.ResultSet;
import org.alfresco.service.cmr.search.SearchService;
import org.alfresco.service.cmr.security.AuthorityService;
import org.alfresco.service.cmr.security.AuthorityType;
import org.alfresco.service.namespace.NamespaceService;
import org.alfresco.service.namespace.QName;
import org.springframework.extensions.surf.util.ParameterCheck;
/**
* Concrete email service implementation. This is responsible for routing the
* emails into the server.
*
* @since 2.2
*/
public class EmailServiceImpl implements EmailService
{
private static final String ERR_INBOUND_EMAIL_DISABLED = "email.server.err.inbound_mail_disabled";
private static final String ERR_INVALID_SUBJECT = "email.server.err.invalid_subject";
private static final String ERR_ACCESS_DENIED = "email.server.err.access_denied";
private static final String ERR_UNKNOWN_SOURCE_ADDRESS = "email.server.err.unknown_source_address";
private static final String ERR_USER_NOT_EMAIL_CONTRIBUTOR = "email.server.err.user_not_email_contributor";
private static final String ERR_INVALID_NODE_ADDRESS = "email.server.err.invalid_node_address";
private static final String ERR_HANDLER_NOT_FOUND = "email.server.err.handler_not_found";
private NamespaceService namespaceService;
private NodeService nodeService;
private SearchService searchService;
private RetryingTransactionHelper retryingTransactionHelper;
private AuthorityService authorityService;
private boolean emailInboundEnabled;
/** Login of the user account to use when the sender's address matches no known person. */
private String unknownUser;
/** List of message handlers */
private Map<String, EmailMessageHandler> emailMessageHandlerMap;
/**
*
* @param namespaceService the service to resolve namespace prefixes
*/
public void setNamespaceService(NamespaceService namespaceService)
{
this.namespaceService = namespaceService;
}
/**
* @param nodeService Alfresco Node Service
*/
public void setNodeService(NodeService nodeService)
{
this.nodeService = nodeService;
}
/**
* @param searchService Alfresco Search Service
*/
public void setSearchService(SearchService searchService)
{
this.searchService = searchService;
}
/**
* @param retryingTransactionHelper Alfresco RetryingTransactionHelper
*/
public void setRetryingTransactionHelper(RetryingTransactionHelper retryingTransactionHelper)
{
this.retryingTransactionHelper = retryingTransactionHelper;
}
/**
* @param authorityService Alfresco authority service
*/
public void setAuthorityService(AuthorityService authorityService)
{
this.authorityService = authorityService;
}
/**
* @return Map of message handlers
*/
public Map<String, EmailMessageHandler> getEmailMessageHandlerMap()
{
return emailMessageHandlerMap;
}
/**
* @param emailMessageHandlerMap Map of message handlers
*/
public void setEmailMessageHandlerMap(Map<String, EmailMessageHandler> emailMessageHandlerMap)
{
this.emailMessageHandlerMap = emailMessageHandlerMap;
}
/**
* @param unknownUser Login of the user account to use when the sender's address matches no known person.
*/
public void setUnknownUser(String unknownUser)
{
this.unknownUser = unknownUser;
}
/**
* @param mailInboundEnabled <code>true</code> if inbound email should be processed
*/
public void setEmailInboundEnabled(boolean mailInboundEnabled)
{
this.emailInboundEnabled = mailInboundEnabled;
}
/**
* {@inheritDoc}
*/
public void importMessage(EmailMessage message)
{
processMessage(null, message);
}
/**
* {@inheritDoc}
*/
public void importMessage(NodeRef nodeRef, EmailMessage message)
{
processMessage(nodeRef, message);
}
/**
* Process the message. Method is called after filtering by sender's address.
*
* @param nodeRef Addressed node (target node).
* @param message Email message
* @throws EmailMessageException thrown for email-specific failures (inbound email disabled, access denied, integrity violation)
* @throws AlfrescoRuntimeException wraps any other failure that occurs inside the method
*/
private void processMessage(final NodeRef nodeRef, final EmailMessage message)
{
if (!emailInboundEnabled)
{
throw new EmailMessageException(ERR_INBOUND_EMAIL_DISABLED);
}
try
{
// Get the username for the process using the system account
final RetryingTransactionCallback<String> getUsernameCallback = new RetryingTransactionCallback<String>()
{
public String execute() throws Throwable
{
String from = message.getFrom();
return getUsername(from);
}
};
RunAsWork<String> getUsernameRunAsWork = new RunAsWork<String>()
{
public String doWork() throws Exception
{
return retryingTransactionHelper.doInTransaction(getUsernameCallback, false);
}
};
String username = AuthenticationUtil.runAs(getUsernameRunAsWork, AuthenticationUtil.SYSTEM_USER_NAME);
// Process the message using the username's account
final RetryingTransactionCallback<Object> processMessageCallback = new RetryingTransactionCallback<Object>()
{
public Object execute() throws Throwable
{
String recipient = message.getTo();
NodeRef targetNodeRef = null;
if (nodeRef == null)
{
targetNodeRef = getTargetNode(recipient);
}
else
{
targetNodeRef = nodeRef;
}
EmailMessageHandler messageHandler = getMessageHandler(targetNodeRef);
messageHandler.processMessage(targetNodeRef, message);
return null;
}
};
RunAsWork<Object> processMessageRunAsWork = new RunAsWork<Object>()
{
public Object doWork() throws Exception
{
return retryingTransactionHelper.doInTransaction(processMessageCallback, false);
}
};
AuthenticationUtil.runAs(processMessageRunAsWork, username);
}
catch (EmailMessageException e)
{
// These are email-specific errors
throw e;
}
catch (AccessDeniedException e)
{
throw new EmailMessageException(ERR_ACCESS_DENIED, message.getFrom(), message.getTo());
}
catch (IntegrityException e)
{
throw new EmailMessageException(ERR_INVALID_SUBJECT);
}
catch (Throwable e)
{
throw new AlfrescoRuntimeException("Email message processing failed", e);
}
}
/**
* @param nodeRef Target node
* @return Handler that can process message addressed to specific node (target node).
* @throws EmailMessageException is thrown if a suitable message handler isn't found.
*/
private EmailMessageHandler getMessageHandler(NodeRef nodeRef)
{
ParameterCheck.mandatory("nodeRef", nodeRef);
QName nodeTypeQName = nodeService.getType(nodeRef);
String prefixedNodeTypeStr = nodeTypeQName.toPrefixString(namespaceService);
EmailMessageHandler handler = emailMessageHandlerMap.get(prefixedNodeTypeStr);
if (handler == null)
{
throw new EmailMessageException(ERR_HANDLER_NOT_FOUND, prefixedNodeTypeStr);
}
return handler;
}
/**
* Determines the target node from the recipient's e-mail address.
*
* @param recipient The recipient's e-mail address
* @return Reference to the target node
* @throws EmailMessageException is thrown if the target node cannot be determined.
*/
private NodeRef getTargetNode(String recipient)
{
if (recipient == null || recipient.length() == 0)
{
throw new EmailMessageException(ERR_INVALID_NODE_ADDRESS, recipient);
}
String[] parts = recipient.split("@");
if (parts.length != 2)
{
throw new EmailMessageException(ERR_INVALID_NODE_ADDRESS, recipient);
}
// The address looks well-formed, so try to find a node with a matching alias
StoreRef storeRef = new StoreRef(StoreRef.PROTOCOL_WORKSPACE, "SpacesStore");
String query = String.format(AliasableAspect.SEARCH_TEMPLATE, parts[0]);
ResultSet resultSet = searchService.query(storeRef, SearchService.LANGUAGE_LUCENE, query);
try
{
// The result may contain false positives. For example, a search for alias='target' returns
// every node whose alias property contains the word 'target', so check for an exact match.
for (int i = 0; i < resultSet.length(); i++)
{
NodeRef resRef = resultSet.getNodeRef(i);
String alias = (String)nodeService.getProperty(resRef, EmailServerModel.PROP_ALIAS);
if (parts[0].equalsIgnoreCase(alias))
{
return resRef;
}
}
}
finally
{
resultSet.close();
}
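// No alias was found, so try to interpret the local part of the address as a 'node-dbid' value
query = "@sys\\:node-dbid:" + parts[0];
// Run the query before entering the try block, so that a failed query does not
// cause the already-closed alias result set to be closed a second time
resultSet = searchService.query(storeRef, SearchService.LANGUAGE_LUCENE, query);
try
{
if (resultSet.length() > 0)
{
return resultSet.getNodeRef(0);
}
}
finally
{
resultSet.close();
}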
throw new EmailMessageException(ERR_INVALID_NODE_ADDRESS, recipient);
}
/**
* Resolves the Alfresco user name that corresponds to the sender's e-mail address.
*
* @param from Sender's email address
* @return User name
* @throws EmailMessageException is thrown if the sender cannot be resolved or is not an email contributor.
*/
private String getUsername(String from)
{
String userName = null;
StoreRef storeRef = new StoreRef(StoreRef.PROTOCOL_WORKSPACE, "SpacesStore");
String query = "TYPE:cm\\:person +@cm\\:email:\"" + from + "\"";
ResultSet resultSet = searchService.query(storeRef, SearchService.LANGUAGE_LUCENE, query);
try
{
if (resultSet.length() == 0)
{
if (unknownUser == null || unknownUser.length() == 0)
{
throw new EmailMessageException(ERR_UNKNOWN_SOURCE_ADDRESS, from);
}
else
{
userName = unknownUser;
}
}
else
{
NodeRef userNode = resultSet.getNodeRef(0);
if (nodeService.exists(userNode))
{
userName = DefaultTypeConverter.INSTANCE.convert(
String.class,
nodeService.getProperty(userNode, ContentModel.PROP_USERNAME));
}
else
{
// The Lucene index returned a dead result
throw new EmailMessageException(ERR_UNKNOWN_SOURCE_ADDRESS, from);
}
}
}
finally
{
resultSet.close();
}
// Ensure that the user is part of the Email Contributors group
if (userName == null || !isEmailContributeUser(userName))
{
throw new EmailMessageException(ERR_USER_NOT_EMAIL_CONTRIBUTOR, userName);
}
return userName;
}
/**
* Checks that the user is a member of the <b>EMAIL_CONTRIBUTORS</b> group.
*
* @param userName User name
* @return True if the user is a member of the group; false otherwise, including when the group does not exist
*/
private boolean isEmailContributeUser(String userName)
{
return this.authorityService.getAuthoritiesForUser(userName).contains(
authorityService.getName(AuthorityType.GROUP, "EMAIL_CONTRIBUTORS"));
}
}
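
The address handling in getTargetNode above can be tried in isolation. The following is a minimal sketch that reproduces only the split on "@" and the case-insensitive exact alias match applied to search hits. The class and method names are hypothetical, a plain list of strings stands in for the Lucene result set, and invalid addresses return null instead of throwing EmailMessageException.

```java
import java.util.Arrays;
import java.util.List;

public class RecipientAddressSketch {
    /** Returns the part before '@', or null when the address is empty or does not split into exactly two parts. */
    static String localPart(String recipient) {
        if (recipient == null || recipient.length() == 0) {
            return null;
        }
        String[] parts = recipient.split("@");
        return parts.length == 2 ? parts[0] : null;
    }

    /** Returns the first candidate equal to the wanted alias ignoring case, or null if none matches. */
    static String firstExactAlias(String wanted, List<String> candidates) {
        for (String candidate : candidates) {
            if (wanted.equalsIgnoreCase(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String alias = localPart("Docs@example.com");
        // Stand-in for search hits: a word search also returns aliases that merely contain "docs".
        List<String> searchHits = Arrays.asList("docs-archive", "docs", "old docs");
        System.out.println(firstExactAlias(alias, searchHits)); // prints "docs"
    }
}
```

The exact-match pass matters because the search matches on words within the alias property, so the first hit is not necessarily the intended node.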