alfresco-community-repo/source/java/org/alfresco/util/SearchLanguageConversion.java
David Caruana 575c970565 Merging BRANCHES/DEV/CMIS_10 to HEAD:
17717: This check-in contains changes in Java and .NET TCK tests related to CMIS-43  and CMIS-44 JIRA tasks. Also some bugs were faced out and fixed in 
   17727: CMIS-69: Alfresco to CMIS ACL mapping: Part 1: API
   17732: Merge HEAD to DEV/CMIS10
   17756: MOB-563: SQL Tests - Lexer
   17764: CMIS-69: Alfresco to CMIS ACL mapping: get ACL support
   17802: More for CMIS-69: Alfresco to CMIS ACL mapping. Implementation for applyAcl.
   17830: Fixes for CMIS lexer and parser tests
   17838: Access fix ups for access by the WS/Rest layers
   17869: 1) remote-api:
   17874: SAIL-146: Alfresco to CMIS ACL mapping: Support to group ACEs by principal id
   17883: Adjust version properties for dev/cmis10 branch.
   17885: Update OASIS CMIS TC status.
   17889: Fix issue where objectid is not rendered correctly for CMIS private working copies.
   17890: SAIL-146: Alfresco to CMIS ACL mapping: Fixes for ACL merging when reporting and ordering of ACEs. Report full permissions and not unique short names.
   17902: Fix issue where CMIS queries via GET used incorrect defaults for paging.
   17909: Fix CMIS link relations for folder tree.
   17912: Fix CMIS type descendants atompub link
   17922: Update AtomPub binding to CMIS 1.0 CD05 XSDs.
   17924: SAIL-146: Alfresco to CMIS ACL mapping: Test set using full permissions (as opposed to short unique names)
   17927: Fix content stream create/update status to comply with CMIS 1.0 CD05.
   17934: Resolve encoding issues in CMIS AtomPub binding.
   17973: SAIL-171: CMIS Renditions REST binding
   17975: SAIL-146: Alfresco to CMIS ACL mapping: Completed AllowedAction and Permissions mapping. Added missing canDeleteTree.
   17990: Update CMIS AtomPub to CD06
   17996: Updates for cmis.alfresco.com for CD06 in prep for public review 2.
   18007: WS-Bindings were updated with CMIS 1.0 cd06 changes.
   18016: CMIS web services: Add missing generated files from WSDL
   18018: CMIS index page updates for cmis.alfresco.com
   18041: Merged HEAD to DEV/CMIS_10
   18059: SAIL-227:
   18067: SAIL-157: Strict vs Non-Strict Query Language: Enforce restrictions on the use of SCORE() and CONTAINS()
   18080: Fix for SAIL-213:Bug: Query engine does not check that select list properties are valid for selectors
   18131: SAIL-156: Query Language Compliance: Fix support for LIKE, including escaping of '%' and '_' with '\'.
   18132: SAIL-156: Query Language Compliance: Fix support for LIKE, including escaping of '%' and '_' with '\': Fix underlying lucene impl for prefix and fuzzy queries to match wildcard/like
   18143: SAIL-156: Query Language Compliance: Fix and check qualifiers in IN_TREE and IN_FOLDER. Improved scoring for CONTAINS()
   18173: SAIL-245: Exclude thumbnails from normal query results
   18179: SAIL 214: Query Language Compliance: Check for valid object ids in IN_FOLDER and IN_TREE
   18210: SAIL-156:  Query Language Compliance: Support for simple column aliases in predicates/function arguments/embedded FTS. Check property/selector binding in embedded FTS.
   18211: SAIL-156:  Query Language Compliance: Support for simple column aliases in predicates/function arguments/embedded FTS. Check property/selector binding in embedded FTS.
   18215: SAIL 156: Query Language Compliance: Fix CMIS type info to reflect the underlying settings of the Alfresco type for includeInSuperTypeQuery
   18244: SAIL 156: Query Language Compliance: includeInSuperTypeQuery -> includedInSuperTypeQuery: First cut of cmis query test model. Fixed modelSchema.xml to validate
   18255: SAIL 156: Query Language Compliance: First set of tests for predicates using properties mapped to CMIS Strings.
   18261: CMIS-49 SAIL-163: Alfresco to CMIS Change Log mapping - New CMIS Audit mapping is implemented. ChangeLogDataExtractor was added.
   18263: Build Fix
   18285: SAIL 156: Query Language Compliance: Restrictions on predicates that may be used by single-valued and multi-valued properties
   18287: SAIL-186: Changes to make CMIS Rendition REST bindings pass new TCK tests
   18291: Fix Eclipse classpath problems
   18323: CMIS-44 SAIL-187: Change Log tests (WS) – Java and .NET tests for change log were implemented.
   18325: SAIL 156: Query Language Compliance: Fixes and tests for d:mltext mappings
   18329: Updated Chemistry TCK jar including Dave W's rendition tests.
   18333: Fix compile error - spurious imports.
   18334: Fix issue where absurl web script method failed when deployed to root context.
   18339: Update CMIS index page for start of public review 2.
   18387: SAIL-147: CMIS ACL REST bindings + framework fixes
   18392: Fix typo
   18394: SAIL 156: Query Language Compliance: Fixes and tests for d:<numeric>
   18406: SAIL 156: Query Language Compliance: Remaining type/predicate combinations. Restriction of In/Comparisons for ID/Boolean
   18408: CMIS Query language - remove (pointless) multi-valued column from language definition
   18409: Formatting change for CMIS.g
   18410: Formatting change for FTS.g
   18411: CMIS TCK tests were updated to CMIS 1.0 cd06 schemas.
   18412: SAIL 156: Query Language Compliance: Tests and fixes for aliases for all data types in simple predicates (they behave as the direct column reference)
   18417: Update Chemistry TCK which now incorporates Dave W's ACL tests.
   18419: Update CMIS index page to include public review end date.
   18427: SAIL 156: Query Language Compliance: Expose multi-valued properties in queries. Tests for all accessors. Fix content length to be long.
   18435: SAIL 156: Query Language Compliance: Use queryable correctly and fix up model mappings. Add tests for baseTypeId, contentStreamId and path.
   18472: SAIL 156: Query Language Compliance: Tests and fixes for FTS/Contains expressions. Adhere strictly to the spec - no extensions available by default. Improved FTS error reporting (and stop any recovery).
   18477: SAIL-164: CMIS change log REST bindings
   18495: SAIL 156: Query Language Compliance: Tests and fixes for escaping in string literals, LIKE and FTS expressions.
   18537: SAIL 156: Query Language Compliance: Sorting support. Basic sort test for all orderable/indexed CMIS properties.
   18538: SAIL-164: CMIS change log fixes for TCK compliance
   18547: SAIL 156: Query Language Compliance: Ordering tests for all datatypes, including null values. 
   18582: Incorporate latest Chemistry TCK
   18583: Update list of supported CMIS capabilities in index page.
   18606: SAIL-156, SAIL-157, SAIL-158: Query Language Compliance: Respect all query options including locale. Fixes and tests for MLText cross language support.
   18608: SAIL-159: Java / Javascript API access to CMIS Query Language
   18617: SAIL-158: Query Tests: Check policy and relationship types are not queryable.
   18636: SAIL-184: ACL tests (WS) 
   18663: ACL tests were updated in accordance with last requirements by David Caruana.
   18680: Update to CMIS CD07
   18681: Fix CMIS ContentStreamId property when document has no content.
   18700: CMIS: Head merge problem resolution.

Phase 1: Merge up to and including revision 18700, as this is the point where both AtomPub and Web Services TCK tests succeed completely on dev branch.

Note: includes CMIS rendition support ready for integration and testing with DM renditions.

git-svn-id: https://svn.alfresco.com/repos/alfresco-enterprise/alfresco/HEAD/root@18790 c4b6b30b-aa2e-2d43-bbcb-ca4b014f7261
2010-02-23 17:23:42 +00:00

330 lines
10 KiB
Java

/*
 * Copyright (C) 2005-2007 Alfresco Software Limited.
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version 2
 * of the License, or (at your option) any later version.
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 * As a special exception to the terms and conditions of version 2.0 of
 * the GPL, you may redistribute this Program in connection with Free/Libre
 * and Open Source Software ("FLOSS") applications as described in Alfresco's
 * FLOSS exception. You should have received a copy of the text describing
 * the FLOSS exception, and it is also available here:
 * http://www.alfresco.com/legal/licensing
 */
package org.alfresco.util;
import org.alfresco.repo.search.impl.lucene.LuceneQueryParser;
import org.apache.lucene.queryParser.QueryParser;
/**
 * Helper class to provide conversions between different search languages
 *
 * @author Derek Hulley
 */
public class SearchLanguageConversion
{
    /**
     * SQL like query language summary:
     * <ul>
     * <li>Escape: \</li>
     * <li>Single char search: _</li>
     * <li>Multiple char search: %</li>
     * <li>Reserved: \%_[]</li>
     * </ul>
     */
    public static LanguageDefinition DEF_SQL_LIKE = new SimpleLanguageDef('\\', "%", "_", "\\%_[]");
    /**
     * XPath like query language summary:
     * <ul>
     * <li>Escape: \</li>
     * <li>Single char search: _</li>
     * <li>Multiple char search: %</li>
     * <li>Reserved: \%_[]</li>
     * </ul>
     */
    public static LanguageDefinition DEF_XPATH_LIKE = new SimpleLanguageDef('\\', "%", "_", "\\%_[]");
    /**
     * Regular expression query language summary:
     * <ul>
     * <li>Escape: \</li>
     * <li>Single char search: .</li>
     * <li>Multiple char search: .*</li>
     * <li>Reserved: \*.+?^$(){}|</li>
     * </ul>
     */
    public static LanguageDefinition DEF_REGEX = new SimpleLanguageDef('\\', ".*", ".", "\\*.+?^$(){}|");
    /**
     * Lucene syntax summary: {@link QueryParser#escape(String) Lucene Query Parser}
     */
    public static LanguageDefinition DEF_LUCENE = new LuceneLanguageDef(true);
    /**
     * Lucene syntax with no reserved characters: wildcards are translated, but nothing is escaped.
     */
    public static LanguageDefinition DEF_LUCENE_INTERNAL = new LuceneLanguageDef(false);
    /**
     * CIFS name path query language summary:
     * <ul>
     * <li>Escape: \ (but not used)</li>
     * <li>Single char search: ?</li>
     * <li>Multiple char search: *</li>
     * <li>Reserved: "*\<>?/:|£%&+;</li>
     * </ul>
     */
    public static LanguageDefinition DEF_CIFS = new SimpleLanguageDef('\\', "*", "?", "\"*\\<>?/:|£%&+;");
    /**
     * Escape a string according to the <b>XPath</b> like function syntax.
     *
     * @param str
     *            the string to escape
     * @return Returns the escaped string
     */
    public static String escapeForXPathLike(String str)
    {
        return escape(DEF_XPATH_LIKE, str);
    }
    /**
     * Escape a string according to the <b>regex</b> language syntax.
     *
     * @param str
     *            the string to escape
     * @return Returns the escaped string
     */
    public static String escapeForRegex(String str)
    {
        return escape(DEF_REGEX, str);
    }
    /**
     * Escape a string according to the <b>Lucene</b> query syntax.
     *
     * @param str
     *            the string to escape
     * @return Returns the escaped string
     */
    public static String escapeForLucene(String str)
    {
        return escape(DEF_LUCENE, str);
    }
    /**
     * Generic escaping using the language definition
     */
    private static String escape(LanguageDefinition def, String str)
    {
        StringBuilder sb = new StringBuilder(str.length() * 2);
        char[] chars = str.toCharArray();
        for (int i = 0; i < chars.length; i++)
        {
            // first check for reserved chars
            if (def.isReserved(chars[i]))
            {
                // escape it
                sb.append(def.escapeChar);
            }
            sb.append(chars[i]);
        }
        return sb.toString();
    }
    /**
     * Convert an <b>xpath</b> like function clause into a <b>regex</b> query.
     *
     * @param xpathLikeClause
     * @return Returns a valid regular expression that is equivalent to the given <b>xpath</b> like clause.
     */
    public static String convertXPathLikeToRegex(String xpathLikeClause)
    {
        return "(?s)" + convert(DEF_XPATH_LIKE, DEF_REGEX, xpathLikeClause);
    }
    /**
     * Convert an <b>xpath</b> like function clause into a <b>Lucene</b> query.
     *
     * @param xpathLikeClause
     * @return Returns a valid <b>Lucene</b> expression that is equivalent to the given <b>xpath</b> like clause.
     */
    public static String convertXPathLikeToLucene(String xpathLikeClause)
    {
        return convert(DEF_XPATH_LIKE, DEF_LUCENE, xpathLikeClause);
    }
    /**
     * Convert a <b>sql</b> like function clause into a <b>Lucene</b> query.
     *
     * @param sqlLikeClause
     * @return Returns a valid <b>Lucene</b> expression that is equivalent to the given <b>sql</b> like clause.
     */
    public static String convertSQLLikeToLucene(String sqlLikeClause)
    {
        return convert(DEF_SQL_LIKE, DEF_LUCENE_INTERNAL, sqlLikeClause);
    }
    /**
     * Convert a <b>sql</b> like function clause into a <b>regex</b> query.
     *
     * @param sqlLikeClause
     * @return Returns a valid regular expression that is equivalent to the given <b>sql</b> like clause.
     */
    public static String convertSQLLikeToRegex(String sqlLikeClause)
    {
        return "(?s)" + convert(DEF_SQL_LIKE, DEF_REGEX, sqlLikeClause);
    }
    /**
     * Convert a <b>CIFS</b> name path into the equivalent <b>Lucene</b> query.
     *
     * @param cifsNamePath
     *            the CIFS named path
     * @return Returns a valid <b>Lucene</b> expression that is equivalent to the given CIFS name path
     */
    public static String convertCifsToLucene(String cifsNamePath)
    {
        return convert(DEF_CIFS, DEF_LUCENE, cifsNamePath);
    }
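    /**
     * Convert a query from one language to another: wildcards are translated, escaped characters are
     * emitted as literals, and characters reserved in the target language are escaped.
     * An unpaired escape character at the end of the query is dropped.
     *
     * @param from
     *            the language definition of the given query
     * @param to
     *            the language definition to convert to
     * @param query
     *            the query in the <b>from</b> language
     * @return Returns the equivalent query in the <b>to</b> language
     */
    public static String convert(LanguageDefinition from, LanguageDefinition to, String query)
    {
        char[] chars = query.toCharArray();
        StringBuilder sb = new StringBuilder(chars.length * 2);
        boolean escaping = false;
        for (int i = 0; i < chars.length; i++)
        {
            if (escaping) // the previous char was the escape char, so treat the current char as a literal
            {
                if (to.isReserved(chars[i]))
                {
                    sb.append(to.escapeChar); // the to format escape char
                }
                sb.append(chars[i]); // the current char
                escaping = false;
            }
            else if (chars[i] == from.escapeChar) // not escaping and have escape char
            {
                escaping = true;
            }
            else if (query.startsWith(from.multiCharWildcard, i)) // not escaping but have multi-char wildcard
            {
                // translate the wildcard
                sb.append(to.multiCharWildcard);
                // skip the rest of a wildcard that is longer than one char (e.g. regex ".*")
                i += Math.max(from.multiCharWildcard.length() - 1, 0);
            }
            else if (query.startsWith(from.singleCharWildcard, i)) // have single-char wildcard
            {
                // translate the wildcard
                sb.append(to.singleCharWildcard);
                // skip the rest of a wildcard that is longer than one char
                i += Math.max(from.singleCharWildcard.length() - 1, 0);
            }
            else if (to.isReserved(chars[i])) // reserved character
            {
                sb.append(to.escapeChar).append(chars[i]);
            }
            else // just a normal char in both
            {
                sb.append(chars[i]);
            }
        }
        return sb.toString();
    }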
    /**
     * Simple store of special characters for a given query language
     */
    public static abstract class LanguageDefinition
    {
        public final char escapeChar;
        public final String multiCharWildcard;
        public final String singleCharWildcard;
        public LanguageDefinition(char escapeChar, String multiCharWildcard, String singleCharWildcard)
        {
            this.escapeChar = escapeChar;
            this.multiCharWildcard = multiCharWildcard;
            this.singleCharWildcard = singleCharWildcard;
        }
        public abstract boolean isReserved(char ch);
    }
    private static class SimpleLanguageDef extends LanguageDefinition
    {
        private String reserved;
        public SimpleLanguageDef(char escapeChar, String multiCharWildcard, String singleCharWildcard, String reserved)
        {
            super(escapeChar, multiCharWildcard, singleCharWildcard);
            this.reserved = reserved;
        }
        @Override
        public boolean isReserved(char ch)
        {
            return (reserved.indexOf(ch) > -1);
        }
    }
    private static class LuceneLanguageDef extends LanguageDefinition
    {
        private String reserved;
        public LuceneLanguageDef(boolean reserve)
        {
            super('\\', "*", "?");
            if (reserve)
            {
                init();
            }
            else
            {
                reserved = "";
            }
        }
        /**
         * Discovers all the reserved chars
         */
        private void init()
        {
            StringBuilder sb = new StringBuilder(20);
            for (char ch = 0; ch < 256; ch++)
            {
                char[] chars = new char[] { ch };
                String unescaped = new String(chars);
                // check it
                String escaped = LuceneQueryParser.escape(unescaped);
                if (!escaped.equals(unescaped))
                {
                    // it was escaped
                    sb.append(ch);
                }
            }
            reserved = sb.toString();
        }
        @Override
        public boolean isReserved(char ch)
        {
            return (reserved.indexOf(ch) > -1);
        }
    }
}
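
Usage sketch (not part of the original file): the stand-alone class below mirrors the character-by-character logic of convert(DEF_SQL_LIKE, DEF_REGEX, ...) so that the SQL LIKE to regex translation can be tried without the Alfresco and Lucene jars on the classpath. The class name LikeConversionSketch and the method name sqlLikeToRegex are hypothetical. The reserved regex characters are copied from DEF_REGEX above, so '[' and ']' are passed through unescaped here as well.

```java
import java.util.regex.Pattern;

/** Hypothetical stand-alone sketch; not part of org.alfresco.util. */
class LikeConversionSketch
{
    // Reserved regex characters, copied from DEF_REGEX in the listing above.
    private static final String REGEX_RESERVED = "\\*.+?^$(){}|";

    /** Translates a SQL LIKE pattern ('%', '_', escape char '\') into a regex string. */
    static String sqlLikeToRegex(String like)
    {
        StringBuilder sb = new StringBuilder(like.length() * 2);
        boolean escaping = false;
        for (int i = 0; i < like.length(); i++)
        {
            char ch = like.charAt(i);
            if (escaping)
            {
                // the previous char was the LIKE escape char: emit this char as a literal
                if (REGEX_RESERVED.indexOf(ch) > -1)
                {
                    sb.append('\\');
                }
                sb.append(ch);
                escaping = false;
            }
            else if (ch == '\\')
            {
                escaping = true;
            }
            else if (ch == '%')
            {
                sb.append(".*"); // multi-char wildcard
            }
            else if (ch == '_')
            {
                sb.append('.'); // single-char wildcard
            }
            else if (REGEX_RESERVED.indexOf(ch) > -1)
            {
                sb.append('\\').append(ch); // literal that is reserved in regex
            }
            else
            {
                sb.append(ch);
            }
        }
        // (?s) lets '.' match line terminators, as in convertSQLLikeToRegex
        return "(?s)" + sb;
    }

    public static void main(String[] args)
    {
        // The LIKE pattern is report\_2010%.pdf (the backslash is doubled for the Java literal)
        String regex = sqlLikeToRegex("report\\_2010%.pdf");
        System.out.println(regex); // (?s)report_2010.*\.pdf
        System.out.println(Pattern.matches(regex, "report_2010-final.pdf")); // true
        System.out.println(Pattern.matches(regex, "reportX2010.pdf")); // false
    }
}
```

The escaped underscore stays a literal underscore, the unescaped '%' becomes ".*", and the '.' before "pdf" is escaped because it is reserved in the target language.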