+ * Perhaps someday it might be worthwhile to create a specific
+ * parser for each registered scheme-specific part, and validate
+ * that; for now, we'll just be more lax and assume the URI
+ * is always scheme-qualified. This matcher looks no further
+ * than the first colon, and declares "no match" if the URI has none.
+ * The discussion below explains why.
+ *
+ * See RFC 3986 (http://tools.ietf.org/html/rfc3986):
+ *
+ * The following regex parses URIs:
+ *
+ *     ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
+ *
+ * Given the following URI:
+ *
+ *     http://www.ics.uci.edu/pub/ietf/uri/#Related
+ *
+ * The captured subexpressions are:
+ *
+ *     $1 = http:
+ *     $2 = http
+ *     $3 = //www.ics.uci.edu
+ *     $4 = www.ics.uci.edu
+ *     $5 = /pub/ietf/uri/
+ *     $6 = <undefined>
+ *     $7 = <undefined>
+ *     $8 = #Related
+ *     $9 = Related
+ *
+ * NOTE:
+ * A URI can be non-scheme-qualified because $1 is optional. Therefore,
+ * the following are all examples of valid non-scheme-qualified URIs:
+ *
+ *     ""
+ *     "moo@cow.com"
+ *     "moo@cow.com?wow"
+ *     "moo@cow.com?wow#zow"
+ *     "moo@cow.com#zow"
+ *     "/"
+ *     "/moo/cow"
+ *     "/moo/cow?wow"
+ *     "/moo/cow?wow#zow"
+ *     "/moo/cow#zow"
+ *     "//moo/cow"
+ *     "//moo.com/cow"
+ *     "//moo.com/cow/"
+ *     "//moo.com/cow?wow"
+ *     "//moo.com/cow?wow#zow"
+ *     "//moo.com/cow#zow"
+ *     "//moo.com:8080/cow"
+ *     "//moo.com:/cow"
+ *     "//moo.com:8080/cow?wow"
+ *     "//moo.com:8080/cow?wow#zow"
+ *     "//moo.com:8080/cow#zow"
+ *     "///moo/cow"
+ *     "///moo/cow?wow"
+ *     "///moo/cow?wow#zow"
+ *     "///moo/cow#zow"
+ *
+ * And so forth...
+ *
+ * Thus the business end of things, as far as scheme matching goes, is $2.
+ * Most schemes will have a $3 that starts with '//', but not all.
+ * Specifically, the following have no "network path" ('//') segment,
+ * or aren't required to (source: http://en.wikipedia.org/wiki/URI_scheme):
+ *
+ *     cid data dns fax go h323 iax2 mailto mid news pres sip
+ *     sips tel urn xmpp about aim callto feed magnet msnim
+ *     psyc skype sms stream xfire ymsgr
+ *
+ * Visually the parts are as follows:
+ *
+ *       foo://example.com:10042/over/there?name=ferret#nose
+ *       \_/   \_______________/\_________/ \_________/ \__/
+ *        |            |           |             |        |
+ *     scheme      authority      path         query   fragment
+ *        |   _____________________|__
+ *       / \ /                        \
+ *       urn:example:animal:ferret:nose
+ *
+ * This is useful for classifying URLs for things like whether or not
+ * they're supported by an application.
+ *
+ * For example, the LinkValidationService supports http and https,
+ * is willing to ignore certain well-formed URLs, but treats URLs
+ * with unknown or unsupported protocols as broken. Concretely,
+ * we'd like to avoid treating something like the following one
+ * as being non-broken even though you can't apply GET or HEAD
+ * to it.
+ *
+ *     Email
+ *
+ * As of June 2007, IANA had over 70 registered and provisional protocols
+ * listed at http://www.iana.org/assignments/uri-schemes.html but sometimes
+ * people create their own too (e.g. cvs). Here's the official list:
+ *
+ *     aaa aaas acap afs cap cid crid data dav dict dns dtn fax file
+ *     ftp go gopher h323 http https iax2 icap im imap info ipp iris
+ *     iris.beep iris.lwz iris.xpc iris.xpcs ldap mailserver mailto
+ *     mid modem msrp msrps mtqp mupdate news nfs nntp opaquelocktoken
+ *     pop pres prospero rtsp service shttp sip sips snmp soap.beep
+ *     soap.beeps tag tel telnet tftp thismessage tip tn3270 tv urn
+ *     vemmi wais xmlrpc.beep xmlrpc.beeps xmpp z39.50r z39.50s
+ *
+ */
+public class UriSchemeNameMatcher implements NameMatcher, Serializable
+{
+    /**
+     * The schemes to match, stored lower-cased. Only the keys are
+     * used; every value is null.
+     */
+    HashMap<String, String> scheme_;
+
+    /**
+     * Default constructor.
+     */
+    public UriSchemeNameMatcher()
+    {
+        scheme_ = new HashMap<String, String>();
+    }
+
+    /**
+     * Set the protocols case-insensitively (canonicalized to lower-case).
+     *
+     * @param protocols the scheme names to match, e.g. "http"
+     */
+    public void setExtensions(List<String> protocols)
+    {
+        for (String protocol : protocols)
+        {
+            scheme_.put( protocol.toLowerCase(), null );
+        }
+    }
+
+    /**
+     * Returns true if the URL's protocol is in the set of protocols
+     * being matched. Everything up to but not including the initial
+     * colon is treated as the protocol and compared case-insensitively.
+     */
+    public boolean matches(String uri)
+    {
+        if ( uri == null ) { return false; }
+
+        int colon_index = uri.indexOf(':');
+
+        if ( colon_index >= 0)
+        {
+            String proto =
+                uri.substring(0, colon_index).toLowerCase();
+
+            return scheme_.containsKey( proto );
+        }
+        return false;
+    }
+}
+
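To make the trade-off described in the class comment concrete, here is a minimal sketch that compares the scheme captured by the RFC 3986 regex ($2) with the "everything before the first colon" shortcut that `matches` uses. The class name `SchemeSketch` and the helpers `schemeOf` and `firstColonPrefix` are hypothetical and not part of this change; they exist only for illustration.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of UriSchemeNameMatcher.
class SchemeSketch {
    // The URI-parsing regex quoted in the class comment (RFC 3986, Appendix B).
    private static final Pattern URI_PARTS = Pattern.compile(
        "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    /** Returns $2 (the scheme) lower-cased, or null if the input has no scheme. */
    public static String schemeOf(String uri) {
        if (uri == null) { return null; }
        Matcher m = URI_PARTS.matcher(uri);
        // Every group is optional and the pattern is anchored at the start,
        // so find() succeeds for any input; group(2) is null without a scheme.
        if (!m.find()) { return null; }
        String scheme = m.group(2);
        return scheme == null ? null : scheme.toLowerCase(Locale.ROOT);
    }

    /** Mirrors the shortcut: the text before the first colon, or null if no colon. */
    public static String firstColonPrefix(String uri) {
        if (uri == null) { return null; }
        int colonIndex = uri.indexOf(':');
        return colonIndex < 0 ? null : uri.substring(0, colonIndex).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://www.ics.uci.edu/pub/ietf/uri/#Related",
            "mailto:moo@cow.com",
            "//moo.com:8080/cow",
            "/moo/cow"
        };
        for (String s : samples) {
            System.out.println(s + " -> regex: " + schemeOf(s)
                + ", first colon: " + firstColonPrefix(s));
        }
    }
}
```

For scheme-qualified input the two approaches agree. For "//moo.com:8080/cow" the regex reports no scheme, while the shortcut yields "//moo.com"; that prefix can only match if it was registered as a scheme name, so the matcher still reports "no match" for the usual scheme lists.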