using error handlers to tidy MediaWiki URLs

Overview
These methods set up the Apache web server so that MediaWiki pages are loaded in response to requests for pages which the web server sees as unavailable. Custom code extracts the page name from the request information, and calls MediaWiki functions to display the requested page.

I don't know of any particular circumstances where this would work and the recommended mod_rewrite method would not; this documentation is now considered obsolete, but is being kept here in case it is useful.

Potential problem: the pages may actually be returned with an error status code (404 or 403), which apparently prevents some browsers (e.g. MSIE) from displaying them under some circumstances and will certainly hurt search-engine listings. Use wget to check which status code is really being returned.
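The same check can be made from PHP instead of wget. The following is a minimal sketch, not part of the original procedure: finalStatusCode() is a hypothetical helper that picks the last HTTP status code out of a list of response header lines, such as the array returned by PHP's get_headers().

```php
<?php
// Sketch only: finalStatusCode() is a hypothetical helper, not a MediaWiki function.
// It expects a list of raw response header lines, as returned by get_headers().
function finalStatusCode( $headerLines ) {
	$code = null;
	foreach ( $headerLines as $line ) {
		// Status lines look like "HTTP/1.1 404 Not Found". After redirects
		// there may be several of them, so keep the last one seen.
		if ( is_string( $line ) && preg_match( '#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m ) ) {
			$code = (int)$m[1];
		}
	}
	return $code;
}
```

For a live check, something like finalStatusCode( get_headers( 'http://yourdomain.com/nonexistent/page' ) ) should come back as 200 rather than 404 or 403 if the handler is behaving.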

Using a 404 Handler
''At this point, I can't recommend this method unless nothing else works. If you try this method, please report back on how it goes. --Woozle 18:46, 29 June 2006 (EDT)''

This uses the 404 (missing page) redirect mechanism – a standard /index.php/ request is handled by the standard code (in index.php), but any other URL which doesn't correspond to an existing page (within the wiki or not) is handled by a modified index.php. For any given "nonexistent" URL of the form "http://yourdomain.com/nonexistent/page", the code returns a wiki page entitled "Nonexistent/page", with the "nonexistent" URL displayed as the URL for that page.
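As a rough illustration of that mapping, here is a minimal sketch of how a request URI could be turned into a page title. titleFromRequestUri() is a hypothetical helper, not the code used in the instructions below; it splits off the query string before decoding and leaves the capitalization of the first letter to MediaWiki.

```php
<?php
// Sketch only: titleFromRequestUri() is a hypothetical helper for illustration.
function titleFromRequestUri( $requestUri ) {
	// Split off the query string before decoding, so that an encoded "?"
	// (%3F) inside a page name is not mistaken for the separator.
	$parts = explode( '?', $requestUri, 2 );
	$title = rawurldecode( ltrim( $parts[0], ' /' ) );

	$query = array();
	if ( isset( $parts[1] ) ) {
		parse_str( $parts[1], $query );
	}

	// An explicit, non-empty title= parameter takes precedence over the path.
	if ( isset( $query['title'] ) && is_string( $query['title'] ) && '' !== $query['title'] ) {
		$title = $query['title'];
	}
	return array( 'title' => $title, 'query' => $query );
}
```

With this sketch, a request for /nonexistent/page yields the title text "nonexistent/page" and an empty query array.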

There is also a feature wherein you can create a page called Mediawiki:your/url/here and it will redirect to an article whose title is the contents of that page. For example: http://wiki.vbz.net/Currentevents is redirected to vbzwiki:Current events because the page vbzwiki:MediaWiki:Currentevents contains the text "Current events". (This is a bit of a trivial trick, as you can accomplish almost the same thing using #redirect pages.)
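The lookup logic can be sketched without MediaWiki at all. In this sketch, resolveTitle() and the $overrides array are hypothetical stand-ins for reading the contents of the matching "Mediawiki:" page; the real instructions below use MediaWiki's own message lookup instead.

```php
<?php
// Sketch only: resolveTitle() and $overrides are hypothetical stand-ins
// for the "Mediawiki:" page lookup; they are not MediaWiki functions.
function resolveTitle( $requested, $overrides ) {
	// If an override page exists and has non-blank content,
	// its content is the title of the article to show.
	if ( isset( $overrides[$requested] ) && '' !== trim( $overrides[$requested] ) ) {
		return trim( $overrides[$requested] );
	}
	// Otherwise fall back to the requested name itself.
	return $requested;
}
```

Using the example above, resolveTitle( 'Currentevents', array( 'Currentevents' => 'Current events' ) ) gives "Current events", while a name with no override comes back unchanged.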

First, in the main .htaccess file (or in httpd.conf if you prefer), assign a location to handle 404 errors such that a PHP file will be loaded. Either of these will do, for example:

    ErrorDocument 404 /errors/404/
    ErrorDocument 404 /wiki404error.php

In the first instance, your modified index.php file would go in /errors/404/; in the second, it would be renamed wiki404error.php and go in the same folder as the normal index.php.

The remaining instructions depend on which MediaWiki version you are using.

Version 1.4
''These instructions were made from changes that actually worked, but I may have left out some steps. I was more careful when I did the changes for version 1.5, so if these don't work check the version 1.5 instructions for anything missing.''

Second: Make the changes indicated in the 404-handling copy of index.php:

    if ( '' == $title && 'delete' != $action ) {
    # 2005-06-19 Woozle mods for "missing" page
        # title not passed in parameter; use REQUEST_URI from environment
        $title = rawurldecode(ltrim($_SERVER['REQUEST_URI'], " /"));
        # see if there's a page designated for this URI
        $wgTitle = Title::newFromText( wfMsgForContent( $title ) );
        if ('' == $wgTitle) {
            $wgTitle = Title::newFromText( $title );
        }
    # end Woozle mods
    } elseif ( $curid = $wgRequest->getInt( 'curid' ) ) {

        /* redirect to canonical url, make it a 301 to allow caching */
        $wgOut->setSquidMaxage( 1200 );
    # 2005-06-21 Woozle mods to allow 404 page to summon wiki page without redirecting
    #   $wgOut->redirect( $wgTitle->getFullURL(), '301');
        $wgArticle = new Article( $wgTitle );
    #   $mainText = $wgOut->parse( $wgArticle->getContent( false ) );
    #   echo $mainText;
        $wgArticle->view();
    # end Woozle mods
    } else if ( Namespace::getSpecial() == $wgTitle->getNamespace() ) {

Version 1.5
Second: In the same folder as the modified index.php, create a LocalSettings.php with the following contents: 

Note: "../../LocalSettings.php" works if your modified index.php is buried two folders deep from your main index.php (as in the /errors/404/index.php example); adjust it as needed to point to your main LocalSettings.php.
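The file contents are not shown here. The following is only a guess at a minimal stub, assuming the file needs to do nothing more than pull in the wiki's main LocalSettings.php from two folders up, as the note describes; adjust the path to your own layout.

```php
<?php
// Hypothetical stub LocalSettings.php for the 404-handler folder.
// Assumption: it only needs to load the wiki's main settings file, which
// sits two folders above this one (as in the /errors/404/ example).
$mainSettings = dirname( __FILE__ ) . '/../../LocalSettings.php';
if ( file_exists( $mainSettings ) ) {
	require_once( $mainSettings );
}
// If the file is not found, nothing is loaded and the wiki will not start;
// check the path above in that case.
```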

Third: Make the changes indicated in the 404-handling copy of index.php:

 * First change - need to point to the copied Defines.php:

    # 2005-10-25 Woozle - for 404 handling
    require_once( './Defines.php' );

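 * Second change - this is optional, but it cleans up the file a lot. Comment out the whole block that checks for LocalSettings.php, from its opening line through its closing brace:

    # 2005-10-25 Woozle - config code removed because it will never be executed
    # if( !file_exists( 'LocalSettings.php' ) ) {
    #     (rest of this block, likewise commented out)
    # }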

 * Third change - this pulls in the title-request from the error URI:

    # Query string fields
    # 2005-10-25 Woozle - 404 support - parameters have to be parsed from $_SERVER instead of $_REQUEST
    $raw_uri = rawurldecode(ltrim($_SERVER['REQUEST_URI'], " /"));
    $arr_uri = explode('?',$raw_uri);
    $title = $arr_uri[0];
    $uri_qry= $arr_uri[1];
    parse_str($uri_qry,$_REQUEST);
    $action = $wgRequest->getVal( 'action', 'view' );
    $title_force = $wgRequest->getVal( 'title' );
    if ('' != $title_force) {
        $title = $title_force;
    }
    # 2005-10-25 END

 * Fourth change - optional and untested - allow title redirection

    if ( '' == $title && 'delete' != $action ) {
        $wgTitle = Title::newFromText( wfMsgForContent( 'mainpage' ) );
    # 2005-10-26 Woozle - 404 support - optional redirect based on "mediawiki:articlename"
        if ('' == $wgTitle) {
            $wgTitle = Title::newFromText( $title );
        }
    # 2005-10-26 END

 * Fifth change - I'm actually not sure if this is necessary, but don't have time to test uncommenting it:

    # 2005-10-25 Woozle - for 404 handling - block out redirection code
    # was -- if ((action is explicitly "view") AND (title is not passed as param) OR (title is not in canonical form) AND ??
    # } else if ( ( $action == 'view' ) && (!isset( $_GET['title'] ) || $wgTitle->getPrefixedDBKey() != $_GET['title'] ) && !count( array_diff( array_keys( $_GET ), array( 'action', 'title' ) ) ) )
    #{
    #   /* redirect to canonical url, make it a 301 to allow caching */
    #   $wgOut->setSquidMaxage( 1200 );
    #   $wgOut->redirect( $wgTitle->getFullURL(), '301');
    } else if ( NS_SPECIAL == $wgTitle->getNamespace() ) {

Finally
Finally, put the modified index.php where it will be the page used to handle 404 errors.
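If a status check shows the pages still coming back as 404, one likely cause is that the web server keeps the original error status unless the handler script sets its own. The following is a minimal sketch of overriding it, assuming the handler runs as an ordinary PHP page and nothing has been output yet; successStatusLine() is a hypothetical helper, and whether MediaWiki later changes the status again has not been checked here.

```php
<?php
// Sketch only: successStatusLine() is a hypothetical helper, not part of MediaWiki.
function successStatusLine( $server ) {
	// Reuse the protocol the client spoke, defaulting to HTTP/1.0.
	$protocol = isset( $server['SERVER_PROTOCOL'] ) ? $server['SERVER_PROTOCOL'] : 'HTTP/1.0';
	return $protocol . ' 200 OK';
}

// Near the top of the modified index.php, before any output is sent:
if ( !headers_sent() ) {
	header( successStatusLine( $_SERVER ) );
}
```

Pages that genuinely do not exist in the wiki would then also be served with a 200 status, which is the same behaviour as the standard /index.php/ URLs of that era, as far as I know.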


 * Caveats:
 * Your arbitrary URL will have its first character capitalized before it is displayed as the page's title or used to load another page (if you have set up a Mediawiki: page for it), although the URL shown will remain unchanged
 * There is probably a lot of excess index.php code which can be stripped out, as it will never be executed in this context
 * URLs ending in slashes appear to be a problem for some namespaces; the wiki code appears to be reading the URL from some place other than the modified code. (This doesn't seem to be a problem for version 1.5.)
 * All wiki links on the loaded page will point back to canonical wiki URLs, e.g. http://htyp.org/wiki/index.php/Main_Page; to change this, see "Shortening the links" below. (2006-02-16: This has been fixed.)
 * Image thumbnail links don't work. (2006-02-16: This has been fixed.)
 * 2006-02-16 Notes:
 * I changed the procedure a bit and page viewing now seems to work consistently, but I've only checked a few pages for proper behavior. Please test thoroughly before using on a production page, and let me know what you find.
 * Editing does not work properly
 * If certain pages persist in showing old-style links, they may be cached; add "?action=purge" to the URL to clear the cache for a given page. If you are working on an active site using the old-style links, some pages may mysteriously revert to old-style links as visitors browse them through the old-style portal, causing the old-style to be re-cached.
 * This method is probably not very compatible with most webstats generators, but I haven't tried it long enough to see what happens. Probably all pages will be logged as 404 errors, which isn't terribly useful.

Note about CPU load: Obviously the web server has to do the same URL translation it would normally do and then determine that the file doesn't exist, but that shouldn't take any more cycles than locating an existing file; for URLs containing at least one slash, it should be quicker. Given all the processing done by the MediaWiki software for loading "normal" wiki pages, I suspect the difference is negligible.

Using a 403 [forbidden] Handler
One user reports: Something I have found to work very well is to force all files except index.php to "Deny from All" and use a 403 error handler to push all requests to index.php:

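An example .htaccess (per the description above, the Allow line should apply only to index.php, e.g. inside a <Files index.php> section):

    Deny from All
    Allow from All
    # NOTE: DirectoryIndex may not be necessary
    DirectoryIndex index.php
    ErrorDocument 403 /index.php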

This could probably be shortened up, but I have found it works great. :-)

This would seem to be a variation on the 404 method above; not sure if there are any advantages or disadvantages to either one.