MediaWiki/archive/customizing/URLs

from HTYP, the free directory anyone can edit if they can prove to me that they're not a spambot
Revision as of 11:08, 10 June 2007 by Woozle

Overview

There are (at least) two "standard" ways of prettifying MediaWiki URLs, documented here.

There's also another way of doing it if you have access to httpd.conf or .htaccess: the 404-handler method below. It's fairly tidy and quite flexible, though I don't know how much additional load it puts on the server (see the brief note about CPU load at the end of that section).

Using mod_rewrite

This is probably documented elsewhere, but this is what actually worked for me on a shared server where I couldn't modify httpd.conf.

This assumes MediaWiki is installed in the root of the www pages.

First, the .htaccess file needs to include:

<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php/$1 [L,QSA]
</IfModule>
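Roughly, those rules work like this: if the requested path names a real file or directory, Apache serves it normally; otherwise the whole path is handed to index.php as path info. Here is a minimal PHP sketch of that decision, just to show the logic. rewrite_request() and the $exists callback are hypothetical stand-ins for Apache's -f/-d tests, and paths are relative to the folder holding .htaccess (as they are in per-directory rewrite rules).

```php
<?php
// Hypothetical sketch of the decision made by the .htaccess rules above.
// $path is relative to the .htaccess folder (per-directory rules see it without the leading "/").
// $exists stands in for Apache's "-f" (is a file) and "-d" (is a directory) tests.
function rewrite_request(string $path, string $query, callable $exists): string
{
    $suffix = ($query !== '') ? '?' . $query : '';

    // RewriteCond !-f / !-d: real files and directories are left alone
    if ($exists($path)) {
        return $path . $suffix;
    }

    // RewriteRule ^(.*)$ index.php/$1 [L,QSA]: the whole path becomes path info,
    // and the original query string is carried over
    return 'index.php/' . $path . $suffix;
}

// Example: only skins/main.css exists on disk
$exists = function (string $p): bool {
    return $p === 'skins/main.css';
};
echo rewrite_request('Main_Page', 'action=edit', $exists), "\n"; // index.php/Main_Page?action=edit
echo rewrite_request('skins/main.css', '', $exists), "\n";        // skins/main.css
```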

Second, modify LocalSettings.php:

$wgArticlePath = "/$1";
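The "$1" in $wgArticlePath is a placeholder that MediaWiki replaces with the page title when it builds a link. The sketch below approximates that substitution; article_url() is a hypothetical helper, and its escaping is simplified (MediaWiki's own wfUrlencode() leaves a few more characters unescaped than this does).

```php
<?php
// Hypothetical, simplified version of how a link URL is built from $wgArticlePath.
function article_url(string $articlePath, string $title): string
{
    // Titles use underscores instead of spaces in URLs
    $key = str_replace(' ', '_', $title);
    // Percent-encode, then restore "/" (subpages) and ":" (namespaces) for readability
    $encoded = str_replace(['%2F', '%3A'], ['/', ':'], rawurlencode($key));
    // Fill in the $1 placeholder (single quotes, so PHP does not interpolate it)
    return str_replace('$1', $encoded, $articlePath);
}

echo article_url('/$1', 'Current events'), "\n";           // /Current_events
echo article_url('/index.php/$1', 'Current events'), "\n"; // /index.php/Current_events
```

Setting $wgArticlePath = "/$1" only changes the prefix; the title part of the link is built the same way either way.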

Then append ?action=purge to any page's URL to test it. Internal links on existing pages switch to the new URLs whenever the page is saved or purged.

The above seems to work for everything (including editing) without breaking old-style URLs, and may offer clues as to how to fix the minor problems which make the 404 method (below) unusable.

Using a 404 Handler

At this point, I can't recommend this method unless nothing else works. If you try this method, please report back on how it goes. --Woozle 18:46, 29 June 2006 (EDT)

This method uses the 404 (missing page) error mechanism: a standard /index.php/ request is handled by the standard code in index.php, but any other URL that doesn't correspond to an existing file (within the wiki or not) is handled by a modified copy of index.php. For any "nonexistent" URL of the form "http://yourdomain.com/nonexistent/page", the code returns the wiki page titled "Nonexistent/page", while the "nonexistent" URL is still displayed as the URL for that page.
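The URL-to-title step can be sketched like this. title_from_uri() is a hypothetical helper, not MediaWiki code: it strips leading spaces and slashes from the request URI, drops any query string, URL-decodes the rest, and capitalizes the first letter the way MediaWiki does by default. ucfirst() only handles single-byte characters, which is a limitation of this sketch.

```php
<?php
// Hypothetical helper: the wiki title a 404 handler would request for a "nonexistent" URL.
function title_from_uri(string $requestUri): string
{
    // Strip leading spaces/slashes and split off the query string before decoding
    $path = explode('?', ltrim($requestUri, ' /'), 2)[0];
    // Decode, then capitalize the first letter (single-byte only)
    return ucfirst(rawurldecode($path));
}

echo title_from_uri('/nonexistent/page'), "\n"; // Nonexistent/page
```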

There is also a feature whereby you can create a page called MediaWiki:your/url/here, and that URL will then show the article whose title is the contents of that page. For example, http://wiki.vbz.net/Currentevents is redirected to vbzwiki:Current events because the page vbzwiki:MediaWiki:Currentevents contains the text "Current events".
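That lookup amounts to "use the MediaWiki: page's text if there is one, otherwise use the URL itself". In this sketch, resolve_title() and the $messages array are hypothetical stand-ins for the MediaWiki: namespace; the actual handler code calls wfMsgForContent() and Title::newFromText() instead.

```php
<?php
// Hypothetical sketch: $messages maps MediaWiki: page names to their text.
function resolve_title(string $urlTitle, array $messages): string
{
    $target = isset($messages[$urlTitle]) ? trim($messages[$urlTitle]) : '';
    // A non-empty MediaWiki: page names the article to show instead
    return ($target !== '') ? $target : $urlTitle;
}

$messages = ['Currentevents' => 'Current events'];
echo resolve_title('Currentevents', $messages), "\n";    // Current events
echo resolve_title('Nonexistent/page', $messages), "\n"; // Nonexistent/page
```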

First, in the main .htaccess file (or in httpd.conf if you prefer), assign a location to handle 404 errors such that a PHP file will be loaded -- either of these will do, for example:

ErrorDocument 404 /errors/404/
ErrorDocument 404 /wiki404error.php

In the first case, your modified index.php file would go in /errors/404/; in the second, it would be renamed wiki404error.php and go in the same folder as the normal index.php.

The remaining instructions depend on which MediaWiki version you are using.

Version 1.4

These instructions were made from changes that actually worked, but I may have left out some steps. I was more careful when I did the changes for version 1.5, so if these don't work check the version 1.5 instructions for anything missing.

Second: Make the changes indicated in the 404-handling copy of index.php:

if ( '' == $title && 'delete' != $action ) {
## 2005-06-19 Woozle mods for "missing" page
	# title not passed in parameter; use REQUEST_URI from environment
	$title = rawurldecode(ltrim($_SERVER['REQUEST_URI'], " /"));
	# see if there's a page designated for this URI
	$wgTitle = Title::newFromText( wfMsgForContent( $title ) );
	if ('' == $wgTitle) {
		$wgTitle = Title::newFromText( $title );
	}
## end Woozle mods
} elseif ( $curid = $wgRequest->getInt( 'curid' ) ) {
	/* redirect to canonical url, make it a 301 to allow caching */
	$wgOut->setSquidMaxage( 1200 );
# 2005-06-21 Woozle mods to allow 404 page to summon wiki page without redirecting
#	$wgOut->redirect( $wgTitle->getFullURL(), '301');
	$wgArticle = new Article( $wgTitle );
#  	$mainText = $wgOut->parse( $wgArticle->getContent( false ) );
#	echo $mainText;
	$wgArticle->view();
# end Woozle mods
} else if ( Namespace::getSpecial() == $wgTitle->getNamespace() ) {

Version 1.5

Second: In the same folder as the modified index.php, create a LocalSettings.php with the following contents:

<?php
require_once( "../../LocalSettings.php" );

$wgScript           = $wgScriptPath;
$wgArticlePath      = "$wgScript/$1";
?>

Note: "../../LocalSettings.php" works if your modified index.php is buried two folders deep from your main index.php (as in the /errors/404/index.php example); adjust it as needed to point to your main LocalSettings.php.
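To see what the two overrides in that file produce, here is a small worked example. short_url_settings() is just a hypothetical wrapper for illustration, and the $wgScriptPath values are examples. Note that "$1" survives the double-quoted string because PHP variable names cannot begin with a digit.

```php
<?php
// Hypothetical wrapper that evaluates the two overrides for a given $wgScriptPath.
function short_url_settings(string $wgScriptPath): array
{
    $wgScript      = $wgScriptPath;
    // "$1" is left as a literal placeholder for MediaWiki to fill in
    $wgArticlePath = "$wgScript/$1";
    return [$wgScript, $wgArticlePath];
}

print_r(short_url_settings(''));      // wiki in the web root
print_r(short_url_settings('/wiki')); // wiki in /wiki
```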

Third: Make the changes indicated in the 404-handling copy of index.php:

  • First change - point to a copy of Defines.php (copy Defines.php from the wiki's includes/ folder into the same folder as the modified index.php):
# 2005-10-25 Woozle - for 404 handling
require_once( './Defines.php' );
  • Second change - this is optional, but it cleans up the file a lot:
# 2005-10-25 Woozle - config code removed because it will never be executed
#if( !file_exists( 'LocalSettings.php' ) ) {
# ...
#}
  • Third change - this pulls in the title-request from the error URI:
# Query string fields
# 2005-10-25 Woozle - 404 support - parameters have to be parsed from $_SERVER instead of $_REQUEST
	# split off the query string before decoding, so an encoded "?" (%3F) in a title isn't mistaken for the separator
	$raw_uri = ltrim($_SERVER['REQUEST_URI'], " /");
	$arr_uri = explode('?', $raw_uri, 2);
	$title = rawurldecode($arr_uri[0]);
	$uri_qry = isset($arr_uri[1]) ? $arr_uri[1] : '';
	parse_str($uri_qry, $_REQUEST);   # parse_str() decodes the parameters itself
	$action = $wgRequest->getVal( 'action', 'view' );
	$title_force = $wgRequest->getVal( 'title' );
	if ( '' != $title_force ) {
		$title = $title_force;
	}
# 2005-10-25 END
  • Fourth change - optional and untested - allow title redirection
if ( '' == $title && 'delete' != $action ) {
	$wgTitle = Title::newFromText( wfMsgForContent( 'mainpage' ) );
# 2005-10-26 Woozle - 404 support - optional redirect based on "mediawiki:articlename"
	if ( '' == $wgTitle ) {
		$wgTitle = Title::newFromText( $title );
	}
# 2005-10-26 END
  • Fifth change - I'm actually not sure if this is necessary, but don't have time to test uncommenting it:
# 2005-10-25 Woozle - for 404 handling - block out redirection code
# was -- if ((action is explicitly "view") AND (title is not passed as param) OR (title is not in canonical form) AND ??
#} else if ( ( $action == 'view' ) && 	(!isset( $_GET['title'] ) || $wgTitle->getPrefixedDBKey() != $_GET['title'] ) && !count( array_diff( array_keys( $_GET ), array( 'action', 'title' ) ) ) )
#{
#	/* redirect to canonical url, make it a 301 to allow caching */
#	$wgOut->setSquidMaxage( 1200 );
#	$wgOut->redirect( $wgTitle->getFullURL(), '301');
} else if ( NS_SPECIAL == $wgTitle->getNamespace() ) {
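Here is a guess at why the fifth change matters. The sketch restates the commented-out condition as a function; would_redirect_to_canonical() is hypothetical, $get stands in for $_GET, and $prefixedDbKey for $wgTitle->getPrefixedDBKey(). It assumes, as the third change implies, that in the 404 handler $_GET does not carry the original request's parameters. Under that assumption every plain page view would match the condition and trigger the 301 redirect, so leaving that block commented out looks necessary.

```php
<?php
// Hypothetical restatement of the stock "redirect to canonical URL" test.
// $get plays the role of $_GET; $prefixedDbKey that of $wgTitle->getPrefixedDBKey().
function would_redirect_to_canonical(string $action, array $get, string $prefixedDbKey): bool
{
    // title not passed as a parameter, or not in canonical form
    $titleNotCanonical = !isset($get['title']) || $get['title'] !== $prefixedDbKey;
    // no parameters other than "action" and "title"
    $noOtherParams = count(array_diff(array_keys($get), ['action', 'title'])) === 0;
    return $action === 'view' && $titleNotCanonical && $noOtherParams;
}

// With an empty $_GET (assumed in the 404 handler), a plain view matches:
var_dump(would_redirect_to_canonical('view', [], 'Main_Page')); // bool(true)
```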

Finally

Finally, put the modified index.php where it will be the page used to handle 404 errors.

  • Caveats:
    • Your arbitrary URL will have its first character capitalized before it is displayed as the page's title or used to load another page (if you have set up a MediaWiki: page for it), although the URL shown will remain unchanged.
    • There is probably a lot of excess index.php code which can be stripped out, as it will never be executed in this context
    • URLs ending in slashes appear to be a problem for some namespaces; the wiki code appears to be reading the URL from some place other than the modified code. (This doesn't seem to be a problem for version 1.5.)
    • All wiki links on the loaded page will point back to canonical wiki URLs, e.g. http://htyp.org/wiki/index.php/Main_Page. (2006-02-16: This has been fixed.)
    • Image thumbnail links don't work. (2006-02-16: This has been fixed.)
  • 2006-02-16 Notes:
    • I changed the procedure a bit and page viewing now seems to work consistently, but I've only checked a few pages for proper behavior. Please test thoroughly before using this on a production site, and let me know what you find.
    • Editing does not work properly.
    • If certain pages persist in showing old-style links, they may be cached; add "?action=purge" to the URL to clear the cache for a given page. If you are working on an active site using the old-style links, some pages may mysteriously revert to old-style links as visitors browse them through the old-style portal, causing the old-style to be re-cached.
    • This method is probably not very compatible with most webstats generators, but I haven't tried it long enough to see what happens. Probably all pages will be logged as 404 errors, which isn't terribly useful.

Note about CPU load: The server obviously has to do the same URL translation it would normally do and then determine that the file doesn't exist, but that shouldn't take any more cycles than locating an existing file; for URLs containing at least one slash, it should be quicker, presumably because the lookup can stop at the first directory that doesn't exist. Given all the processing MediaWiki does to load a "normal" wiki page, I suspect the difference is negligible.

Using a 403 [forbidden] Handler

Something I have found to work very well is to force all files except index.php to "Deny from All" and use a 403 error handler to push all requests to index.php:

An example .htaccess:

Deny from All
<Files "index.php">
   Allow from All
</Files>
# NOTE: DirectoryIndex may not be necessary
DirectoryIndex index.php
ErrorDocument 403 /index.php

This could probably be shortened up, but I have found it works great. :-) anonymous user 64.110.252.116
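Roughly, that .htaccess routes requests like this: a request is served directly only if it resolves to an index.php file; everything else is denied, and the 403 handler passes it to /index.php, where the original URL should still be available in $_SERVER['REQUEST_URI']. A minimal sketch of that routing, with hypothetical function names and a regular expression standing in for Apache's <Files> matching:

```php
<?php
// Hypothetical sketch of the routing produced by the 403-handler .htaccess above.
// <Files "index.php"> matches the file index.php in any folder, including
// requests with extra path info such as /index.php/Main_Page.
function served_directly(string $requestUri): bool
{
    $path = explode('?', $requestUri, 2)[0];
    return preg_match('#(^|/)index\.php(/|$)#', $path) === 1;
}

// Everything else gets "403 Forbidden", which ErrorDocument hands to /index.php
function handler_for(string $requestUri): string
{
    return served_directly($requestUri) ? $requestUri : '/index.php';
}

echo handler_for('/Main_Page'), "\n";           // /index.php
echo handler_for('/index.php/Main_Page'), "\n"; // /index.php/Main_Page
```

By the same logic, stylesheets and images below that folder would also be denied and routed to index.php unless something else allows them, which is worth checking if skins or uploads stop loading.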

Comments

Please feel free to post comments here or on the Talk page if you try any of these procedures.

Hi, try this method too; it works for me: http://wiki.welldesignedurls.org/Clean_Urls_for_MediaWiki