==Overview==
[[SpamFerret]] is [[User:Woozle|my]] attempt at an improvement over the SpamBlacklist extension. It is currently in development.

==Purpose==
The SpamBlacklist extension has a number of shortcomings:
* Can only handle a limited number of entries before exceeding the maximum string length it can process, at which point all spam is allowed through
* Does not keep track of which entries are still being "tried" (to allow for periodic "cleaning" of the list)
* Does not keep track of offending IP addresses
* Handles only domains; cannot blacklist by URL path (for partially compromised servers) or by "catch phrases" found in spam and nowhere else
* Does not keep a log of failed spam attempts, so there is no way to measure effectiveness

[[SpamFerret]] will:
* be database-driven
* keep logs and counts of spam attempts, both per blacklist entry and per IP address
* match domains ("http://*.domain"), URLs ("http://*.domain/path") and catch phrases ("helo please to forgive my posting but my children are hungary")
** should also be able to match patterns, like long lists of links in a certain format

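The three kinds of match listed above could all be reduced to regular expressions. The following is only a sketch in Python for illustration; the extension itself would presumably be written in PHP. The function name <code>compile_entry</code>, the example domains, the case-insensitive matching, and the way the "http://*.domain" wildcard is expressed as a regex are all assumptions, not part of SpamFerret.

```python
import re

def compile_entry(text, is_regex=False):
    """Turn one blacklist entry into a compiled regex (hypothetical helper).

    Plain entries are escaped so that characters such as '.' match
    literally; entries flagged as regexes are used unchanged.
    Case-insensitive matching is an assumption made for this sketch.
    """
    source = text if is_regex else re.escape(text)
    return re.compile(source, re.IGNORECASE)

# Hypothetical entries, one for each kind of match.
domain = compile_entry(r"https?://([a-z0-9-]+\.)*spam-example\.test", is_regex=True)
url_path = compile_entry(r"https?://([a-z0-9-]+\.)*hacked-example\.test/pills/", is_regex=True)
phrase = compile_entry("my children are hungary")

print(bool(domain.search("see http://www.spam-example.test/offer")))
print(bool(url_path.search("http://hacked-example.test/about")))
print(bool(phrase.search("but My Children Are Hungary")))
```

Escaping plain entries means that a catch phrase or a bare domain containing "." or "?" cannot accidentally behave like a pattern.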
SpamFerret may be unsuitable for use on busier wikis, because the check, which runs only when an edit is submitted, scans the entire page once per blacklisted pattern and may therefore take a fair amount of CPU time. This shouldn't be a problem for smaller wikis, which are often monitored less frequently than busier wikis and hence are more vulnerable to spam.

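The cost described above comes from a loop of roughly the following shape. This is a Python sketch for illustration only, not the extension's code; <code>check_edit</code> and the sample patterns are hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

def check_edit(page_text: str, patterns: Iterable[Tuple[int, str]]) -> Optional[int]:
    """Return the ID of the first pattern found in the page, or None.

    `patterns` is assumed to hold (ID, regex) pairs for the active entries.
    """
    for pattern_id, regex in patterns:
        # Each search starts again at the top of the page, so a clean page
        # costs roughly (number of patterns) x (page length).
        if re.search(regex, page_text):
            return pattern_id
    return None

# Hypothetical blacklist with two entries.
blacklist = [
    (1, r"https?://([a-z0-9-]+\.)*spam-example\.test"),
    (2, r"cheap\s+pills"),
]

print(check_edit("Buy cheap   pills today", blacklist))
print(check_edit("An ordinary, harmless edit", blacklist))
```

A clean edit is the worst case, since every pattern has to be tried before the edit can be accepted.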
==Design==
<sql>
CREATE TABLE `patterns` (
  `ID` INT NOT NULL AUTO_INCREMENT,
  `Pattern` varchar(255) COMMENT 'pattern to match (regex)',
  `WhenAdded` DATETIME DEFAULT NULL COMMENT 'when this entry was added',
  `WhenTried` DATETIME DEFAULT NULL COMMENT 'when a spammer last attempted to include this pattern',
  `isActive` BOOL COMMENT 'if FALSE, do not include in checking',
  `isURL` BOOL COMMENT 'TRUE indicates that additional URL-related stats may be collected',
  `isRegex` BOOL COMMENT 'TRUE indicates that the string should not be escaped before feeding to preg_match()',
  PRIMARY KEY(`ID`)
)
ENGINE = MYISAM;

CREATE TABLE `clients` (
  `ID` INT NOT NULL AUTO_INCREMENT,
  `Address` varchar(15) COMMENT 'IP address',
  `WhenFirst` DATETIME COMMENT 'when this IP address first submitted a spam',
  `WhenLast` DATETIME COMMENT 'when this IP address last submitted a spam',
  `Count` INT COMMENT 'number of attempts',
  PRIMARY KEY(`ID`)
)
ENGINE = MYISAM;

CREATE TABLE `attempts` (
  `ID` INT NOT NULL AUTO_INCREMENT,
  `When` DATETIME COMMENT 'timestamp of attempt',
  `ID_Pattern` INT NOT NULL COMMENT '(patterns.ID) matching pattern found',
  `ID_Client` INT NOT NULL COMMENT '(clients.ID) spamming client',
  PRIMARY KEY(`ID`)
)
ENGINE = MYISAM;
</sql>
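To show how the three tables might be used together when a spam attempt is caught, here is a rough Python sketch. It runs against an in-memory SQLite database with a simplified copy of the schema (no AUTO_INCREMENT, COMMENT or ENGINE clauses, dates stored as text) instead of MySQL. The function <code>record_attempt</code> and the order of its steps are assumptions, not the extension's actual logic.

```python
import sqlite3
from datetime import datetime

# Simplified SQLite version of the schema above (an assumption for this sketch).
SCHEMA = """
CREATE TABLE patterns (ID INTEGER PRIMARY KEY, Pattern TEXT, WhenAdded TEXT,
                       WhenTried TEXT, isActive INTEGER, isURL INTEGER, isRegex INTEGER);
CREATE TABLE clients  (ID INTEGER PRIMARY KEY, Address TEXT, WhenFirst TEXT,
                       WhenLast TEXT, "Count" INTEGER);
CREATE TABLE attempts (ID INTEGER PRIMARY KEY, "When" TEXT,
                       ID_Pattern INTEGER NOT NULL, ID_Client INTEGER NOT NULL);
"""

def record_attempt(db, pattern_id, address, now=None):
    """Log one blocked edit and return the clients.ID of the sender."""
    now = now or datetime.now().isoformat(sep=" ", timespec="seconds")
    # Mark the pattern as still being "tried".
    db.execute("UPDATE patterns SET WhenTried = ? WHERE ID = ?", (now, pattern_id))
    # Find the client by address, creating the row on first offence.
    row = db.execute("SELECT ID FROM clients WHERE Address = ?", (address,)).fetchone()
    if row is None:
        cur = db.execute(
            'INSERT INTO clients (Address, WhenFirst, WhenLast, "Count") VALUES (?, ?, ?, 1)',
            (address, now, now))
        client_id = cur.lastrowid
    else:
        client_id = row[0]
        db.execute('UPDATE clients SET WhenLast = ?, "Count" = "Count" + 1 WHERE ID = ?',
                   (now, client_id))
    # One attempts row per blocked edit links the pattern to the client.
    db.execute('INSERT INTO attempts ("When", ID_Pattern, ID_Client) VALUES (?, ?, ?)',
               (now, pattern_id, client_id))
    db.commit()
    return client_id

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO patterns (ID, Pattern, isActive) VALUES (1, 'cheap pills', 1)")
record_attempt(db, 1, "192.0.2.7", "2024-01-01 00:00:00")
record_attempt(db, 1, "192.0.2.7", "2024-01-02 00:00:00")
```

Keeping a per-client counter in <code>clients</code> alongside the per-attempt rows in <code>attempts</code> duplicates some information, but it makes "worst offenders" queries cheap.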