Difference between revisions of "SpamFerret"
Jump to navigation
Jump to search
(New page: ==Navigation== computing: software: MediaWiki: fighting spam ==Overview== SpamFerret is my attempt at an improvement ov...) |
(→Design: isURL and isRegex) |
||
Line 21: | Line 21: | ||
==Design== | ==Design== | ||
<sql> | <sql> | ||
− | CREATE TABLE | + | CREATE TABLE `patterns` ( |
− | `ID` INT | + | `ID` INT NOT NULL AUTO_INCREMENT, |
`Pattern` varchar(255) COMMENT 'pattern to match (regex)', | `Pattern` varchar(255) COMMENT 'pattern to match (regex)', | ||
`WhenAdded` DATETIME DEFAULT NULL COMMENT 'when this entry was added', | `WhenAdded` DATETIME DEFAULT NULL COMMENT 'when this entry was added', | ||
`WhenTried` DATETIME DEFAULT NULL COMMENT 'when a spammer last attempted to include this pattern', | `WhenTried` DATETIME DEFAULT NULL COMMENT 'when a spammer last attempted to include this pattern', | ||
`isActive` BOOL COMMENT 'if FALSE, do not include in checking', | `isActive` BOOL COMMENT 'if FALSE, do not include in checking', | ||
+ | `isURL` BOOL COMMENT 'TRUE indicates that additional URL-related stats may be collected', | ||
+ | `isRegex` BOOL COMMENT 'TRUE indicates that the string should not be escaped before feeding to preg_match()', | ||
PRIMARY KEY(`ID`) | PRIMARY KEY(`ID`) | ||
) | ) | ||
ENGINE = MYISAM; | ENGINE = MYISAM; | ||
− | CREATE TABLE | + | CREATE TABLE `clients` ( |
− | `ID` INT | + | `ID` INT NOT NULL AUTO_INCREMENT, |
`Address` varchar(15) COMMENT 'IP address', | `Address` varchar(15) COMMENT 'IP address', | ||
`WhenTried` DATETIME COMMENT 'when this IP address last submitted a spam', | `WhenTried` DATETIME COMMENT 'when this IP address last submitted a spam', | ||
Line 39: | Line 41: | ||
ENGINE = MYISAM; | ENGINE = MYISAM; | ||
− | CREATE TABLE | + | CREATE TABLE `attempts` ( |
− | `ID` INT | + | `ID` INT NOT NULL AUTO_INCREMENT, |
`When` DATETIME COMMENT 'timestamp of attempt', | `When` DATETIME COMMENT 'timestamp of attempt', | ||
− | `ID_Pattern` NOT NULL COMMENT '(patterns.ID) matching pattern found', | + | `ID_Pattern` INT NOT NULL COMMENT '(patterns.ID) matching pattern found', |
`ID_Client` INT NOT NULL COMMENT '(clients.ID) spamming client', | `ID_Client` INT NOT NULL COMMENT '(clients.ID) spamming client', | ||
PRIMARY KEY(`ID`) | PRIMARY KEY(`ID`) |
Revision as of 22:06, 1 May 2007
computing: software: MediaWiki: fighting spam
Overview
SpamFerret is my attempt at an improvement over the SpamBlacklist extension. It is currently in development.
Purpose
The SpamBlacklist extension has a number of shortcomings:
- Can only handle a limited number of entries before exceeding the maximum string-length it can process, at which point all spam is allowed through
- Does not keep track of which entries are still being "tried" (to allow for periodic "cleaning" of the list)
- Does not keep track of offending IP addresses
- Handles only domains; cannot blacklist by URL path (for partially compromised servers) or "catch phrases" found in spam and nowhere else
- Does not keep a log of failed spam attempts, so there is no way to measure effectiveness
SpamFerret will:
- be database-driven
- keep logs and counts of spam attempts by blacklisting and by IP
- match domains ("http://*.domain"), URLs ("http://*.domain/path") and catch-phrases ("helo please to forgive my posting but my children are hungary")
- should also be able to match patterns, like long lists of links in a certain format
It may also be unsuitable for use on busier wikis, as the checking process (which only happens when an edit is submitted) may take a fair amount of CPU time (checks the entire page once per blacklisted pattern). This shouldn't be a problem for smaller wikis, which are often monitored less frequently than busier wikis and hence are more vulnerable to spam.
Design
<sql> CREATE TABLE `patterns` (
`ID` INT NOT NULL AUTO_INCREMENT, `Pattern` varchar(255) COMMENT 'pattern to match (regex)', `WhenAdded` DATETIME DEFAULT NULL COMMENT 'when this entry was added', `WhenTried` DATETIME DEFAULT NULL COMMENT 'when a spammer last attempted to include this pattern', `isActive` BOOL COMMENT 'if FALSE, do not include in checking', `isURL` BOOL COMMENT 'TRUE indicates that additional URL-related stats may be collected', `isRegex` BOOL COMMENT 'TRUE indicates that the string should not be escaped before feeding to preg_match()', PRIMARY KEY(`ID`)
) ENGINE = MYISAM;
CREATE TABLE `clients` (
`ID` INT NOT NULL AUTO_INCREMENT, `Address` varchar(15) COMMENT 'IP address', `WhenTried` DATETIME COMMENT 'when this IP address last submitted a spam', PRIMARY KEY(`ID`)
) ENGINE = MYISAM;
CREATE TABLE `attempts` (
`ID` INT NOT NULL AUTO_INCREMENT, `When` DATETIME COMMENT 'timestamp of attempt', `ID_Pattern` INT NOT NULL COMMENT '(patterns.ID) matching pattern found', `ID_Client` INT NOT NULL COMMENT '(clients.ID) spamming client', PRIMARY KEY(`ID`)
) ENGINE = MYISAM; </sql>