Difference between revisions of "SpamFerret"

from HTYP, the free directory anyone can edit if they can prove to me that they're not a spambot
(→‎Design: isURL and isRegex)
(Replaced content with "{{to wooz|MWX/SpamFerret}}")
Tag: Replaced
 
(60 intermediate revisions by 4 users not shown)
==Navigation==
{{to wooz|MWX/SpamFerret}}
[[computing]]: [[software]]: [[MediaWiki]]: [[fighting spam posts in MediaWiki|fighting spam]]
 
==Overview==
 
[[SpamFerret]] is [[User:Woozle|my]] attempt at an improvement over the SpamBlacklist extension. It is currently in development.
 
 
 
==Purpose==
 
The SpamBlacklist extension has a number of shortcomings:
 
* Can only handle a limited number of entries before exceeding the maximum string-length it can process, at which point all spam is allowed through
 
* Does not keep track of which entries are still being "tried" (to allow for periodic "cleaning" of the list)
 
* Does not keep track of offending IP addresses
 
* Handles only domains; cannot blacklist by URL path (for partially compromised servers) or "catch phrases" found in spam and nowhere else
 
* Does not keep a log of failed spam attempts, so there is no way to measure effectiveness
 
 
 
[[SpamFerret]] will:
 
* be database-driven
 
* keep logs and counts of spam attempts, both per blacklist entry and per IP address
 
* match domains ("http://*.domain"), URLs ("http://*.domain/path") and catch-phrases ("helo please to forgive my posting but my children are hungary")
 
** should also be able to match patterns, like long lists of links in a certain format
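
The three kinds of entries listed above could each be reduced to a regular expression. The following is a minimal sketch in Python, not SpamFerret's actual code: the helper names, the reading of "*." as "any chain of subdomains, or none", and the sample link-list pattern are all assumptions made for illustration.

```python
import re


def wildcard_url_to_regex(entry: str) -> str:
    """Turn 'http://*.domain/path' notation into regex source (hypothetical helper).

    Assumption: '*.' means "zero or more subdomain labels".
    """
    escaped = re.escape(entry)
    # re.escape renders '*.' as the two escaped characters '\*\.';
    # replace that with an optional chain of subdomain labels.
    return escaped.replace(r"\*\.", r"(?:[A-Za-z0-9-]+\.)*")


def phrase_to_regex(phrase: str) -> str:
    """Escape a catch-phrase so that it is matched literally."""
    return re.escape(phrase)


# Assumed example of a structural pattern: five or more bare links in a row,
# each followed by whitespace.
LINK_LIST = r"(?:https?://\S+\s+){5,}"
```

With these helpers, `wildcard_url_to_regex("http://*.example.com")` yields a pattern that matches both `http://example.com` and `http://cheap.pills.example.com`, but not `http://example.org`.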
 
 
 
SpamFerret may be unsuitable for use on busier wikis, because the checking process (which runs only when an edit is submitted) may take a fair amount of CPU time: it checks the entire page once per blacklisted pattern. This shouldn't be a problem for smaller wikis, which are often monitored less frequently than busier wikis and hence are more vulnerable to spam.
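
To make the cost concrete, here is a rough Python sketch of a check that scans the whole page once per pattern. It is an illustration under stated assumptions, not the extension's implementation: the `Pattern` tuple and `first_match` function are hypothetical names, and the flags mirror the `isActive` and `isRegex` columns described under Design.

```python
import re
from typing import Iterable, NamedTuple, Optional


class Pattern(NamedTuple):
    """Hypothetical in-memory copy of one blacklist entry."""
    id: int
    text: str
    is_regex: bool
    is_active: bool = True


def first_match(page_text: str, patterns: Iterable[Pattern]) -> Optional[Pattern]:
    """Return the first active pattern found in page_text, or None.

    Each active pattern triggers one scan of the full text, so the work
    grows linearly with the number of active patterns.
    """
    for p in patterns:
        if not p.is_active:
            continue
        # Plain strings are escaped so they match literally; regexes pass through.
        source = p.text if p.is_regex else re.escape(p.text)
        if re.search(source, page_text, re.IGNORECASE):
            return p
    return None
```

Returning at the first hit keeps rejected edits cheap, but a clean edit still pays for every active pattern, which is the case that matters on a busy wiki.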
 
==Design==
 
<sql>
 
CREATE TABLE `patterns` (
  `ID` INT NOT NULL AUTO_INCREMENT,
  `Pattern` varchar(255) COMMENT 'pattern to match (regex)',
  `WhenAdded` DATETIME DEFAULT NULL COMMENT 'when this entry was added',
  `WhenTried` DATETIME DEFAULT NULL COMMENT 'when a spammer last attempted to include this pattern',
  `isActive` BOOL COMMENT 'if FALSE, do not include in checking',
  `isURL` BOOL COMMENT 'TRUE indicates that additional URL-related stats may be collected',
  `isRegex` BOOL COMMENT 'TRUE indicates that the string should not be escaped before feeding to preg_match()',
  PRIMARY KEY(`ID`)
)
ENGINE = MYISAM;

CREATE TABLE `clients` (
  `ID` INT NOT NULL AUTO_INCREMENT,
  `Address` varchar(15) COMMENT 'IP address',
  `WhenTried` DATETIME COMMENT 'when this IP address last submitted a spam',
  PRIMARY KEY(`ID`)
)
ENGINE = MYISAM;

CREATE TABLE `attempts` (
  `ID` INT NOT NULL AUTO_INCREMENT,
  `When` DATETIME COMMENT 'timestamp of attempt',
  `ID_Pattern` INT NOT NULL COMMENT '(patterns.ID) matching pattern found',
  `ID_Client` INT NOT NULL COMMENT '(clients.ID) spamming client',
  PRIMARY KEY(`ID`)
)
ENGINE = MYISAM;
 
</sql>
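
One way the three tables might be used together when a spam attempt is caught is sketched below in Python. This is an assumption-laden illustration: it uses an in-memory SQLite database with a simplified copy of the schema instead of MySQL/MyISAM, and `record_attempt` and `attempts_per_pattern` are hypothetical names.

```python
import sqlite3

# Simplified SQLite rendering of the tables above (assumption: types loosened,
# COMMENT clauses dropped, since SQLite does not support them).
SCHEMA = """
CREATE TABLE patterns (ID INTEGER PRIMARY KEY AUTOINCREMENT, Pattern TEXT,
  WhenAdded TEXT, WhenTried TEXT, isActive INTEGER, isURL INTEGER, isRegex INTEGER);
CREATE TABLE clients (ID INTEGER PRIMARY KEY AUTOINCREMENT, Address TEXT, WhenTried TEXT);
CREATE TABLE attempts (ID INTEGER PRIMARY KEY AUTOINCREMENT, "When" TEXT,
  ID_Pattern INTEGER NOT NULL, ID_Client INTEGER NOT NULL);
"""


def record_attempt(conn: sqlite3.Connection, pattern_id: int, address: str, when: str) -> int:
    """Log one blocked edit and refresh both WhenTried stamps; returns clients.ID."""
    row = conn.execute("SELECT ID FROM clients WHERE Address = ?", (address,)).fetchone()
    if row is None:
        # First offence from this address: create the client row.
        cur = conn.execute(
            "INSERT INTO clients (Address, WhenTried) VALUES (?, ?)", (address, when))
        client_id = cur.lastrowid
    else:
        client_id = row[0]
        conn.execute("UPDATE clients SET WhenTried = ? WHERE ID = ?", (when, client_id))
    conn.execute("UPDATE patterns SET WhenTried = ? WHERE ID = ?", (when, pattern_id))
    conn.execute(
        'INSERT INTO attempts ("When", ID_Pattern, ID_Client) VALUES (?, ?, ?)',
        (when, pattern_id, client_id))
    conn.commit()
    return client_id


def attempts_per_pattern(conn: sqlite3.Connection) -> list:
    """Return (pattern ID, attempt count) pairs, ordered by pattern ID."""
    return conn.execute(
        "SELECT ID_Pattern, COUNT(*) FROM attempts "
        "GROUP BY ID_Pattern ORDER BY ID_Pattern").fetchall()
```

Grouping `attempts` by `ID_Client` instead would give the per-IP counts, and a pattern whose `WhenTried` stays old is a candidate for the periodic list cleaning mentioned under Purpose.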
 

Latest revision as of 20:22, 1 May 2022

This page has been moved to wooz.dev, User:Woozle's coding site.