Tag Archives: hashmap

SpamDetector : TopCoder Problem 2229

[The problem appeared in TopCoder SRM 205 (Div-1, Level-1) and TopCoder SRM 205 (Div-2, Level-2)]

Problem link: http://community.topcoder.com/stat?c=problem_statement&pm=2229

Problem Statement:
You are writing part of a spam detection system. Your job is to analyze the subject lines of e-mail messages and return a count of known spam signalling keywords in the subject lines. Your task is made more difficult by the spammers who try to hide the keywords in several ways. Here we will consider just one obfuscation technique: duplicating characters. Duplicating characters means taking an existing character in a word and inserting more copies of that character into the same place in the word. This process can then be repeated on a different character in the word. The spam signalling keyword “credit” might be modified to “creddiT”, “CredittT” or “ccrreeeddiitt”, etc., but not “credict”.

For the purposes of this problem we will consider subject lines which contain only letters and spaces. The “words” in the subject line are delimited by spaces. A word in the subject line is considered a “match” if the entire word is the same as at least one entire keyword, after possibly removing some duplicated characters from the subject word. A keyword that matches only part of a subject word or a subject word that matches only part of a keyword does not count. Note that if a keyword contains a double letter, the subject word must also contain (at least) a double letter in the same position to match (“double letter” means two consecutive letters in the word that are the same). For this application, all matches (and the use of the term “same”) are case insensitive.

Given a subject line and a list of keywords, return the count of words in the subject line which “match” words in the keyword list. If multiple words in the subject line match the same keyword, they are each counted, but a word in the subject line that matches multiple keywords is only counted once.
Continue reading