# String Similarity: Hackerrank

Problem Statement:
For two strings A and B, we define the similarity of the strings to be the length of the longest prefix common to both strings. For example, the similarity of strings “abc” and “abd” is 2, while the similarity of strings “aaa” and “aaab” is 3.

Calculate the sum of similarities of a string S with each of it’s suffixes.

Input:
The first line contains the number of test cases T. Each of the next T lines contains a string each.

Output:
Output T lines containing the answer for the corresponding test case.

Constraints:
1. $1\leq T\leq 10$
2. The length of each string is at most 100000 and contains only lower case characters.

Solution Idea:
The obvious solution is straight forward. You write a function that calculates the similarity (length of longest prefix) between 2 strings. Now you call that function with the string and each of its suffix (by using the substring method).

Edit:
The solution mentioned above is not good enough for an acceptance in the HackerRank Website as pointed out by Shiv. The solution is commented out in the code. Now, how can we improve the code. First of all, calculating the substring is an expensive operation and we can achieve what we want by keeping track of the start index. Secondly, String.charAt() seems to be slower than accessing an array with an index. So, I first convert the string to an array and then use index. With this 2 improvement, I was able to get it accepted in the HackerRank Website, Yay!

Important:
Although, I got the code accepted in the website, I still believe the original intention of the author of the problem is to make us find a linear time algorithm. There is one using Suffix array. A suffix array is an array of integers giving the starting positions of all suffixes of a string in lexicographical order. A suffix array can be constructed in linear time (Look at this research: http://www.cs.sysu.edu.cn/nong/index.files/Two%20Efficient%20Algorithms%20for%20Linear%20Suffix%20Array%20Construction.pdf ). When the suffix array is constructed then this problem can be solved in linear time (Look at this stackoverflow: http://stackoverflow.com/questions/11069467/longest-common-prefixes )

Solution Gist:

Tagged: , , , ,

## 6 thoughts on “String Similarity: Hackerrank”

1. Shiv September 7, 2013 at 7:59 am Reply

Wont this exceed the given time limit?

• Faisal R. September 7, 2013 at 7:30 pm Reply

Here you go! This implementation is accepted by the HackerRank website! Thanks for pointing this out!

2. Shigglewitz November 21, 2013 at 8:34 am Reply

I used a very similar algorithm on my first submission, and the last two test cases failed due to time limit. Did your algorithm pass all test cases?

• Faisal R. November 21, 2013 at 8:47 am Reply

It was accepted during the time I submitted around 4 months ago, but currently it is failing last 2 test cases due to time limit. I guess HackerRank made the checking algorithm more strict. I will fix it asap. Keep tuned.

• JIE CHEN December 12, 2013 at 6:13 pm

Firstly, I wanna thank you for your post. That helps me a lot.
It failed when you encounter a string like “aaaaa..aa”. In that case, the running time will be n*n/2, which is not good enough. My guess is that if you can update your code with the linear algorithm, it will be fine.
I am working with it now too.

• Faisal R. December 13, 2013 at 2:14 pm

I will try updating this algorithm, once I have some time. I am thinking about a variation of the KMP algorithm. Thank you so much, for the comment!