In Computer Science, SHOBHIT-Improved
String Search Algorithm is a string searching algorithm created by SHOBHIT
UPADHYAYA in August, 2013. He is working as a Software Developer in Bangalore
India.
“This algorithm uses
the first, last, mid1 and mid2 index of the substring for a pattern search.”
The main reason for publishing this new algorithm is, there was bug in my last search algorithm SHOBHIT-Advance String Search Algorithm
. That bug had been raised by my blog follower.
I would like to give thanks to
that guy.
My new algorithm is faster
than the previous one.
PESUDOCODE OF ALGORITHM:
Shobhit_Improved_String_Search(text, text_len, substr,
substr_len)
{
If substr_len
is greater than text_len
return -1
sub_end = substr_len -1
if substr_len is even
then
sub_mid1 = (substr_len /2 ) – 1
sub_mid2 = (substr_len/2)
else
sub_mid1 = (substr_len/2)
sub_mid2
= (substr_len/2)
sub_start = 0
t_end = sub_end
t_mid1 = sub_mid1
t_mid2 = sub_mid2
m1 = sub_mid1
m2 = sub_mid2
index=-1
for ( t_start = 0; t_end < text_len; t_start ++)
{
If text[t_start] is equal to
substr[sub_start] AND
text[t_end] is equal
to substr[sub_end]
then
if index is equal to
-1
then
index
= t_start
if ( sub_start + 1 is equal to sub_mid1 AND
sub_mid2
+1 is equal to sub_end )
OR
( sub_start is equal to sub_mid1 AND
sub_mid2 is equal to sub_end )
then
return
index; //Pattern Found
if
text[t_mid1] is equal to substr[sub_mid1] AND
text[t_mid2]
is equal to substr[sub_mid2]
then
increment sub_start by 1
decrement sub_end by 1
decrement sub_mid1 by 1
increment sub_mid2 by 1
decrement t_end by 1
decrement t_mid1 by 1
increment t_mid2 by 1
else
set sub_start to zero
set sub_end to substr_len -1
set sub_mid1 to m1
set sub_mid2 to m2
set t_end to t_start + sub_end + 1
set t_mid1 to sub_mid1 + t_start + 1
set t_mid2 to sub_mid2 + t_start + 1
set index to -1
else
set
sub_start to zero
set sub_end to
substr_len -1
set sub_mid1 to m1
set sub_mid2 to m2
set t_end to t_start
+ sub_end
set t_mid1 to sub_mid1
+ t_start
set t_mid2 to
sub_mid2 + t_start
set index to -1
If text[t_start] is
equal to substr[sub_start] AND
text[t_end] is equal to
substr[sub_end]
then
if index is equal to -1
then
index = t_start
if ( sub_start + 1 is equal to sub_mid1 AND
sub_mid2 +1 is equal to sub_end )
OR
(
sub_start is equal to sub_mid1 AND
sub_mid2
is equal to sub_end )
then
return index; //Pattern Found
if text[t_mid1] is equal to
substr[sub_mid1] AND
text[t_mid2] is equal to
substr[sub_mid2]
then
increment
sub_start by 1
decrement
sub_end by 1
decrement
sub_mid1 by 1
increment
sub_mid2 by 1
decrement
t_end by 1
decrement
t_mid1 by 1
increment
t_mid2 by 1
else
increment
t_end by 1
increment
t_mid1 by 1
increment
t_mid2 by 1
}
TIME and SPACE Complexity:
For a text of length n and substring of length m.
Its best case, time
complexity is O(m/4) and
in worst case, time complexity is O(n
– (m/4) ).
In all the cases
best, average and worst its space
complexity is O(1).
This algorithm checks
four characters at a time.
In best case where
the substring is at the start of the text, we can check all the character of
substring in only m/4 time.
In worst case where
the substring is at the last of the text, we need to iterate till n – m
SOURCE CODE AND LICENSE:-
The
code of this algorithm is free and it’s under
GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007
Please
find out the source code from the following link.
The
source code is having the algorithm api and a test application code to test the
api.
If
you found any bug or if you want to communicate related to this algorithm drop
me a mail at < sho.upadhyaya@gmail.com > or
do comment on my blog.