
HackerRank Problem Java Regex 2 – Duplicate Words Solution

In this challenge, we use regular expressions (RegEx) to remove repeated instances of a word while retaining the first occurrence of any case-insensitive repeated word. For example, the words love and to are repeated in the sentence I love Love to To tO code. Can you complete the code in the editor so it will turn I love Love to To tO code into I love to code?

To solve this challenge, complete the following three lines:

  1. Write a RegEx that will match any repeated word.
  2. Complete the second compile argument so that the compiled RegEx is case-insensitive.
  3. Write the two necessary arguments for replaceAll such that each repeated word is replaced with the very first instance of the word found in the sentence. It must be the exact first occurrence of the word, because the expected output is case-sensitive.
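To see what each of the three completed lines contributes, here is a small sketch that runs the pattern over the example sentence. For each match it prints the run that step 3 replaces (m.group()) next to the text it is replaced with (m.group(1)). The class ThreeStepsDemo and the method describeRuns are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ThreeStepsDemo {
    // Step 1: a whole word followed by one or more whitespace-separated copies of it.
    static final String REGEX = "\\b([a-z]+)\\b(?:\\s+\\1\\b)+";
    // Step 2: the second compile argument makes the match case-insensitive.
    static final Pattern P = Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE);

    // Lists each matched run alongside the text step 3 would replace it with.
    static List<String> describeRuns(String sentence) {
        List<String> runs = new ArrayList<>();
        Matcher m = P.matcher(sentence);
        while (m.find()) {
            // m.group(): the whole run; m.group(1): its first word exactly as written.
            runs.add(m.group() + " -> " + m.group(1));
        }
        return runs;
    }

    public static void main(String[] args) {
        for (String run : describeRuns("I love Love to To tO code")) {
            System.out.println(run);
        }
    }
}
```

This should print love Love -> love and to To tO -> to. Because group 1 holds the first word exactly as it was typed, replacing each run with it keeps the original casing of the first occurrence.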

Input Format

The following input is handled for you by the given stub code:

The first line contains an integer denoting the number of sentences.
Each of the subsequent lines contains a single sentence consisting of English alphabetic letters and whitespace characters.

Constraints

  • Each sentence consists of at most  English alphabetic letters and whitespaces.

Sample Input

5
Goodbye bye bye world world world
Sam went went to to to his business
Reya is is the the best player in eye eye game
in inthe
Hello hello Ab aB

Sample Output

Goodbye bye world
Sam went to his business
Reya is the best player in eye game
in inthe
Hello Ab

Explanation

  1. We remove the second occurrence of bye and the second and third occurrences of world from Goodbye bye bye world world world to get Goodbye bye world.
  2. We remove the second occurrence of went and the second and third occurrences of to from Sam went went to to to his business to get Sam went to his business.
  3. We remove the second occurrence of is, the second occurrence of the, and the second occurrence of eye from Reya is is the the best player in eye eye game to get Reya is the best player in eye game.
  4. The sentence in inthe has no repeated words, so we do not modify it.
  5. We remove the second occurrence of ab from Hello hello Ab aB to get Hello Ab. It’s important to note that our matching is case-insensitive, and we specifically retained the first occurrence of the matched word in our final string.
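One way to check the explanation above is to run all five sample sentences through the same pattern. This sketch departs from the stub's structure: instead of the find/replaceAll loop, it uses a single Matcher.replaceAll("$1") call, which rewrites every match with its first capturing group in one pass. SampleCheck and dedupe are hypothetical names.

```java
import java.util.regex.Pattern;

public class SampleCheck {
    private static final Pattern DUPLICATES =
            Pattern.compile("\\b([a-z]+)\\b(?:\\s+\\1\\b)+", Pattern.CASE_INSENSITIVE);

    // Replaces every run of a repeated word with the run's first occurrence ($1).
    static String dedupe(String sentence) {
        return DUPLICATES.matcher(sentence).replaceAll("$1");
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Goodbye bye bye world world world",
            "Sam went went to to to his business",
            "Reya is is the the best player in eye eye game",
            "in inthe",
            "Hello hello Ab aB"
        };
        String[] expected = {
            "Goodbye bye world",
            "Sam went to his business",
            "Reya is the best player in eye game",
            "in inthe",
            "Hello Ab"
        };
        for (int i = 0; i < inputs.length; i++) {
            String actual = dedupe(inputs[i]);
            // Mark each line so a mismatch with the sample output stands out.
            System.out.println(actual + (actual.equals(expected[i]) ? "  [ok]" : "  [MISMATCH]"));
        }
    }
}
```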

DuplicateWords.java:

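import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DuplicateWords {

    public static void main(String[] args) {

        // A whole word followed by one or more whitespace-separated copies of it.
        String regex = "\\b([a-z]+)\\b(?:\\s+\\1\\b)+";
        // Case-insensitive, so "Love" and "tO" count as repeats of "love" and "to".
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);

        Scanner in = new Scanner(System.in);
        int numSentences = Integer.parseInt(in.nextLine());

        while (numSentences-- > 0) {
            String input = in.nextLine();

            Matcher m = p.matcher(input);

            // Check for subsequences of input that match the compiled pattern
            while (m.find()) {
                // Replace the whole run (m.group()) with its first occurrence (m.group(1)).
                // The \b anchors keep the replacement from touching text that straddles
                // other words, e.g. the "ab ab" hidden in "xab abc". Using m.group() as a
                // regex is safe here because sentences contain only letters and whitespace.
                input = input.replaceAll("\\b" + m.group() + "\\b", m.group(1));
            }

            // Prints the modified sentence.
            System.out.println(input);
        }

        in.close();
    }
}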
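A word of caution about the replacement step: String.replaceAll compiles m.group() as a brand-new regex, and that regex has no word boundaries. When the matched run also appears across the edges of other words, a loop of input.replaceAll(m.group(), m.group(1)) calls can rewrite text the original pattern never matched. The sketch below compares that loop with a single Matcher.replaceAll pass on a made-up sentence, xab abc ab ab; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplacePitfall {
    static final Pattern P =
            Pattern.compile("\\b([a-z]+)\\b(?:\\s+\\1\\b)+", Pattern.CASE_INSENSITIVE);

    // Loop in the style of the stub: each matched run is reused as a new regex.
    static String loopWithStringReplaceAll(String input) {
        Matcher m = P.matcher(input); // the matcher keeps scanning the original string
        while (m.find()) {
            // m.group() is compiled as a regex with no \b anchors, so it can also
            // match text that straddles the edges of other words.
            input = input.replaceAll(m.group(), m.group(1));
        }
        return input;
    }

    // Single pass: only the spans the pattern itself matched are rewritten.
    static String singlePass(String input) {
        return P.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        String s = "xab abc ab ab"; // hypothetical input, letters and spaces only
        System.out.println(loopWithStringReplaceAll(s)); // xabc ab
        System.out.println(singlePass(s));               // xab abc ab
    }
}
```

Only the final ab ab is a real repeat. The loop also collapses the ab ab that spans xab abc, while the single pass leaves it alone. If the stub's replaceAll line has to stay, wrapping m.group() in \b anchors narrows the problem.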

Explanation:
\b asserts position at a word boundary (^\w|\w$|\W\w|\w\W), so a match can only begin at the start of a word.

1st Capturing Group: ([a-z]+)

[a-z]+ matches a single character in the range between a (character code 97) and z (character code 122); because the pattern is compiled case-insensitively, A to Z match as well.
+ Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
\b asserts position at a word boundary, so the group captures a whole word.

Non-capturing Group: (?:\s+\1\b)+

+ Quantifier: the whole group repeats between one and unlimited times (greedy), so one match covers an entire run such as to To tO.
\s+ matches one or more whitespace characters (in Java, \s is equal to [ \t\n\x0B\f\r])
\1 matches the same text as most recently matched by the 1st capturing group; with the case-insensitive flag this comparison also ignores case, so Love matches a captured love.
\b asserts position at a word boundary, so in does not match the beginning of inthe.

Flags:
Pattern.CASE_INSENSITIVE (inline equivalent (?i)): case-insensitive matching. By default it only folds US-ASCII letters, which is enough here because the sentences contain only English letters.
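To make the roles of the trailing \b and the CASE_INSENSITIVE flag concrete, the sketch below applies the full pattern and two deliberately weakened variants to two of the sample sentences. The variants exist for comparison only and are not part of the solution; RegexPiecesDemo and apply are hypothetical names.

```java
import java.util.regex.Pattern;

public class RegexPiecesDemo {
    // The full pattern from the solution.
    static final Pattern FULL =
            Pattern.compile("\\b([a-z]+)\\b(?:\\s+\\1\\b)+", Pattern.CASE_INSENSITIVE);
    // Comparison only: the trailing \b after \1 removed.
    static final Pattern NO_TRAILING_BOUNDARY =
            Pattern.compile("\\b([a-z]+)\\b(?:\\s+\\1)+", Pattern.CASE_INSENSITIVE);
    // Comparison only: the same regex compiled without CASE_INSENSITIVE.
    static final Pattern CASE_SENSITIVE =
            Pattern.compile("\\b([a-z]+)\\b(?:\\s+\\1\\b)+");

    // Replaces each match with its first capturing group.
    static String apply(Pattern p, String s) {
        return p.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(apply(FULL, "in inthe"));                    // in inthe
        System.out.println(apply(NO_TRAILING_BOUNDARY, "in inthe"));    // inthe
        System.out.println(apply(FULL, "Hello hello Ab aB"));           // Hello Ab
        System.out.println(apply(CASE_SENSITIVE, "Hello hello Ab aB")); // Hello hello Ab aB
    }
}
```

Dropping the trailing \b lets in match the first two letters of inthe. Dropping the flag means [a-z] no longer matches the capitals H and A, and \1 compares case-sensitively, so neither pair is treated as a repeat.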

Thank you. I will be back soon with a new problem.
