
HackerRank Problem: Tag Content Extractor Solution

In a tag-based language like XML or HTML, contents are enclosed between a start tag and an end tag like <tag>contents</tag>. Note that the corresponding end tag starts with a /.

Given a string of text in a tag-based language, parse this text and retrieve the contents enclosed within sequences of well-organized tags meeting the following criteria:

  1. The names of the start and end tags must be the same. The HTML code <h1>Hello World</h2> is not valid, because the text starts with an h1 tag and ends with a non-matching h2 tag.
  2. Tags can be nested, but content between nested tags is considered not valid. For example, in <h1><a>contents</a>invalid</h1>, contents is valid but invalid is not valid.
  3. Tags can consist of any printable characters.
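
The three criteria above can also be written down as a single regular expression with a backreference. The following is only a sketch of that idea, not the approach used in the solution further down: the class name TagRegexSketch and the helper extract are hypothetical names chosen for illustration, and the pattern may behave differently from an index-based scan on unusual inputs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagRegexSketch {
    // <(.+)>   captures the tag name (any characters, at least one).
    // ([^<]+)  captures non-empty content that contains no nested tag.
    // </\1>    requires an end tag whose name equals the captured start tag
    //          exactly (the comparison is case-sensitive).
    private static final Pattern TAGGED = Pattern.compile("<(.+)>([^<]+)</\\1>");

    // Hypothetical helper: returns every piece of valid content found in one line.
    static List<String> extract(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = TAGGED.matcher(line);
        while (m.find()) {
            found.add(m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        String[] samples = {
            "<h1>Hello World</h2>",
            "<h1><a>contents</a>invalid</h1>"
        };
        for (String s : samples) {
            List<String> contents = extract(s);
            if (contents.isEmpty()) {
                System.out.println("None");
            } else {
                contents.forEach(System.out::println);
            }
        }
    }
}
```

For the two examples from the criteria, this sketch prints None for the mismatched h1/h2 pair and contents for the nested example.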

 

Input Format

The first line of input contains a single integer, N (the number of lines).
The N subsequent lines each contain a line of text.

Constraints

  • 1<=N<=100

Output Format

  • For each line, print the content enclosed within valid tags.
  • If a line contains multiple instances of valid content, print out each instance of valid content on a new line; if no valid content is found, print None.

Sample Input

4
<h1>Nayeem loves counseling</h1>
<h1><h1>Sanjay has no watch</h1></h1><par>So wait for a while</par>
<Amee>safat codes like a ninja</amee>
<SA premium>Imtiaz has a secret crush</SA premium>

Sample Output

Nayeem loves counseling
Sanjay has no watch
So wait for a while
None
Imtiaz has a secret crush

Solution.java:

import java.util.Scanner;

public class Solution {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        int testCases = Integer.parseInt(in.nextLine());
        while (testCases > 0) {
            String line = in.nextLine();
            int cur = 0;          // position to resume scanning from
            boolean none = true;  // stays true until some valid content is printed
            for (;;) {
                // Find the next "<...>" pair at or after cur.
                int start = line.indexOf("<", cur);
                if (start < 0) break;
                int end = line.indexOf(">", start);
                if (end < 0) break;
                String tagname = line.substring(start + 1, end);
                // Skip empty tags and end tags; only start tags open a candidate.
                if (tagname.length() == 0 || tagname.charAt(0) == '/') {
                    cur = end + 1;
                    continue;
                }
                // Search for the matching end tag only after the start tag.
                int bk = line.indexOf("</" + tagname + ">", end + 1);
                if (bk >= 0) {
                    String candidate = line.substring(end + 1, bk);
                    // Valid content is non-empty and contains no nested tag.
                    if (candidate.length() > 0 && candidate.indexOf("<") < 0) {
                        none = false;
                        System.out.println(candidate);
                    }
                }
                cur = end + 1;
            }
            if (none) System.out.println("None");
            testCases--;
        }
    }
}

Thank You.
