Program to find max repeated or duplicate words from a text file

>> Thursday, July 9, 2015

In this tutorial, you will learn how to find the most frequently repeated words in a given text file.
An earlier post covered how to read a file using BufferedReader and Scanner, which gives you the basic idea of reading a file through streams. A related post on the best way to find repeated characters in a String is also helpful for understanding this program. Here I am using BufferedReader and FileInputStream to read the file. If the file does not exist, FileInputStream throws a FileNotFoundException, which the program catches and prints.
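The file-reading step on its own can be sketched as follows. This is only a minimal illustration, assuming Java 7 or later (for try-with-resources) and UTF-8 text; the class name ReadLinesSketch and the method readLines are hypothetical and are not part of the program in this post.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only.
class ReadLinesSketch {

    // Reads every line of the file into a list.
    // FileInputStream throws FileNotFoundException if the file cannot be opened.
    static List<String> readLines(String fileName) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader automatically, even on failure
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        try {
            // "MyTestFile.txt" is a placeholder path; point it at your own file
            for (String line : readLines("MyTestFile.txt")) {
                System.out.println(line);
            }
        } catch (FileNotFoundException e) {
            System.out.println("File not found: " + e.getMessage());
        }
    }
}
```

Because FileNotFoundException is a subclass of IOException, the caller can catch it separately when a missing file needs its own message, as main does here.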
Here are the steps to write the program:
  1. Write a method getWordCount to count how many times each word occurs in a given text file.
  2. Create a Map<String, Integer> to store each word and its count.
  3. Create a sortByValue method that returns a List of the map entries, sorted by value (the count) in descending order.
  4. Write a main method to read the file and call the above methods.
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;
import java.util.StringTokenizer;
/**
 * @author javabynataraj.blogspot.com
 */
public class CountDuplicateWords {
     
    // Reads the file line by line and counts how often each word occurs (case-insensitive).
    public Map<String, Integer> getWordCount(String fileName){
        Map<String, Integer> wordMap = new HashMap<String, Integer>();
        // try-with-resources closes the reader automatically, even if reading fails
        try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(fileName)))) {
            String line = null;
            while((line = br.readLine()) != null){
                // split the line into words on spaces
                StringTokenizer st = new StringTokenizer(line, " ");
                while(st.hasMoreTokens()){
                    String temp = st.nextToken().toLowerCase();
                    if(wordMap.containsKey(temp)){
                        wordMap.put(temp, wordMap.get(temp)+1);
                    } else {
                        wordMap.put(temp, 1);
                    }
                }
            }
        } catch (FileNotFoundException e) {
            // the file does not exist or cannot be opened
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return wordMap;
    }
     
    public List<Entry<String, Integer>> sortByValue(Map<String, Integer> wordMap){
        Set<Entry<String, Integer>> set = wordMap.entrySet();
        List<Entry<String, Integer>> list = new ArrayList<Entry<String, Integer>>(set);
        // compare o2 with o1 (not o1 with o2) so that the highest counts come first
        Comparator<Map.Entry<String, Integer>> comparator = new Comparator<Map.Entry<String, Integer>>(){
            public int compare( Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2 ){
                return (o2.getValue()).compareTo( o1.getValue() );
            }
        };
        Collections.sort( list, comparator);
        return list;
    }
     
    public static void main(String a[]){
        CountDuplicateWords mdc = new CountDuplicateWords();
        Map<String, Integer> wordMap = mdc.getWordCount("C:/MyTestFile.txt");
        List<Entry<String, Integer>> list = mdc.sortByValue(wordMap);
        for(Map.Entry<String, Integer> entry:list){
            System.out.println(entry.getKey()+" ===>> "+entry.getValue());
        }
    }
}
You can download the MyTestFile.txt file used above and CountDuplicateWords.java from GitHub.
Output:
[Image: output of the max repeated words program]
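The program above prints every word with its count. If only the single most repeated word is needed, the full sort can be skipped. The sketch below is one possible variation, not the approach used in this post: it assumes Java 8 or later (for Map.merge and Map.Entry.comparingByValue), works on an in-memory String instead of a file, and splits on any non-letter, non-digit character so that punctuation does not stick to words. The names TopWordSketch, countWords and mostRepeated are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class for illustration only.
class TopWordSketch {

    // Counts words case-insensitively, splitting on runs of characters
    // that are neither letters nor digits (spaces, punctuation, and so on).
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (word.isEmpty()) {
                continue; // split yields a leading empty string if text starts with a separator
            }
            counts.merge(word, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    // Returns the entry with the highest count, or null for an empty map.
    // If several words share the highest count, which one is returned is unspecified.
    static Map.Entry<String, Integer> mostRepeated(Map<String, Integer> counts) {
        if (counts.isEmpty()) {
            return null;
        }
        return Collections.max(counts.entrySet(), Map.Entry.comparingByValue());
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("The cat saw the dog. The dog ran.");
        Map.Entry<String, Integer> top = mostRepeated(counts);
        System.out.println(top.getKey() + " ===>> " + top.getValue()); // prints "the ===>> 3"
    }
}
```

Collections.max makes a single pass over the entries, which is enough when only the top word matters; sorting, as in sortByValue, is still the better fit when the whole ranking should be printed.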

© javabynataraj.blogspot.com from 2009 - 2014. All rights reserved.