Program to find max repeated or duplicate words from a text file
In this tutorial, you will learn how to find the most frequently repeated words in a given text file.
In an earlier post we went through how to read a file using BufferedReader and Scanner, which gives you a basic idea of reading a file through streams. The related post on the best way to find repeated characters in a String is also helpful for understanding how to find the most repeated words in a text file. Here I am using BufferedReader and FileInputStream to read the file; if the file does not exist, FileInputStream throws a FileNotFoundException.
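To make the reading step concrete, here is a minimal sketch of reading a file line by line with BufferedReader over a FileInputStream and handling the missing-file case. The class name ReadLinesSketch, the method readLines, and the file name no-such-file.txt are assumptions made for illustration; they are not part of the original program.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only.
class ReadLinesSketch {

    // Reads every line of the file; returns an empty list if the file is missing.
    static List<String> readLines(String fileName) {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName)))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        } catch (FileNotFoundException e) {
            // Thrown by the FileInputStream constructor when the file cannot be opened
            System.err.println("File not found: " + fileName);
        } catch (IOException e) {
            // Any other read error is rethrown as an unchecked exception
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // "no-such-file.txt" is a placeholder name assumed not to exist
        System.out.println(readLines("no-such-file.txt").size());
    }
}
```

The sketch uses try-with-resources instead of an explicit finally block, so the reader is closed automatically; the behaviour on a missing file (report and continue) is one possible choice, not the only one.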
Here are the steps to write the program:
- Write a method getWordCount to count how many times each word occurs in a given text file.
- Create a Map<String, Integer> to store each word and its count.
- Create a sortByValue method that returns a List of the map entries, sorted by value (the count) in descending order.
- Write a main method to read the file and call the above methods.
```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Set;
import java.util.StringTokenizer;

/**
 * @author javabynataraj.blogspot.com
 */
public class CountDuplicateWords {

    // Reads the file line by line and counts each word (case-insensitive).
    public Map<String, Integer> getWordCount(String fileName) {
        BufferedReader br = null;
        Map<String, Integer> wordMap = new HashMap<String, Integer>();
        try {
            br = new BufferedReader(new InputStreamReader(new FileInputStream(fileName)));
            String line = null;
            while ((line = br.readLine()) != null) {
                // Split the line into words on spaces
                StringTokenizer st = new StringTokenizer(line, " ");
                while (st.hasMoreTokens()) {
                    String temp = st.nextToken().toLowerCase();
                    if (wordMap.containsKey(temp)) {
                        wordMap.put(temp, wordMap.get(temp) + 1);
                    } else {
                        wordMap.put(temp, 1);
                    }
                }
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (br != null) br.close();
            } catch (Exception ex) {
                // Ignore failures while closing the reader
            }
        }
        return wordMap;
    }

    // Returns the map entries sorted by count, highest first.
    public List<Entry<String, Integer>> sortByValue(Map<String, Integer> wordMap) {
        Set<Entry<String, Integer>> set = wordMap.entrySet();
        List<Entry<String, Integer>> list = new ArrayList<Entry<String, Integer>>(set);
        Comparator<Map.Entry<String, Integer>> comparator = new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
                // Compare o2 to o1 to get descending order
                return (o2.getValue()).compareTo(o1.getValue());
            }
        };
        Collections.sort(list, comparator);
        return list;
    }

    public static void main(String a[]) {
        CountDuplicateWords mdc = new CountDuplicateWords();
        Map<String, Integer> wordMap = mdc.getWordCount("C:/MyTestFile.txt");
        List<Entry<String, Integer>> list = mdc.sortByValue(wordMap);
        for (Map.Entry<String, Integer> entry : list) {
            System.out.println(entry.getKey() + " ===>> " + entry.getValue());
        }
    }
}
```

You can download the MyTestFile.txt file used above and CountDuplicateWords.java from GitHub.
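As a variation on the program above, the counting and sorting steps can also be sketched more compactly with Map.merge and Map.Entry.comparingByValue (both available since Java 8). This is only an illustrative sketch, not the original author's code: the class name WordCountSketch, the method names countWords and sortByCountDescending, and the in-memory sample lines are assumptions. It also splits on any whitespace rather than only on single spaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class for illustration only.
class WordCountSketch {

    // Counts words across all lines, case-insensitively, splitting on whitespace.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    // merge inserts 1 for a new word, otherwise adds 1 to the old count
                    counts.merge(word.toLowerCase(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Returns the entries sorted by count, highest first.
    static List<Map.Entry<String, Integer>> sortByCountDescending(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return entries;
    }

    public static void main(String[] args) {
        // Sample lines stand in for the contents of a text file
        List<String> lines = Arrays.asList("the cat and the dog", "The end");
        for (Map.Entry<String, Integer> e : sortByCountDescending(countWords(lines))) {
            System.out.println(e.getKey() + " ===>> " + e.getValue());
        }
    }
}
```

With these sample lines, "the" is printed first with a count of 3. The remaining words each occur once, and their relative order depends on HashMap iteration order, so it is not guaranteed.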