Tuesday, November 16, 2010

How to get the name of the file being processed by the mapper in the Hadoop MapReduce framework

Sometimes when we are processing data in Hadoop, it helps to know the name of the file that the mapper is currently reading. The default InputFormat (TextInputFormat) provides the Mapper with (key, value) pairs where the key is the byte offset into the file and the value is a line of text, so the file name is not part of the record itself.

To get the file name of the current input split, use the following code in your mapper (for example, in setup() or map(), where the context object is available):

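// Requires org.apache.hadoop.mapreduce.lib.input.FileSplit (the newer "mapreduce" API).
// The cast assumes a file-based InputFormat such as TextInputFormat.
FileSplit fileSplit = (FileSplit) context.getInputSplit();
String filename = fileSplit.getPath().getName();
System.out.println("File name: " + filename);
System.out.println("Directory and file name: " + fileSplit.getPath().toString());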
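
To make the difference between the two printed values concrete, here is a small sketch that runs without Hadoop on the classpath. It does not call the Hadoop API; it only imitates what getPath().getName() returns (the final component of the path) using java.net.URI and plain string handling. The class name SplitPathDemo, the helper fileNameOf, and the HDFS URI are all hypothetical and chosen only for illustration.

```java
import java.net.URI;

public class SplitPathDemo {
    // Returns the final component of the URI's path. This mirrors the
    // documented behaviour of Hadoop's Path.getName(), but it is a
    // stand-in written for this example, not the Hadoop implementation.
    static String fileNameOf(String fullPath) {
        String path = URI.create(fullPath).getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // Hypothetical input location, used only for illustration.
        String fullPath = "hdfs://namenode:8020/user/data/input/part-00000";
        System.out.println("File name: " + fileNameOf(fullPath));
        System.out.println("Directory and file name: " + fullPath);
    }
}
```

With the sample URI above, the first line prints only part-00000, while the second prints the full location including the scheme, host, and directory. In a real mapper the same distinction holds between getPath().getName() and getPath().toString().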
