Thursday, December 2, 2010

How to check if the network is the bottleneck in your Hadoop cluster

We had an issue with our Hadoop cluster: after a network upgrade, jobs on the cluster started running very slowly. We knew it had something to do with the network upgrade.

On further investigation we found that all the mappers in the jobs were running fine; all of the performance degradation happened in the reducers. This pretty much proved the point that the network was the bottleneck: most map tasks were rack-local (so they needed very limited to almost no use of the network), whereas the reduce phase uses the network heavily to shuffle and sort data before it can apply the reduce function.
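The same comparison can be done with numbers instead of by eye. Below is a minimal Python sketch, assuming you have already collected per-task durations for a job before and after the change (for example from the job history). The function names, the 1.5x threshold, and the sample durations are all hypothetical and only illustrate the idea; they are not measurements from our cluster.

```python
from statistics import median

def phase_slowdown(before, after):
    """Return {phase: median(after) / median(before)} for phases in both runs."""
    return {
        phase: median(after[phase]) / median(before[phase])
        for phase in before
        if phase in after
    }

def likely_network_bound(slowdown, threshold=1.5):
    """Heuristic (an assumption): map roughly unchanged, reduce clearly slower."""
    return slowdown["map"] < threshold <= slowdown["reduce"]

# Hypothetical task durations in seconds, made up for illustration.
before = {"map": [40, 42, 41], "reduce": [100, 110, 105]}
after = {"map": [41, 43, 42], "reduce": [300, 330, 320]}

slowdown = phase_slowdown(before, after)
print(slowdown)
print(likely_network_bound(slowdown))
```

If the map ratio stays close to 1 while the reduce ratio jumps, that points at the shuffle, and therefore the network, rather than at CPU or disk on the nodes. It is only a hint, not proof; a slow reduce function would show the same pattern.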
