Wednesday, November 17, 2010

Video discussion that talks about Vertica & Hadoop integration

http://vimeo.com/16863984

The video talks about the connector Vertica provides for Hadoop. As of Nov 2010, this connector had a lot of issues.

The following are the cons of using the connector:

  • The connector is very, very slow and almost unusable if you have more than a couple of GBs of data
  • If a Hadoop task that pushes data to Vertica using the connector fails halfway, Hadoop restarts the task from scratch, but the Vertica connector is not aware of this, so it inserts duplicate data
  • Not sure how the connector pushes data to Vertica, but if you try to push more than Vertica can handle, performance does not degrade gracefully and your Vertica cluster can pretty much go down
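
To make the duplicate-data problem concrete, here is a rough Python sketch. It does not use the Vertica connector or any Vertica API: it uses sqlite3 from the standard library as a stand-in database, and the `events` table, the `task_id` column and all function names are made up for illustration. The first loader commits row by row, so a retried task leaves duplicates behind. The second one tags rows with the task id, clears any leftovers from an earlier attempt, and loads everything in one transaction.

```python
import sqlite3

def push_rows(conn, task_id, rows, fail_after=None):
    """Naive loader: commits every row, so partial output survives a failure."""
    for i, row in enumerate(rows):
        if fail_after is not None and i == fail_after:
            raise RuntimeError("simulated task failure")
        conn.execute("INSERT INTO events (task_id, value) VALUES (?, ?)",
                     (task_id, row))
        conn.commit()

def push_rows_idempotent(conn, task_id, rows, fail_after=None):
    """Retry-safe loader: remove rows from earlier attempts of this task,
    then insert everything in a single transaction."""
    try:
        conn.execute("DELETE FROM events WHERE task_id = ?", (task_id,))
        for i, row in enumerate(rows):
            if fail_after is not None and i == fail_after:
                raise RuntimeError("simulated task failure")
            conn.execute("INSERT INTO events (task_id, value) VALUES (?, ?)",
                         (task_id, row))
        conn.commit()
    except Exception:
        conn.rollback()  # nothing from the failed attempt stays in the table
        raise

def run_with_retry(push, conn, task_id, rows):
    """Mimic Hadoop: the first attempt dies halfway, then the task is rerun
    from scratch."""
    try:
        push(conn, task_id, rows, fail_after=len(rows) // 2)
    except RuntimeError:
        push(conn, task_id, rows)

def demo(push):
    """Run one failing-then-retried task and return the final row count."""
    conn = sqlite3.connect(":memory:")  # stand-in database, not Vertica
    conn.execute("CREATE TABLE events (task_id TEXT, value INTEGER)")
    run_with_retry(push, conn, "task_0001", [1, 2, 3, 4])
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

if __name__ == "__main__":
    print("naive loader rows:", demo(push_rows))
    print("idempotent loader rows:", demo(push_rows_idempotent))
```

With four input rows, the naive loader ends up with six rows in the table (two from the failed attempt plus four from the retry), while the retry-safe loader ends up with four. Whether the same pattern is practical against a real Vertica cluster depends on how the load is done, which the video does not make clear.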
