How to use sub-mappings in Pentaho map-reduce

So, short and sweet: how do you use sub-mappings in a mapper or reducer?

Natively, you can’t. It doesn’t work, but the reason is simple: the Map/Reduce step in Pentaho doesn’t make the sub-mapping .ktr file available to the Hadoop job. It only publishes the top-level job.

So the solution is to use an HDFS URL as the filename of the sub-mapping transformation, e.g.:

hdfs://hdfsserver:8020/user/rockstar/mysubtransformation.ktr

This, however, has side effects: namely, Spoon will hang rather a lot. So the only way to apply it is to hack the XML of your transformation by hand. Yuck!
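If you do have to hack the XML, a small script is less error-prone than hand-editing. This is a minimal sketch, not a Pentaho tool: it assumes each sub-mapping step is serialized in the .ktr as a top-level step element whose type is Mapping and whose transformation path sits in a filename child element, so check that against your own file first. The function names point_mappings_at and patch_file are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: sub-mapping steps are direct children of <transformation> and look like
#   <step><name>...</name><type>Mapping</type>...<filename>...</filename></step>
# Verify this against your own .ktr before relying on it.


def point_mappings_at(root: ET.Element, new_filename: str,
                      step_name: Optional[str] = None) -> int:
    """Set <filename> on Mapping steps under root; return how many were changed.

    If step_name is given, only the Mapping step with that name is changed.
    """
    changed = 0
    for step in root.findall("step"):
        if step.findtext("type") != "Mapping":
            continue  # not a sub-mapping step
        if step_name is not None and step.findtext("name") != step_name:
            continue  # a different sub-mapping step
        filename = step.find("filename")
        if filename is None:
            filename = ET.SubElement(step, "filename")
        filename.text = new_filename
        changed += 1
    return changed


def patch_file(path: str, new_filename: str, step_name: Optional[str] = None) -> int:
    """Rewrite a .ktr file in place with the new sub-mapping filename."""
    tree = ET.parse(path)
    changed = point_mappings_at(tree.getroot(), new_filename, step_name)
    tree.write(path, encoding="UTF-8", xml_declaration=True)
    return changed


if __name__ == "__main__":
    import sys
    # Usage: python patch_ktr.py parent.ktr hdfs://hdfsserver:8020/user/rockstar/mysubtransformation.ktr
    print(patch_file(sys.argv[1], sys.argv[2]), "step(s) updated")
```

Run it on a copy of the parent .ktr that you deploy, rather than the one you open in Spoon, so your working copy keeps a local path.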

You could actually use any resolvable URL. I think HDFS makes the most sense, but make sure you put the .ktr into the distributed cache so it’s always on the local node. BOOM!
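One way to get the .ktr into the distributed cache is through Hadoop's own configuration. This sketch only builds the property value; it assumes Hadoop 2's mapreduce.job.cache.files key (Hadoop 1 used mapred.cache.files) and that you can hand it to the job through the Pentaho MapReduce step's user-defined Hadoop properties, which you should verify for your PDI version. The "#name" fragment asks Hadoop to symlink the cached file under that name in each task's working directory. The helper names are hypothetical.

```python
import posixpath
from typing import Dict, Iterable, Optional
from urllib.parse import urlparse

# Hadoop 2 property listing files to ship via the distributed cache.
# Assumption: passed through the Pentaho MapReduce step's user-defined properties.
CACHE_FILES_KEY = "mapreduce.job.cache.files"


def cache_entry(hdfs_url: str, link_name: Optional[str] = None) -> str:
    """Return 'url#link' so Hadoop symlinks the cached file as 'link' in the task dir."""
    parsed = urlparse(hdfs_url)
    if parsed.scheme != "hdfs":
        raise ValueError(f"expected an hdfs:// URL, got {hdfs_url!r}")
    if parsed.fragment:
        return hdfs_url  # a link name was already supplied
    name = link_name or posixpath.basename(parsed.path)
    return f"{hdfs_url}#{name}"


def cache_properties(hdfs_urls: Iterable[str]) -> Dict[str, str]:
    """Build the single comma-separated property value Hadoop expects."""
    return {CACHE_FILES_KEY: ",".join(cache_entry(u) for u in hdfs_urls)}


if __name__ == "__main__":
    props = cache_properties(["hdfs://hdfsserver:8020/user/rockstar/mysubtransformation.ktr"])
    for key, value in props.items():
        print(f"{key}={value}")
```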

Naturally, there is now a JIRA ticket, but as we’re all moving to Spark, I don’t see it being fixed too quickly 🙂


One thought on “How to use sub-mappings in Pentaho map-reduce”

  1. Of course, you can work around the hang by using a variable that points to the local file in your dev environment. Additionally, your parent job can copy the sub-transformation into HDFS to automate the deployment and make things easier.
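The variable trick amounts to giving the sub-mapping filename a variable prefix and setting that variable differently per environment, for example in kettle.properties or as a job parameter. This sketch roughly mimics how Kettle expands ${VAR} references (unknown names are left as written here) so both resolutions can be compared side by side; SUBTRANS_DIR is a hypothetical variable name. It also builds the hdfs dfs -put command a parent job could run, for instance from a shell step, to push the sub-transformation into HDFS before the MapReduce step starts.

```python
import re
from typing import Dict, List

# Hypothetical: the Mapping step's filename in the .ktr is set to this value.
SUBTRANS_PATH = "${SUBTRANS_DIR}/mysubtransformation.ktr"

_VAR = re.compile(r"\$\{([^}]+)\}")


def resolve(path: str, variables: Dict[str, str]) -> str:
    """Expand ${NAME} references; unknown names are left as written."""
    return _VAR.sub(lambda m: variables.get(m.group(1), m.group(0)), path)


def hdfs_put_command(local_ktr: str, hdfs_dir: str) -> List[str]:
    """Command a parent job could run to (over)write the .ktr in HDFS."""
    return ["hdfs", "dfs", "-put", "-f", local_ktr, hdfs_dir]


dev = {"SUBTRANS_DIR": "/home/rockstar/pdi"}                        # local file for Spoon
cluster = {"SUBTRANS_DIR": "hdfs://hdfsserver:8020/user/rockstar"}  # what the tasks load

if __name__ == "__main__":
    print(resolve(SUBTRANS_PATH, dev))
    print(resolve(SUBTRANS_PATH, cluster))
    print(" ".join(hdfs_put_command("mysubtransformation.ktr", "/user/rockstar/")))
```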
