So, short and sweet. How do you use submappings in a mapper or reducer?
Natively, you can't. It doesn't work, and the reason is simple: the Pentaho MapReduce step doesn't make the sub-mapping .ktr available to the job. It only publishes the top-level job.
So the solution is to use an HDFS URL as the filename of the sub-mapping transformation, i.e.:
hdfs://hdfsserver:8020/user/rockstar/mysubtransformation.ktr
This, however, has side effects: namely, Spoon will hang rather a lot trying to resolve that URL. So the only way to apply this is to hack the XML of your transformation by hand. Yuck!
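For what it's worth, the hand edit is small. Here's a rough sketch of what the edited Mapping step fragment might look like; the step name is made up and the element names are written from memory of the usual Kettle layout, so check them against your own .ktr rather than pasting blindly:

<step>
  <name>call subtransformation</name>
  <type>Mapping</type>
  <!-- load the sub-transformation by filename, not from a repository -->
  <specification_method>filename</specification_method>
  <!-- the trick: an HDFS URL where a local path would normally go -->
  <filename>hdfs://hdfsserver:8020/user/rockstar/mysubtransformation.ktr</filename>
</step>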
You could actually use any resolvable URL. I think HDFS makes the most sense, but make sure you put the .ktr into the distributed cache so it'll always be on the local node. BOOM!
Naturally there is now a JIRA for this, but as we're all moving to Spark, I don't see it being fixed any time soon 🙂
Of course, you can work around the hang by using a variable that points to the local file in your dev environment. Additionally, your parent job can copy the sub-transformation into HDFS, automating the deployment and making things easier.
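Something like this, assuming a hypothetical ${SUB_TRANS_PATH} variable (a name invented here for illustration): set it to a local directory in your dev kettle.properties and to the hdfs:// location for the cluster run, and the filename element of the Mapping step becomes:

<!-- ${SUB_TRANS_PATH} is illustrative; define it per environment -->
<filename>${SUB_TRANS_PATH}/mysubtransformation.ktr</filename>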