So, short and sweet. How do you use submappings in a mapper or reducer?
Natively you can’t. It doesn’t work, and the reason is simple: the Pentaho MapReduce step doesn’t make the sub-mapping .ktr available to the job. It only publishes the top-level transformation.
So the solution is to use an HDFS URL as the name of the sub-mapping transformation, i.e.:
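For illustration, here is a sketch of what that looks like in the transformation XML. The step is Kettle's Mapping (sub-transformation) step; the host, port, and path in the URL are made-up placeholders, and the surrounding tags are trimmed down to the relevant bits:

```xml
<!-- Inside the mapper .ktr: the Mapping (sub-transformation) step,
     with its filename pointed at an HDFS URL instead of a local path.
     namenode:8020 and /pdi/sub-mapping.ktr are hypothetical placeholders. -->
<step>
  <name>Call sub-mapping</name>
  <type>Mapping</type>
  <filename>hdfs://namenode:8020/pdi/sub-mapping.ktr</filename>
</step>
```

Because the filename is now a URL the cluster can resolve, each mapper/reducer can fetch the sub-mapping itself instead of relying on the job to ship it.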
This, however, has side effects: namely, Spoon will hang rather a lot. So the only practical way to apply it is to hack the XML of your transformation by hand. Yuck!
You could actually use any resolvable URL. I think it makes most sense to use HDFS, but make sure you also put the .ktr into the distributed cache so it’s always on the local node. BOOM!
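Getting the .ktr onto HDFS is just an ordinary file copy. A minimal sketch, assuming a hypothetical `/pdi` directory that matches whatever path your Mapping step references (wiring the file into the distributed cache beyond this depends on how your job is submitted):

```shell
# Copy the sub-mapping transformation to HDFS so every node can resolve it.
# /pdi/sub-mapping.ktr is a made-up path - use the one your Mapping step points at.
hadoop fs -mkdir -p /pdi
hadoop fs -put -f sub-mapping.ktr /pdi/sub-mapping.ktr

# Verify it landed where the transformation expects it.
hadoop fs -ls /pdi/sub-mapping.ktr
```

The `-f` flag overwrites any existing copy, which is handy while you iterate on the sub-mapping.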
Naturally there is now a JIRA for this, but as we’re all moving to Spark, I don’t see it being fixed any time soon 🙂