What? What’s that then? Well actually it’s an idea that Hazamonzo developed with Matt, but unfortunately his blog post disappeared, so this is basically a re-write of the technique.
Well, Metadata Injection is one of the single most powerful features of PDI. However, it has a dark secret: only *some* steps support it. A year ago that list was very small; thankfully it has grown recently due to concerted efforts by the dev team. But as PDI has 100s of steps, with more being added weekly, there’s always going to be a scenario where the step you want to inject doesn’t support it.
By the way here is the list of those that do:
So what do you do? Well, PDI just saves its transforms and jobs as either XML or data in a repository. We could hack the XML… Err, NO STOP RIGHT THERE. Don’t even consider it. Much better – use the API. The API is how Spoon itself sets the metadata for a step.
So the steps are:
- Open the transformation file
- Find the step(s) we want to change
- Configure it accordingly
- Save the file
The code can be found in my samples repository: https://github.com/codek/pdi-samples
Secondly, when a step has multiple rows you are better off generating the whole object that configures each row. In this simple case, however, I’ve assumed there will always be 1 row in the UDJE step, and it’s that row we want to configure.
Finally, there are LOADS and loads of helper functions you can use. If you’re building SQL you can use the following function to get a correctly quoted field:
theField = inputDatabaseMeta.quoteField(source_name);
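As a rough sketch of where that helper fits, here is a small function that builds a SELECT statement from a list of field names. The names buildSelect, tableName and fieldNames are mine, purely for illustration; the only PDI call is quoteField, and I've assumed the table name arrives already quoted.

```javascript
// Hypothetical helper (not part of PDI): build a SELECT with every field
// passed through the database's own quoting rules.
// databaseMeta only needs to expose quoteField(name), as
// org.pentaho.di.core.database.DatabaseMeta does.
function buildSelect(databaseMeta, tableName, fieldNames) {
  var quoted = [];
  for (var i = 0; i < fieldNames.length; i++) {
    // String(...) turns a java.lang.String into a JavaScript string under Rhino
    quoted.push(String(databaseMeta.quoteField(fieldNames[i])));
  }
  // Assumption: tableName has already been quoted/qualified by the caller
  return "SELECT " + quoted.join(", ") + " FROM " + tableName;
}
```

Inside PDI you would call it along the lines of buildSelect(inputDatabaseMeta, table, [source_name]).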
How do I access the API?
var meta=new org.pentaho.di.trans.TransMeta( ktrFileName );
var UDJEStep = meta.findStep("User Defined Java Expression");
var UDJEStepMeta = UDJEStep.getStepMetaInterface();
Note that you can get all the steps if you want and loop around them, but “findStep” is an easier way of doing that. You will need both the Step object (the StepMeta that findStep returns) as well as the Meta object (the step-specific settings that getStepMetaInterface returns). (Exactly what these are is a question you can answer with Google!)
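If you do want to loop around the steps, for example because you don't know what the step was named in a given KTR, a sketch might look like this. findStepsByPluginId is a made-up name of mine; it relies on TransMeta's getSteps() and StepMeta's getStepID(), and it's written so that anything with the same shape will do.

```javascript
// Hypothetical helper (not part of PDI): collect every step whose plugin ID
// matches. transMeta is expected to expose getSteps(), returning a
// java.util.List of StepMeta objects.
function findStepsByPluginId(transMeta, pluginId) {
  var matches = [];
  var steps = transMeta.getSteps();
  for (var i = 0; i < steps.size(); i++) {
    var step = steps.get(i);
    // String(...) so a java.lang.String compares cleanly under Rhino
    if (String(step.getStepID()) === pluginId) {
      matches.push(step);
    }
  }
  return matches;
}
```

For the UDJE step I'd expect the call to be findStepsByPluginId(meta, "Janino"), but treat that ID as an assumption and check it against your own KTR file.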
How do I know how to configure a step?
At this point you must dig into the code. Thankfully this is very easy and not something to be scared of. In reality all PDI developers will benefit from understanding the basics of the underlying code.
- Check out PDI from GitHub in Eclipse or NetBeans or any other tool
- Know that all the step code lives in engine/src/org/pentaho/di/trans/steps
- Work out what your step is called. Usually this is easy, but in the “User Defined Java Expression” case the step is actually called “Janino” (due to the underlying library which compiles the expression)
- Open the file called JaninoMeta.java
- Look at all the getters and setters
So, let’s take a look at JaninoMeta.
Note: In this case we have a setFormula/getFormula pair. They work with an array because you can have multiple formulas in this step. Each element of the array is a “JaninoMetaFunction”, whatever that is. So, to get the function for the first row, simply take element 0 of the array that getFormula returns; I keep it in a variable called savedobject.
Now, what does JaninoMetaFunction look like, then?
Well, this is very common. It’s simply an object with a bunch of getters and setters which match up with the columns in the grid for the UDJE. So, to replace the expression we can use code like this:
savedobject.setFormula("some new expression here");
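Since I'm assuming exactly one row in the UDJE step, it's worth failing loudly when that assumption doesn't hold. Here's a minimal sketch of that; setFirstFormula is my own name, and it only uses the getFormula and setFormula calls discussed above.

```javascript
// Hypothetical helper (not part of PDI): replace the expression in the
// single row of a UDJE step, refusing to guess if the row count is not 1.
// janinoMeta is the object returned by getStepMetaInterface() for the step.
function setFirstFormula(janinoMeta, expression) {
  var rows = janinoMeta.getFormula(); // array of JaninoMetaFunction
  if (rows == null || rows.length !== 1) {
    throw new Error("Expected exactly one row in the UDJE step");
  }
  rows[0].setFormula(expression);
  return rows[0];
}
```

With the variables from earlier that would be setFirstFormula(UDJEStepMeta, "some new expression here").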
So, you’ve figured out the code; now all you need to do is save the KTR, and that is as simple as:
meta.writeXML( targetFile );
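Putting the four steps together (open, find, configure, save), a sketch of the whole thing could look like the following. rewriteStep and its parameters are names I've made up; the transformation is opened through a function you pass in, so the sketch isn't tied to a running PDI. Inside PDI that function would be something like function (f) { return new org.pentaho.di.trans.TransMeta(f); }.

```javascript
// Hypothetical wrapper (not part of PDI) around the four steps.
// openTransformation: function taking a file name and returning a TransMeta
// configure: function receiving the step's meta object and changing it
function rewriteStep(openTransformation, ktrFileName, stepName, configure, targetFile) {
  var meta = openTransformation(ktrFileName);   // 1. open the transformation file
  var step = meta.findStep(stepName);           // 2. find the step we want to change
  if (step == null) {
    throw new Error("Step not found: " + stepName);
  }
  configure(step.getStepMetaInterface());       // 3. configure it accordingly
  meta.writeXML(targetFile);                    // 4. save the file
  return meta;
}
```

Writing to a separate targetFile rather than back over ktrFileName keeps the original template KTR untouched, which is handy if you generate many variants from it.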
OK, did I say it already? Simple! Do not be afraid of the code, we’re only talking a few lines here.
Now, if you do find a step which cannot be injected natively, do please make sure there is a Jira for it, and ultimately you can then move your code over to native injection as and when it comes in.
In all seriousness I don’t know of any other ETL tool that offers this feature. I’d love to be proven wrong, so if other tools have similar concepts then please let me know; I’d like to look at them and see exactly how they do it.