Non-Native Metadata Injection

What? What’s that then?  Well, it’s an idea that Hazamonzo developed with Matt, but unfortunately his blog post disappeared, so this is basically a rewrite of the technique.

Metadata Injection is one of the single most powerful features of PDI.  However, it has a dark secret: only *some* steps support it.  A year ago that list was very small; thankfully it has grown recently due to concerted efforts by the dev team. But as PDI has hundreds of steps, with more being added weekly, there’s always going to be a scenario where the step you want to inject doesn’t support it.

By the way, here is the list of those that do:

http://wiki.pentaho.com/display/EAI/ETL+Metadata+Injection

So what do you do? Well, PDI just saves its transformations and jobs as either XML or data in a repository.  We could hack the XML…  Err, NO, STOP RIGHT THERE.  Don’t even consider it.  Much better: use the API.  The API is how Spoon itself sets the metadata for a step.

So the steps are:

  1. Open the transformation file
  2. Find the step(s) we want to change
  3. Configure it accordingly
  4. Save the file

Simple!

The code can be found in my samples repository: https://github.com/codek/pdi-samples

As always, there are lots of ways to skin a cat.  Firstly, and counterintuitively, you can use the auto-documentation step to load the metadata, but as it’s a single line of code I do it in the JavaScript step.

Secondly, when a step has multiple rows you are better off generating the whole object that configures each row. However, in this simple case I’ve assumed there will always be exactly 1 row in the UDJE (User Defined Java Expression) step, and it’s that row we want to configure.

Finally, there are loads and loads of helper functions you can use.  If you’re building SQL you can use the following function to get a correctly quoted field:

theField = inputDatabaseMeta.quoteField(source_name);
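To show where that helper earns its keep, here is a minimal sketch of building a SELECT with every field quoted. The function name buildSelect and its parameters are made up for illustration; the only PDI call it relies on is quoteField, and it assumes the table name has already been quoted by the caller.

```javascript
// Hypothetical helper: build a SELECT with each field quoted for the target database.
// dbMeta: any object exposing quoteField(name), e.g. the inputDatabaseMeta used above.
// quotedTable: the table name, assumed to be already quoted/qualified by the caller.
function buildSelect(dbMeta, fieldNames, quotedTable) {
  var quoted = [];
  for (var i = 0; i < fieldNames.length; i++) {
    // String() converts a java.lang.String to a JavaScript string under Rhino
    quoted.push(String(dbMeta.quoteField(fieldNames[i])));
  }
  return "SELECT " + quoted.join(", ") + " FROM " + quotedTable;
}
```

The point of going through quoteField rather than gluing quote characters on yourself is that the database connection decides how a field should be quoted.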

How do I access the API?

A few lines of code in a Modified Java Script Value step will do it:

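// Load the transformation from its .ktr file
var meta = new org.pentaho.di.trans.TransMeta( ktrFileName );
// Look the step up by the name it has in the transformation
var UDJEStep = meta.findStep("User Defined Java Expression");
// Get the object that holds this step's own settings
var UDJEStepMeta = UDJEStep.getStepMetaInterface();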

Note that you can get all the steps and loop over them if you want, but “findStep” is an easier way of doing that.  You will need both the Step object and the Meta object.  (Exactly what these are is a question you can answer with Google!)
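If you do want to loop, for example because the step may have been renamed in the transformation, a rough sketch looks like this. The function name findStepsByTypeId is hypothetical; it assumes the list comes from meta.getSteps() and that each entry exposes getStepID(). That “Janino” is the type id of the User Defined Java Expression step is my assumption, so check it against your version.

```javascript
// Hypothetical helper: collect every step whose plugin type id matches typeId.
// steps: a Java list of step objects, assumed to come from meta.getSteps()
function findStepsByTypeId(steps, typeId) {
  var matches = [];
  for (var i = 0; i < steps.size(); i++) {
    var step = steps.get(i);
    // String() converts a java.lang.String to a JavaScript string under Rhino
    if (String(step.getStepID()) === typeId) {
      matches.push(step);
    }
  }
  return matches;
}

// Inside PDI (assumed usage): var udjeSteps = findStepsByTypeId(meta.getSteps(), "Janino");
```

Matching on the type rather than the name means the code keeps working when someone gives the step a more descriptive name.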

How do I know how to configure a step?

At this point you must dig into the code.  Thankfully this is very easy and nothing to be scared of. In reality, all PDI developers will benefit from understanding the basics of the underlying code.

  1. Check out PDI from GitHub in Eclipse, NetBeans or any other tool
  2. Know that all the step code lives in engine/src/org/pentaho/di/trans/steps
  3. Work out what your step is called. Usually this is easy, but in the “User Defined Java Expression” case the step is actually called “Janino” (due to the underlying library which compiles the expression)
  4. Open the file called JaninoMeta.java
  5. Look at all the getters and setters

So, let’s take a look at JaninoMeta:

[Image: JaninoMeta]

Note: in this case we have a setFormula/getFormula pair. They work with an array because you can have multiple formulas in this step.  Each element of the array is a “JaninoMetaFunction”, whatever that is.  So, to get the function for the first row, it is simply:

getFormula()[0]

Now, what does JaninoMetaFunction look like?

[Image: JaninoMetaFunction]

Well, this is very common. It’s simply an object with a bunch of getters and setters which match up with the columns in the grid for the UDJE.  So, to replace the expression we can use code like this:

UDJEStepMeta.getFormula()[0].setFormula("some new expression here");
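Picking the row by index is fine while the step only ever has one row. When it has several, a safer sketch is to look the row up by its output field name. The function name setExpressionForField is hypothetical, and it assumes each row object offers getFieldName() alongside setFormula(), which is what the getters and setters in JaninoMetaFunction suggest.

```javascript
// Hypothetical helper: replace the expression on the row whose field name matches.
// formulas: the array returned by getFormula() on the step's meta object
// Returns true if a matching row was found and updated, false otherwise.
function setExpressionForField(formulas, fieldName, newExpression) {
  for (var i = 0; i < formulas.length; i++) {
    // String() converts a java.lang.String to a JavaScript string under Rhino
    if (String(formulas[i].getFieldName()) === fieldName) {
      formulas[i].setFormula(newExpression);
      return true;
    }
  }
  return false;
}
```

Returning false rather than silently doing nothing lets the calling script fail loudly when the template transformation no longer has the row it expects.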

So, you’ve figured out the code. Now all you need to do is save the KTR, and that is as simple as:

meta.writeXML( targetFile );
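Putting the pieces together, here is a minimal end-to-end sketch with a couple of sanity checks added. The function name injectExpression and its parameter names are made up for illustration; the calls on the objects are the same ones used above, and it keeps the assumption that the first row is the one to change.

```javascript
// Hypothetical wrapper around the calls shown above.
// transMeta: an already loaded transformation (a TransMeta inside PDI)
function injectExpression(transMeta, stepName, newExpression, targetFile) {
  var step = transMeta.findStep(stepName);
  if (step == null) {
    throw new Error("No step named '" + stepName + "' in the transformation");
  }
  var formulas = step.getStepMetaInterface().getFormula();
  if (formulas.length < 1) {
    throw new Error("Step '" + stepName + "' has no rows to configure");
  }
  // Assumes the first row is the one we want, as in the simple case above
  formulas[0].setFormula(newExpression);
  // Save the modified transformation to the target file
  transMeta.writeXML(targetFile);
}

// Inside PDI (assumed usage):
// injectExpression(new org.pentaho.di.trans.TransMeta(ktrFileName),
//     "User Defined Java Expression", "some new expression here", targetFile);
```

Writing to a separate target file rather than back over the source keeps the template transformation untouched, so the same template can be reused for every run.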

OK, did I say it already? Simple!  Do not be afraid of the code; we’re only talking about a few lines here.

Now, if you do find a step which cannot be injected natively, please make sure there is a Jira for it. You can then move your code over to native injection as and when it comes in.

In all seriousness, I don’t know of any other ETL tool that offers this feature. I’d love to be proven wrong, so if other tools have similar concepts then please let me know; I’d like to look at them and see exactly how they do it.


11 thoughts on “Non-Native Metadata Injection”

  1. Interesting. I was thinking about this, but stopped when I read this from the PDI docs (http://wiki.pentaho.com/display/EAI/Pentaho+Data+Integration+-+Java+API+Examples):

    Recommendation for upward compatibility: If you want to create your own Transformation dynamically (e.g. from meta-data), use the method of generating a XML-file (KTR) instead of using the API. The XML-files are compatibility from the first Version of Kettle until now. This is the same for Jobs.

    • That wiki page is pretty out of date, but it is true that the API could change. However, Pentaho now offers full support for the API and not for XML hacking, so I still prefer this way. I also find the API easier and better documented than the XML. Thanks for pointing that note out though!

    • Sadly not. The extended support is only for those with an OEM support package. However, rather than explicit docs, a full set of well-commented examples is now available, and I believe these are community, but I’m not 100% sure.

  2. That’s a great idea for dynamically creating transformations. But what about plugins in Pentaho? I want to dynamically configure a MongoDB Output step in a .ktr file using the Modified Java Script step, but “var meta = new org.pentaho.di.trans.TransMeta( ktrFileName );” does not work for me. MongoDB Output is not among the original steps, so can I use the API with MongoDB Output steps? How?

      • Thanks for your information. But I’m just a rookie learning Kettle. >_< Could you please point me to the community? A website or blog, anything like that. I tried to follow the developers on GitHub, but there are no emails listed on their homepages, what a pity…

