#Neo4j #London #Meetup

Last night I attended the Neo4j meetup, which wasnt far from this pretty spectacular building:


Anyway, I digress.

So the talk was all about knowledge graphs, and was presented by Petra Selma who is driving the direction and development of Cypher – the neo4j query language.

So, some very interesting points were made, here are my highlights, in no particular order!

  • Neo4j and Edinburgh university are working to define and lock down the semantics of Cypher – or rather graph query language. The aim is to produce a standard that all vendors actually adhere to – Unlike SQL where every dialect is different. This is a noble aim, however if graph tech does take off, I can’t see it happening!
  • It’s quite curious that Cypher queries a graph, yet returns a table. This struck me as odd from the very start but subsequently Petra pointed out that in the next version you do have the option to return a graph – and indeed to build chains of queries.  Interesting stuff. (Composition was it called?)
  • Another interesting point – Typically when querying your graph it’s not uncommon to find unexpected insights – the whole “you dont know what you dont know”.  It’s hard to see from the query syntax how that is encouraged but I guess you need to delve deep into it to see.
  • When scaling out Neo4j they use causal consistency – so even if writes occur on different boxes, they are guaranteed to occur in the correct order.
    • This is related to another point – Neo4j seems very focussed on OLTP.  Insert speed. Acid etc.  It’ll be interesting to see how (if) that can also translate to a more analytic tool (which is the way they’re going now they’re moving to a graph “platform”
    • It’s very operationally focussed. All the connectors are geared towards keeping the Neo graph up to date in real time – presumably so that analytics etc are always up to date.  In that sense it’s more like another part of your operational architecture. It’s not like a datalake/warehouse.
    • Obviously there’s connectors for all sorts of sources. Plus you can use kettle where there isn’t – they didn’t mention that though!
    • However, in pointing out that you’re trying to move away from silo’d data etc, you are of course,  creating another silo, albeit one that reads from multiple other sources.
  • next versions will have support for tenancy, more data types, etc.  Multiple graphs. etc.
  • Indexing is not what you think – typically when querying a graph you find a node to start, and then traverse from there. So indexing is all about finding that initial node.
  • A really good point I liked a lot – the best graphs are grown and enriched organically.  As you load more data, enrich your graph. It’s a cycle
    • Additionally you can use ML (machine learning) to add labels to your graph.  Then the enriched graph becomes input to your ML again, and round you go!
    • So, start simple, and build up.  Let the benefits come.

All in all very interesting. It seems a tool well worth playing with, and kettle makes this super easy of course with the existing connectors developed by Know BI.  So have a go and see what you can find.  The barriers to starting are incredibly low.

I’m particularly interested in seeing where the putting relationships as first class citizens leads us – but i’m also curious to see how that fits alongside properties and data storage within the graph.  I can see some interesting examples with clinical data, and indeed, some fun examples in the beer world!

If you went, what did you think? Strike up a discussion on twitter!