Last night I attended the Neo4j meetup, which wasn't far from this pretty spectacular building:
Anyway, I digress.
So the talk was all about knowledge graphs, and was presented by Petra Selmer, who is driving the direction and development of Cypher, the Neo4j query language.
So, some very interesting points were made; here are my highlights, in no particular order!
- Neo4j and the University of Edinburgh are working to define and lock down the formal semantics of Cypher – or rather, of a standard graph query language. The aim is to produce a standard that all vendors actually adhere to, unlike SQL, where every dialect is different. It's a noble aim, but if graph tech really does take off, I can't see it happening!
- It's quite curious that Cypher queries a graph yet returns a table. This struck me as odd from the very start, but Petra pointed out that in the next version you will have the option to return a graph – and indeed to build chains of queries (composition, I think it was called). Interesting stuff – there's a small query sketch after this list.
- Another interesting point: when querying your graph it's not uncommon to find unexpected insights – the whole "you don't know what you don't know" thing. It's hard to see from the query syntax alone how that is encouraged, but I guess you need to delve deeper to find out.
- When scaling out Neo4j they use causal consistency – so even when reads and writes hit different boxes, causally related operations are seen in the correct order, and a read is guaranteed to see your own earlier writes. (There's a quick driver sketch after this list.)
- Related to that, Neo4j seems very focussed on OLTP: insert speed, ACID guarantees and so on. It'll be interesting to see how (or if) that also translates into a more analytical tool (which is the way they're going now that they're moving to a graph "platform").
- It's very operationally focussed. All the connectors are geared towards keeping the Neo4j graph up to date in real time – presumably so that analytics and the like are always current. In that sense it's more another part of your operational architecture; it's not like a data lake or warehouse.
- Obviously there are connectors for all sorts of sources, and you can use Kettle where there isn't one – they didn't mention that, though!
- However, while the pitch is about moving away from silo'd data, you are of course creating another silo – albeit one that reads from multiple other sources.
- Next versions will add support for multi-tenancy, more data types, multiple graphs, and so on.
- Indexing is not what you might think. Typically when querying a graph you find a node to start from and then traverse from there, so indexing is all about locating that initial node quickly (again, see the sketch after this list).
- A point I really liked: the best graphs are grown and enriched organically. As you load more data, enrich your graph. It's a cycle.
- Additionally you can use ML (machine learning) to add labels to your graph. Then the enriched graph becomes input to your ML again, and round you go!
- So, start simple, and build up. Let the benefits come.
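To make the "query a graph, get back a table" point concrete, here's a minimal sketch using the official Neo4j Python driver. The connection details and the schema (Person, DRINKS, Beer) are made up purely for illustration; the point is the shape of what comes back – plain rows with named columns.

```python
from neo4j import GraphDatabase

# Hypothetical connection details and schema, purely for illustration.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

with driver.session() as session:
    # The MATCH pattern describes a graph, but the result is tabular:
    # one record per match, with the columns named in the RETURN clause.
    result = session.run(
        "MATCH (p:Person)-[:DRINKS]->(b:Beer) "
        "RETURN p.name AS person, b.name AS beer"
    )
    for record in result:
        print(record["person"], "drinks", record["beer"])

driver.close()
```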
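On the causal consistency point, the way this surfaces in the drivers (as far as I understand it) is through bookmarks: a session hands you a bookmark after a write, and passing it to the next session guarantees that session sees the write, whichever cluster member serves it. A rough sketch, with made-up data:

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

# Write in one session and capture the bookmark...
with driver.session() as session:
    session.run("CREATE (:Beer {name: $name})", name="Westvleteren 12")
    bookmark = session.last_bookmark()

# ...then pass the bookmark on, so this read is guaranteed to observe
# that write, even if a different cluster member serves it.
with driver.session(bookmarks=[bookmark]) as session:
    count = session.run("MATCH (b:Beer) RETURN count(b) AS c").single()["c"]
    print(count)

driver.close()
```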
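And on indexing: the index isn't there to speed up the traversal itself, it's there to find the node you start traversing from. Something like the sketch below (the schema is again invented, and the CREATE INDEX statement uses the Neo4j 3.x syntax).

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

with driver.session() as session:
    # Index the property we use to locate the starting node (3.x syntax).
    session.run("CREATE INDEX ON :Brewery(name)")

    # The index finds the anchor node; everything after that is traversal.
    result = session.run(
        "MATCH (br:Brewery {name: $name})<-[:BREWED_BY]-(b:Beer) "
        "RETURN b.name AS beer",
        name="Westmalle",
    )
    for record in result:
        print(record["beer"])

driver.close()
```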
All in all, very interesting. It seems a tool well worth playing with, and Kettle makes that super easy of course, thanks to the existing connectors developed by Know BI. So have a go and see what you can find – the barriers to getting started are incredibly low.
I'm particularly interested in seeing where treating relationships as first-class citizens leads us – but I'm also curious to see how that sits alongside properties and data storage within the graph. I can see some interesting examples with clinical data and, indeed, some fun ones in the beer world!
If you went, what did you think? Strike up a discussion on Twitter!