
Herein lies the problem for me:

    select * { wd:Q9682 (wdt:P25|wdt:P22)* ?p . ?p wdt:P25|wdt:P22 ?q }

I am extremely motivated to learn how to use this: I have a deep desire to extract data from Wikipedia, and I'm fascinated by graph databases.

And yet, despite trying on several previous occasions, SPARQL has completely failed to stick in my brain.

This is partly my own failing: I'm confident that if I really dedicated myself to it I could get over this hump.

But it's also a sign that the learning curve here really is tremendously steep, which I think indicates a problem with the design of the technology.



I find it helps to translate the syntax into English:

> select *

Return every variable in the query (the names starting with ?)

> wd:Q9682

Find item Q9682 (Queen Elizabeth II)

> (wdt:P25|wdt:P22)*

Follow edges that are either P22 (father) or P25 (mother) zero or more times

Every time you follow one of those edges, add the new item to ?p. Keep following these edges until you can't anymore. Because "zero times" also counts, the starting item itself ends up in ?p too.

> ?p wdt:P25|wdt:P22 ?q

For every ?p, follow a mother/father edge precisely once and call the item it points to ?q (if there is no such edge, that ?p is dropped)

The end result is a list of rows containing pairs of (Elizabeth or one of her ancestors, one of that person's direct parents).
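
To make that walk-through concrete, here is a rough Python sketch that mimics the same two patterns over a tiny in-memory graph. The family data, the EDGES table and the helper names are all made up for illustration; this is not real Wikidata data and not how a SPARQL engine is actually implemented.

```python
from collections import deque

# Hypothetical toy graph: (child, property) -> parent. Not real Wikidata data.
EDGES = {
    ("elizabeth", "P22"): "george_vi",
    ("elizabeth", "P25"): "elizabeth_bl",
    ("george_vi", "P22"): "george_v",
    ("george_vi", "P25"): "mary_of_teck",
}

PARENT_PROPS = ("P22", "P25")  # father, mother


def parents(item):
    """Follow a father-or-mother edge exactly once: wdt:P25|wdt:P22."""
    return [EDGES[(item, p)] for p in PARENT_PROPS if (item, p) in EDGES]


def zero_or_more(start):
    """(wdt:P25|wdt:P22)*: the start item plus everything reachable via parent edges."""
    seen = [start]  # zero steps: the start item itself matches
    queue = deque([start])
    while queue:
        for parent in parents(queue.popleft()):
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen


def ancestor_parent_pairs(start):
    """Rows of (?p, ?q); a ?p with no known parent yields no row."""
    return [(p, q) for p in zero_or_more(start) for q in parents(p)]


rows = ancestor_parent_pairs("elizabeth")
for p, q in rows:
    print(p, "->", q)
```

In this toy graph the people at the top of the tree (george_v, mary_of_teck, elizabeth_bl) never show up as ?p, because they have no recorded parent to bind to ?q.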

----

I feel like one of the reasons SPARQL is confusing is that people bring their intuitions from SQL, which are wrong here: the underlying data model is different, but the syntax looks vaguely SQL-like, which leads to misunderstandings.


Where do you get the translation from wdt:P25 to "mother"? That's the most incomprehensible part. It feels like I need a reverse dictionary lookup to write a single query.


I 100% agree that namespaces, URLs and numeric Q ids add significantly to how complex Wikidata SPARQL queries are, and generally make them incomprehensible. The editor at https://query.wikidata.org does have helpful tooltips though.

But honestly I think people would have a much easier time if we had less indirection and just wrote "mother" instead of wdt:P25.

What I actually do is take the number: if it starts with a Q, go to https://wikidata.org/wiki/Q123; if it starts with a P, go to https://wikidata.org/wiki/Property:P25

What it actually means in a technical sense:

Identifiers in SPARQL are URLs (sort of similar to integer id fields in SQL). wdt: is short for http://www.wikidata.org/prop/direct/ so wdt:P25 is http://www.wikidata.org/prop/direct/P25. The wdt: prefix basically means the normalized value, but there are other prefixes if you need to access deprecated statements or modifiers on properties. Gory details at https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Fo...
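
A small Python sketch of what that expansion amounts to, plus the "go look up the page" trick. The function names expand and doc_page are my own invention for illustration; the two namespace IRIs are the ones Wikidata uses for wd: and wdt:.

```python
# Prefix-to-IRI table for the two prefixes used in the sample query.
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
}


def expand(prefixed_name):
    """Turn a prefixed name like 'wdt:P25' into its full IRI."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return PREFIXES[prefix] + local


def doc_page(identifier):
    """Human-readable page for a Q (item) or P (property) identifier."""
    if identifier.startswith("Q"):
        return f"https://www.wikidata.org/wiki/{identifier}"
    if identifier.startswith("P"):
        return f"https://www.wikidata.org/wiki/Property:{identifier}"
    raise ValueError(f"not a Q or P identifier: {identifier!r}")


print(expand("wdt:P25"))
print(doc_page("P25"))
```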


Btw, if you are using https://query.wikidata.org, you can type wdt:mother, press ctrl+space, and it will suggest the P25 property.


Yep. Not only this, but in the sample "wd:Q9682" is a lie. wd is a namespace shortcut which is expanded to a URI, and the prefix-to-URI mapping has to be defined as part of the query, otherwise it won't work. Notice how the sample uses two of those prefixes (wd and wdt): data is segregated into different namespaces that one has to search for and remember each time they want to write a query. And I mean remembering the prefix value, i.e. a partial URI, not the cute little prefix like wd that semweb samples always use.


The SPARQL endpoint that is typically used with Wikidata has some namespaces implicitly predefined, including wdt and wd.

You do not need to declare the wd: prefix if you are using the endpoint at https://query.wikidata.org

I think it's safe to assume in context that the newbie SPARQL user is not setting up their own SPARQL endpoint but using the official Wikidata endpoint.
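
For anyone who does end up on an endpoint without the predefined prefixes, here is a hedged Python sketch of spelling them out. with_prefixes and request_url are hypothetical helper names; the code only builds the query text and a GET URL for https://query.wikidata.org/sparql, it does not send anything.

```python
from urllib.parse import urlencode

# The two namespaces the sample query relies on.
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
}

QUERY_BODY = "select * { wd:Q9682 (wdt:P25|wdt:P22)* ?p . ?p wdt:P25|wdt:P22 ?q }"


def with_prefixes(body, prefixes=PREFIXES):
    """Prepend explicit PREFIX declarations so the query is self-contained."""
    header = "\n".join(f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items())
    return f"{header}\n{body}"


def request_url(query, endpoint="https://query.wikidata.org/sparql"):
    """Build a GET URL asking the endpoint for JSON results (no request is made)."""
    return f"{endpoint}?{urlencode({'query': query, 'format': 'json'})}"


full_query = with_prefixes(QUERY_BODY)
url = request_url(full_query)
print(full_query)
```

On query.wikidata.org the PREFIX lines are redundant but harmless, so the same text should work in both places.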


It’s part of the Wikidata data model.


It actually looks pretty similar to regex. But instead of matching strings of chars, we match paths on graphs?


Essentially, given a big graph, SPARQL finds all the subgraphs that match the given constraints and projects the captured variables into a table, one row for every subgraph matched. (Or at least that's how I think about it; not sure if that's officially what it does.)
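
Here is that mental model as a naive Python sketch: a list of triple patterns is matched against a list of triples, and every consistent assignment of variables becomes one row. The triples and the names match_triple and solve are invented for the example, and real engines are far smarter than this nested loop.

```python
# Hypothetical toy triples (subject, predicate, object); not real Wikidata data.
TRIPLES = [
    ("elizabeth", "father", "george_vi"),
    ("elizabeth", "mother", "elizabeth_bl"),
    ("george_vi", "father", "george_v"),
    ("charles", "mother", "elizabeth"),
]


def is_var(term):
    return term.startswith("?")


def match_triple(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for p_term, value in zip(pattern, triple):
        if is_var(p_term):
            if new.setdefault(p_term, value) != value:
                return None  # variable already bound to something else
        elif p_term != value:
            return None  # constant does not match
    return new


def solve(patterns, triples=TRIPLES):
    """Return one row (dict of variable bindings) per matching subgraph."""
    rows = [{}]
    for pattern in patterns:
        rows = [
            extended
            for row in rows
            for triple in triples
            if (extended := match_triple(pattern, triple, row)) is not None
        ]
    return rows


# "?child's mother is ?m, and ?m's father is ?gf"
table = solve([("?child", "mother", "?m"), ("?m", "father", "?gf")])
print(table)
```

With the toy data above, only charles has a mother whose father is also recorded, so the table has a single row.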

The property path syntax ( https://www.w3.org/TR/sparql11-query/#propertypaths ) does look a lot like regex syntax, and the general triple pattern construct does kind of feel a bit DFA-ish.


A few days ago, the Wikidata Query Builder[0] was deployed. It provides a visual interface to generate simple SPARQL queries, and you can show the generated queries. Maybe this can help you in understanding how SPARQL patterns work?

[0] https://query.wikidata.org/querybuilder/


That does look like a big step forward.

It could really benefit from some linked examples on that page though - I stared at the interface for quite a while, unable to figure out how to use it for anything - then I dug around for an example link and it started to make sense to me.

https://query.wikidata.org/querybuilder/?query=%7B%22conditi...


Or use something like https://github.com/zverok/wikipedia_ql, which uses the MediaWiki API



