ENH: improve upload.sh to retry, extract from README.md into a script to automatically test blazegraph #16

Open: wants to merge 21 commits into base: master

yarikoptic
Member

Now you can run

./run_tests_under_blazegraph ./test_1.sh

and provide different test files... or replace the runner to start oxigraph instead.

Note that I am now using the original .sparql file, not the one hand-edited for curl invocation. This way we only need to modify the .sparql file, and we should get rid of all that duplication in README.md and _curl.sparql

attn @djarecka

@yarikoptic
Member Author

pushed more fixes etc. Now the query invocation looks like

 curl -X POST "${GRAPHDB_API_URL}" --data-binary '@queries/simple2_query.sparql' -H 'Accept: text/csv' -H "Content-Type: application/sparql-query" >| "$output"

so it reads the query directly from the file and gets CSV back (at first I forgot to request that). It then takes ~4 minutes, with about 800% CPU load from the blazegraph (java) process, to answer. It produces CSV, but a different one:

instead of

study,ID,Age,dx,Gender,FIQ,PIQ,VIQ,tool,softwareLabel,federatedLabel,laterality,volume
ABIDE UCLA_1 Site,0051235,1.067E1,1,Male,1.32E2,1.22E2,1.3E2,http://purl.org/nidash/fsl#,Background (mm^3),,,1.1737349E7
ABIDE UCLA_1 Site,0051235,1.067E1,1,Male,1.32E2,1.22E2,1.3E2,http://purl.org/nidash/fsl#,Left-Accumbens-area (mm^3),http://purl.obolibrary.org/obo/UBERON_0001882,Left,690.0
ABIDE UCLA_1 Site,0051235,1.067E1,1,Male,1.32E2,1.22E2,1.3E2,http://purl.org/nidash/fsl#,Left-Amygdala (mm^3),http://purl.obolibrary.org/obo/UBERON_0001876,Left,1392.0

getting

study,ID,Age,dx,Gender,FIQ,PIQ,VIQ,tool,softwareLabel,federatedLabel,laterality,volume
,16070,18.31,Typically Developing Children,Female,92,-999,-999,https://surfer.nmr.mgh.harvard.edu/,Brain Segmentation Volume (mm^3),http://purl.obolibrary.org/obo/UBERON_0000955,,1049835.0
,21032,8.84,ADHD-Combined,Female,120,101,138,https://surfer.nmr.mgh.harvard.edu/,Brain Segmentation Volume (mm^3),http://purl.obolibrary.org/obo/UBERON_0000955,,1041116.0
,21042,9.42,ADHD-Combined,Male,81,72,94,https://surfer.nmr.mgh.harvard.edu/,Brain Segmentation Volume (mm^3),http://purl.obolibrary.org/obo/UBERON_0000955,,1046980.0

so for some reason I have lost "study", and I got 48502 result records instead of 13199.

If I cut/paste the line from the README as is and run it under queries/ (since it dumps to ../queries), or run bash -x simple2_query_curl.sparql -- I receive nothing besides the header!

So, the questions:

  • can anyone reproduce queries/simple2_query_output.csv?
  • how should the query be changed, or what is wrong that leads me to lose study in the output?
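
To make the comparison above easier to repeat (lost "study" values, 48502 vs 13199 records), a result file could be checked with something like the sketch below. summarize_csv is a hypothetical helper, not part of this PR. It splits naively on commas and counts lines, so it assumes the first column is never a quoted value containing a comma and that no field contains a newline.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the PR): summarize a query result CSV.
# Prints the number of data records (header excluded) and how many of them
# have an empty first column ("study" in the outputs shown above).
# Naive parsing: assumes no quoted commas in the first field and no
# embedded newlines anywhere.
summarize_csv() {
    local csv=$1
    awk -F, 'NR > 1 { n++; if ($1 == "") empty++ }
             END { printf "records=%d empty_first_column=%d\n", n, empty }' "$csv"
}

# Example use, with the variable name from the curl call above:
#   summarize_csv "$output"
```

Running it on both the committed CSV and a freshly produced one would show at a glance whether the record counts and the "study" column differ.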

@yarikoptic
Member Author

but I am confused -- it seems that e.g. none of the .ttl files has that sample site I am supposed to see in the first result record:

❯ git grep 'ABIDE UCLA_1 Site' -- **/*.ttl
❯

so likely the output recorded in git is some echo from the past and is not expected to be obtained given the data now contained here. Is that so, @dbkeator?

@yarikoptic
Member Author

Similarly to

@djarecka - ping on a potential runner for oxigraph.

@djarecka
Member

ok, will come back to it this week. Last time it didn't finish the query before I had to close the laptop and go home

@djarecka
Member

@yarikoptic - you're running ./run_tests_under_blazegraph ./test_1.sh in 4min on your laptop?

@djarecka
Member

@yarikoptic - you're running ./run_tests_under_blazegraph ./test_1.sh in 4min on your laptop?

nevermind, after removing the extra ttl files (that I merged for oxigraph), your example indeed ran in 4-5min

@yarikoptic
Member Author

for me

  • oxigraph produced (albeit quickly) no output for target query of test_1.sh. I have pushed the "runner" for it here - the run_tests_under_oxigraph. I could not figure out yet why it simply finds no results
  • I also have a query from @surchs which I pushed to seb-openneuro branch on top of this one.
    • there it took oxigraph IIRC 3 hours where it took blazegraph seconds for the query...

@surchs -- didn't you get some other graphdb running the test? do you mind sharing the runner for it too?

@surchs

surchs commented Mar 15, 2024

@surchs -- didn't you get some other graphdb running the test? do you mind sharing the runner for it too?

Yo @yarikoptic: I heard you like graph store benchmarks, so I put one in my fork of your fork, so you can merge while you merge: yarikoptic#1 🎉

@djarecka
Member

@yarikoptic - it might be that it doesn't have the data. I think that's why I was merging the ttl files before uploading

upload.sh Outdated
ttl) ct=text/turtle;;
*) ct=text;; # fail it
esac
curl --silent -X POST -H "Content-Type: $ct" --data-binary "@$1" ${GRAPHDB_API_URL:-http://localhost:8889/bigdata/sparql}
Member

@yarikoptic - I had to change to http://localhost:7878/store?default for oxigraph to see any data

Member Author

going to http://localhost:7878/store?default IIRC would immediately trigger a download of something. The web UI to send queries was at the top-level URL IIRC.
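
The PR title mentions making upload.sh retry, while the excerpt above shows only the curl call. A minimal sketch of a generic retry wrapper follows, assuming a fixed delay between attempts. The function name retry, the attempt count, and the delay are illustrative and not taken from the PR.

```bash
#!/usr/bin/env bash
# Illustrative sketch, not the PR's implementation.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits with status 0 or MAX_ATTEMPTS is reached.
retry() {
    local max_attempts=$1 delay=$2 attempt=1
    shift 2
    until "$@"; do
        if (( attempt >= max_attempts )); then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Example use (commented out so the sketch runs without a server).
# curl's --fail makes HTTP errors produce a non-zero exit status, which
# is what lets the wrapper notice a failed upload:
#   retry 5 2 curl --silent --fail -X POST -H "Content-Type: text/turtle" \
#       --data-binary "@$1" "${GRAPHDB_API_URL:-http://localhost:8889/bigdata/sparql}"
```

For oxigraph, GRAPHDB_API_URL would be set to http://localhost:7878/store?default, as noted in the review comment above.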

@djarecka
Member

djarecka commented Mar 15, 2024 via email

Co-authored-by: Dorota Jarecka <djarecka@gmail.com>
@tekrajchhetri

tekrajchhetri commented Sep 13, 2024

@yarikoptic @satra I ran the test and the query is not running forever like @djarecka mentioned. On AWS it did seem to, but I realized that was because the EC2 instance I had was not powerful enough. Running on my local Mac, the query executed fine.

Test script bash:
[screenshot]

Executing query on browser:

[screenshot]

System config:

Chip: Apple M3 Pro
Total Number of Cores: 12 (6 performance and 6 efficiency)
RAM: 36 GB
OS: macOS 14.6.1

Tested Oxigraph version:
oxigraph 0.4.0-rc.1

@satra
Contributor

satra commented Sep 13, 2024

thanks tek. that's pretty reasonable for that query. @yarikoptic - perhaps see if you can replicate on your machine using the same container as @tekrajchhetri ? or give @tekrajchhetri access to the machine to try.

@yarikoptic
Member Author

yarikoptic commented Sep 13, 2024

thanks @tekrajchhetri ! that is great -- maybe something was fixed in oxigraph! We should get back into business. Would be great to merge

edit: I will retry whenever I get a breather ;-)

@tekrajchhetri

tekrajchhetri commented Sep 13, 2024

@yarikoptic Probably because the latest update to the Oxigraph repo was last week :)

@tekrajchhetri

@yarikoptic @satra It seems that the longer time (the earlier run) might be due to the initial bootstrapping of the database. I ran the query again multiple times and the execution time averages 63.XXX seconds.


Thank you.
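
A repeated-run measurement like the one described above could be scripted roughly as follows. This is a sketch under assumptions: average_runtime is a hypothetical helper, and it relies on bash's SECONDS variable, so timings have whole-second resolution (adequate for queries that take about a minute, too coarse for fast ones).

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command RUNS times and print the mean
# wall-clock time in seconds.
# Usage: average_runtime RUNS COMMAND [ARGS...]
average_runtime() {
    local runs=$1 total=0 start i
    shift
    (( runs > 0 )) || return 2
    for ((i = 0; i < runs; i++)); do
        start=$SECONDS
        # Discard the command's stdout; abort if any run fails.
        "$@" >/dev/null || return 1
        total=$((total + SECONDS - start))
    done
    awk -v t="$total" -v n="$runs" 'BEGIN { printf "%.1f\n", t / n }'
}

# Example use with the runner from this PR (commented out, needs the database):
#   average_runtime 5 ./run_tests_under_oxigraph ./test_1.sh
```

Because the first run may include database bootstrapping, it may be worth discarding it or reporting it separately from the average.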

@yarikoptic
Member Author

FWIW, filed

@yarikoptic
Member Author

indeed it ran in sensible time now and produced some output! (It differs from what is committed, but that is expected.) Now it is time to get those updated queries and more consistently ordered outputs!

6 participants