Data formats review - Surface and surface-sampled data

Final time: Friday, April 22 @ 1pm EDT (UTC-4).

Please find full scheduling information, including a Zoom link with a pass code on the Nipy meeting calendar.

If you cannot access the calendar for some reason, the Zoom link is Launch Meeting - Zoom. Feel free to DM me for the pass code.


Following up on the latest meeting where we went over NIfTI, MINC, AFNI’s BRIK/HEAD and DICOM, there was a request to cover surface formats such as GIFTI and CIFTI-2.

Please fill in your availability: WhenIsGood: Data formats - Surface and surface-sampled data

I’d like to propose the following lineup:

  1. Data structures and some simple formats - @neurolabusc
  2. GIFTI - @coalsont
  3. BMESH - @fangq
  4. MNE Source Time Course (STC) format - @larsoner
  5. CIFTI-2 - @coalsont

I’m happy to cover GIFTI and CIFTI-2 if I can’t get someone more knowledgeable. My perspective on these is more based on the problems of associating meshes and mesh-sampled data (which are inefficient to bundle in many cases).


Chris and I have a draft presentation available for viewing here. If anyone wishes to comment or modify this, contact me by email and I can provide editorial permissions.


I have also created a JavaScript read speed versus file size test.

Hello,

I am interested in joining this meeting as I work with several surface file formats, including:

  • GIFTI, CIFTI (here)
  • Using texture wrapping / UV mapping with neuroimaging data (Figure S6.3, S6.4, and Section 6.6.3 from here)
  • OBJ (seems that I am not allowed to post more than 2 links but you can check out: github > ofgulban/bvbabel/blob/main/examples/read_srf_export_obj.py)

I am not sure if I can contribute more than @neurolabusc's draft presentation, but I would be very interested to follow the discussion. I have filled in the availability form; I hope that is enough to be included.
Thanks,


@ofgulban I have sent you a message on Discourse to give you editorial access to the presentation. I am very impressed with your GitHub bvbabel project, and I am grateful that you are using a permissive license. You might want to consider fulfilling the potential of the dormant NiBabel PR216 and providing a unified method for interacting with BrainVoyager data.

I do have one request: could you include test validation images for these formats? Personally, I would include these as a GitHub submodule (e.g. the same way that dcm2niix has the dcm_qa validation submodules with testing data). This will help those of us who want to support BrainVoyager formats but do not have BrainVoyager licenses.


Hi @neurolabusc, thanks for your interest and suggestions. I am a bit new to reading/writing binary formats. I would be happy to include test validation images, but I might need some guidance or examples on how best to do it. If it is not too much work, could you open an issue on the bvbabel GitHub page, so that we can continue this specific conversation there?

Issue generated.


Hi, Josh Moore mentioned this thread to me, so I am allowing myself to chime in.

I mostly have experience with fluorescence microscopy images, and have also worked on meshes for epithelium models, in Python. I have started working on an implementation in zarr of the PLY format (ASCII only for now). PLY is like an extensible OBJ.

Also, maybe of interest for this topic: meshio does a good job of unifying lots of formats in a single API.

Before trying PLY, I looked into OBJ and VRML. The good point with OBJ is that it is very standard and not extensible, so a legal OBJ can be opened anywhere. This is not generally the case with PLY, as it is permissive about the names of the mesh elements you can store. VRML is an old virtual reality standard (its successor, X3D, is the XML-based one); I haven’t looked much further.
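
To make the PLY layout concrete, here is a minimal sketch in Python that writes and re-reads a one-triangle ASCII PLY file. The helper names (`write_ascii_ply`, `read_ascii_ply`) are hypothetical and are not taken from the zarr work mentioned above. The reader only handles the simple layout this writer produces, so it is not a general PLY parser.

```python
def write_ascii_ply(vertices, faces):
    """Return ASCII PLY text for a mesh (hypothetical helper)."""
    lines = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(vertices)}",
        "property float x",
        "property float y",
        "property float z",
        f"element face {len(faces)}",
        # The list property name is a convention, not fixed by the format.
        "property list uchar int vertex_indices",
        "end_header",
    ]
    lines += [" ".join(str(c) for c in v) for v in vertices]
    # Each face row starts with its vertex count, so non-triangles are legal.
    lines += [" ".join(str(i) for i in (len(f), *f)) for f in faces]
    return "\n".join(lines) + "\n"


def read_ascii_ply(text):
    """Parse the simple layout written above; not a general PLY reader."""
    lines = text.splitlines()
    if lines[0] != "ply":
        raise ValueError("not a PLY file")
    header_end = lines.index("end_header")
    counts = {}
    for line in lines[1:header_end]:
        parts = line.split()
        if parts[0] == "element":
            counts[parts[1]] = int(parts[2])
    body = lines[header_end + 1:]
    n_vert = counts["vertex"]
    vertices = [tuple(float(c) for c in row.split()) for row in body[:n_vert]]
    faces = []
    for row in body[n_vert:n_vert + counts["face"]]:
        n, *idx = (int(i) for i in row.split())
        faces.append(tuple(idx[:n]))
    return vertices, faces


verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
tris = [(0, 1, 2)]
ply_text = write_ascii_ply(verts, tris)
round_trip = read_ascii_ply(ply_text)
```

The header is where the permissiveness shows up: a reader has to cope with whatever element and property names a writer chose (for example, both `vertex_indices` and `vertex_index` appear in files in the wild), whereas OBJ has a fixed vocabulary.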

Hope this helps,

Guillaume

Guillaume, I really like the binary PLY format. It has huge performance advantages over the classic OBJ, VTK, STL, and VRML formats in terms of disk space and speed. Its extensible nature is very nice, but it has allowed some users to develop variations that are not supported by other tools. The flexibility to handle non-triangular meshes is required for applications that need it, but it incurs a size/speed penalty for domains that only use triangular meshes and for modern hardware (which only supports triangles). The draft presentation notes that our field only uses triangular meshes, and for that reason I did not describe PLY (even though I prefer it to OBJ and STL).

I do think zarr has nice attributes that can be leveraged internally. However, I do have some concerns about it as an interchange format, as it is hard to support outside of Python (though I note there are JavaScript implementations).


Hi @neurolabusc, you are absolutely right about the pros and cons of PLY. I did not read the draft, so I missed the discussion on that point, sorry.
As for zarr, you’re right that it is a bit Python-focused, and the specification is still quite new. Yet for microscopy data, it seems to be the common choice for storing pixel data (see the ome-ngff tag on image.sc).

OK. I thought the Caterva format looks like an interesting zarr-class library. In general, blosc has a lot of useful ideas for our field, with features like data-swizzling really improving compression performance. I still think these formats are a bit too complex and language specific for use as a NIfTI-style interchange format, but show great promise for internal storage (in the same way that mrtrix mif files use strides and other features internally, but support NIfTI for interchange).
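
As a rough illustration of the swizzling idea (blosc calls its filter "shuffle"), here is a minimal sketch in pure Python. It uses zlib from the standard library rather than blosc itself, and the helper names and the sine-wave test signal are made up for the example. The shuffle regroups the bytes so that byte 0 of every float32 comes first, then byte 1, and so on, which tends to put similar bytes next to each other before compression.

```python
import math
import struct
import zlib


def byte_shuffle(raw, itemsize):
    """Group byte 0 of every item, then byte 1, and so on."""
    return b"".join(raw[i::itemsize] for i in range(itemsize))


def byte_unshuffle(shuffled, itemsize):
    """Invert byte_shuffle."""
    n_items = len(shuffled) // itemsize
    out = bytearray(len(shuffled))
    for i in range(itemsize):
        out[i::itemsize] = shuffled[i * n_items:(i + 1) * n_items]
    return bytes(out)


# Made-up smooth signal standing in for vertex coordinates or scalar maps.
values = [math.sin(i / 500.0) for i in range(20000)]
raw = struct.pack(f"<{len(values)}f", *values)

# Compress the same bytes with and without the shuffle step.
plain_size = len(zlib.compress(raw))
shuffled_size = len(zlib.compress(byte_shuffle(raw, 4)))

# The shuffle is lossless: unshuffling restores the original bytes.
restored = byte_unshuffle(byte_shuffle(raw, 4), 4)
```

Comparing `plain_size` and `shuffled_size` on your own data is the honest test; smooth floating-point data is where the shuffled stream would be expected to compress better.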

We will be having this call on Apr 22 @ 1pm EDT (5pm UTC). This has been posted to the nipy meetings calendar, and may be available at this link.

It’s veering off-topic but just for completeness I’ll add that the list of Zarr implementations has been expanding. You can find those that are cross-tested under GitHub - zarr-developers/zarr_implementations. There definitely are corners of the V2 spec that are difficult to support outside of Python, but those can be avoided and V3 is intended to remove many of them. Best, ~Josh

@glyg after our discussion I updated the JavaScript benchmark to include the PLY format. I would be interested to hear of optimizations for any of these formats. These readers are used in NiiVue, so performance gains can help real users.

I optimized the reader to handle the special case where vertices only store position information as float32’s - this allows block reading. I also added an optimization for the most common form of storing indices. In the table below ply.ply is my optimized code path, while gen.ply uses the unoptimized general ply reading. The benefit of PLY is that it can store non-triangular meshes. However, the flexibility incurs a penalty in terms of file size and read performance. I do think PLY is an outstanding interchange format, and the performance hit is not huge.
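
The block-reading idea can be sketched in a few lines of Python. This is not the NiiVue reader (which is JavaScript), only an illustration under stated assumptions: the payload is made up, vertices are assumed to be three little-endian float32 values with a 12-byte stride, and `read_general` and `read_block` are hypothetical names.

```python
import struct
import sys
from array import array

# Made-up payload: 4 vertices, each three little-endian float32 values.
coords = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
payload = struct.pack("<12f", *coords)
n_vert = 4


def read_general(buf, n):
    """General path: decode one vertex at a time, as a reader must when
    vertices carry extra properties (normals, colors) of mixed types."""
    out = []
    for i in range(n):
        out.extend(struct.unpack_from("<3f", buf, i * 12))
    return out


def read_block(buf, n):
    """Fast path: positions-only float32 vertices form one contiguous
    array, so the whole block can be copied in a single call."""
    block = array("f")
    block.frombytes(buf[: n * 12])
    if sys.byteorder == "big":
        block.byteswap()  # payload is little-endian
    return block.tolist()


general = read_general(payload, n_vert)
fast = read_block(payload, n_vert)
```

Both paths return the same coordinates; the difference is that the general path makes one decoding call per vertex while the block path makes one call for the whole array. The same pattern is available through typed-array views in JavaScript or `numpy.frombuffer` in Python.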

As an aside, I also added uncompressed GIFTI. An interesting feature of GIFTI is that uncompressed meshes are both slower and larger than compressed GIFTI. The penalty for the base64 encoding comes into play here.
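
The base64 penalty is easy to quantify: every 3 bytes of binary data become 4 characters of text, so an uncompressed base64 payload is about 4/3 the size of the raw data. The sketch below uses only the Python standard library and made-up data. It also mimics the order of operations in GIFTI's GZipBase64Binary encoding (compress first, then base64), without claiming to reproduce any particular reader or writer.

```python
import base64
import struct
import zlib

# Made-up data: 3000 float32 values standing in for a GIFTI data array.
values = [i * 0.001 for i in range(3000)]
raw = struct.pack(f"<{len(values)}f", *values)

# Plain base64: 3 input bytes become 4 output characters.
encoded = base64.b64encode(raw)
ratio = len(encoded) / len(raw)

# Compressed variant: zlib-compress first, then base64-encode the result.
packed = base64.b64encode(zlib.compress(raw))
decoded = zlib.decompress(base64.b64decode(packed))
```

The file sizes reported in this post fit that ratio: raw.gii (7866016) is roughly 4/3 of raw.mz3 (5898280). A plausible reading of the timing result is that the uncompressed file simply has more base64 text to decode, but that is an inference, not something the benchmark isolates.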

While JavaScript is a lot slower than native LLVM code, the relative performance of the different formats is the same.

File       Size      Time
raw.mz3    5898280     13
ply.ply    6226140     59
gen.ply    6226140     85
gz.mz3     3259141    361
gz.gii     4384750   1693
raw.gii    7866016   1783

Thanks again for the opportunity to present the JMesh formats last Friday.

As a quick follow-up, I just submitted a PR to @neurolabusc’s mesh benchmark to include mesh loading speed comparisons for Python (Python driver script).

Overall, loading speed in Python follows a similar relative pattern to JavaScript and MATLAB, as shown in the presentation, and Python is generally the fastest of the three, with the following exceptions and notes:

  • even the fastest JSON parser (yyjson) still can’t compete with Node/JS’s V8 JSON parser when parsing JSON-encoded uncompressed raw data
  • binary STL parsing is quite fast with numpy-stl
  • OBJ is too slow to be shown

The benchmark raw data and plots can be found here

Updated after @neurolabusc’s patch for mz3 and after switching to pysimdjson as the JSON parser.

Nice work @fangq. I have issued a pull request that accelerates the mz3 reader by leveraging NumPy. I wonder if similar optimizations could be applied to some of the formats you have developed. I do agree that several of the formats perform very well with Python. The ability to leverage NumPy with its architecture-tuned code is a real asset.

Cool! For raw.min.json, I was able to get a 10x speedup after swapping the JSON parser to pysimdjson. Both the spreadsheet and the plot above have been updated after the two patches.

@neurolabusc, you mentioned in the meeting that it might be possible to volunteer or suggest a topic for the next meetings. I would like to propose presenting, and opening a discussion on, some arguments about surface visualization from the perspective of high-resolution cortical imaging. How should I proceed?


@matthew.brett and @effigies have been coordinating these meetings. Sounds like a fun topic. Here is a talk on this topic I gave about a year ago when we were starting work on NiiVue and its shaders. Your own work with texture wrapped surfaces would be nice. I also think MatCaps provide a nice solution for many applications, and Surfice includes a few MatCap shaders.


Yeah, there’s no formal process here. Just pick a time range far enough out that people aren’t fully booked, announce here with a topic and a time poll. Then you can try to round up some presenters and put together a more complete agenda, finalize the time, and make it happen.

I’ve given @ofgulban full access to the nipy calendar. Let me know if I can do anything to support further discussions.
