From tsh...@ Tue Dec 5 04:47:33 2017
From: tsh...@ (Tarek Sherif)
Date: Tue, 5 Dec 2017 07:47:33 -0500
Subject: [Public WebGL] Platform and browser inconsistencies rendering to 3D textures
Message-ID: 

Hi all,

I've noticed some sharp inconsistencies in how rendering to 3D textures works across platforms and browsers. I've tested Chrome 62/Firefox 57 on Ubuntu 16.04/Quadro M1000 and Windows 10/GT 750m using the following scenes:

1. Render to TEXTURE_2D color attachment/TEXTURE_2D depth:
https://tsherif.github.io/webgl2bugs/unknown/render-to-texture2D.html

2. Render to TEXTURE_3D color attachment/TEXTURE_2D depth:
https://tsherif.github.io/webgl2bugs/unknown/render-to-texture3D.html

3. Render to TEXTURE_3D color attachment/TEXTURE_2D_ARRAY depth:
https://tsherif.github.io/webgl2bugs/unknown/render-to-texture3D-array-depth.html

The first works everywhere and demonstrates what the scene should look like:

- "This is WebGL" texture used for offscreen draw
- Gray bar is the clear color for the offscreen draw
- Black bar is the clear color for the main draw

The second works in both browsers on Ubuntu, but fails in both on Windows. Chrome reports FRAMEBUFFER_UNSUPPORTED as the framebuffer status. Firefox reports no errors, and seems to do the clear (gray), but doesn't draw.

The third works in both browsers on Ubuntu and fixes things in Chrome on Windows. Firefox behaves the same as in the previous example.

I was tipped off to trying the third setup by the following language from the OpenGL 4.6 spec (section 9.4.2): "If any framebuffer attachment is layered, all populated attachments must be layered. Additionally, all populated color attachments must be from textures of the same target (three-dimensional, one- or two-dimensional array, cube map, or cube map array textures)."

However, neither the ES 3.0 nor the WebGL 2 spec mentions any such restriction. Shannon Woods brought it to my attention that it does appear in section 9.4.2 of the ES 3.2 spec.
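The difference between scenes 2 and 3 boils down to whether the depth attachment's texture target belongs to the same layered-capable family as the TEXTURE_3D color attachment. A toy predicate can make that concrete; note this is my simplification for illustration, not spec text (the actual rule also depends on whether the attachments are layered, and covers cube maps):

```javascript
// WebGL 2 texture-target enum values (from the WebGL 2 IDL).
const TEXTURE_2D = 0x0DE1;
const TEXTURE_3D = 0x806F;
const TEXTURE_2D_ARRAY = 0x8C1A;

// Targets whose images are attached per-layer via framebufferTextureLayer.
const LAYERED_FAMILY = new Set([TEXTURE_3D, TEXTURE_2D_ARRAY]);

// Rough check in the spirit of the quoted 9.4.2 language: if any populated
// attachment comes from the layered family, require all of them to.
function attachmentTargetsConsistent(targets) {
  const anyLayered = targets.some(t => LAYERED_FAMILY.has(t));
  return !anyLayered || targets.every(t => LAYERED_FAMILY.has(t));
}

// Scene 2: TEXTURE_3D color + TEXTURE_2D depth       -> mixed families
// Scene 3: TEXTURE_3D color + TEXTURE_2D_ARRAY depth -> consistent
```

Under this reading, scene 2 mixes families and scene 3 does not, which lines up with the FRAMEBUFFER_UNSUPPORTED status observed on Windows.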
So the first question would be: should this restriction be formally added to the WebGL 2 spec?

Tarek Sherif
http://tareksherif.net/
https://www.biodigital.com/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From kbr...@ Tue Dec 5 13:09:20 2017
From: kbr...@ (Ken Russell)
Date: Tue, 5 Dec 2017 13:09:20 -0800
Subject: [Public WebGL] Platform and browser inconsistencies rendering to 3D textures
In-Reply-To: 
References: 
Message-ID: 

Good catch, sounds like a corner case of the spec that needs to be cleared up. Would you mind filing this at https://github.com/KhronosGroup/WebGL/issues ? Please include your test cases in some persistent form rather than only linking to them. Thanks much.

-Ken

On Tue, Dec 5, 2017 at 4:47 AM, Tarek Sherif wrote:
> [full quote of Tarek's message snipped]

From pya...@ Sun Dec 10 01:53:20 2017
From: pya...@ (=?UTF-8?Q?Florian_B=C3=B6sch?=)
Date: Sun, 10 Dec 2017 10:53:20 +0100
Subject: [Public WebGL] vxgi/vxao
Message-ID: 

Nvidia quite a while ago presented the concept of vx* (gi/ao). The concept in a nutshell is to calculate a scene voxelization, make a clipmap, and evaluate voxel cone tracing against that clipmap. This is a really good idea as it requires in essence no preprocessing of the scene, and with some tradeoff scales to very large scenes (and not just prepared/limited domains).

Unfortunately it's difficult to implement yourself, for two reasons:

- Voxelizing a real-time scene on the GPU is kinda slow (even if you do various trickery with the geometry shader).
- Casting into that voxelization is less than trivial, as the most efficient traversal algorithms are difficult to implement, inefficient, or impossible to express in a shader.

VXGI is also vendor specific, which makes it unsuitable for where WebGL should be.
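For readers unfamiliar with the technique, the "evaluate voxel cone tracing" step amounts to marching along a cone axis, sampling coarser pre-filtered voxel levels as the cone widens, and compositing front to back. A deliberately toy sketch follows; the function shape and constants are mine (a real implementation runs in a shader against a 3D texture, not in JavaScript):

```javascript
// Toy front-to-back cone-trace accumulation. `sampleLevel(dist, level)` stands
// in for sampling the pre-filtered voxel clipmap/mipmap, and must return
// { color, alpha } for that position and level of detail.
function coneTrace(sampleLevel, maxDist, coneRatio) {
  const baseRadius = 0.01;
  let dist = baseRadius;
  let color = 0;
  let alpha = 0;
  while (dist < maxDist && alpha < 0.99) {
    // The cone's radius grows linearly with distance...
    const radius = Math.max(baseRadius, coneRatio * dist);
    // ...so sample an increasingly coarse (pre-filtered) level.
    const level = Math.log2(radius / baseRadius);
    const s = sampleLevel(dist, level);
    // Front-to-back "over" compositing.
    color += (1 - alpha) * s.alpha * s.color;
    alpha += (1 - alpha) * s.alpha;
    dist += radius; // step size also grows with the cone radius
  }
  return { color, alpha };
}
```

Wider cones (larger `coneRatio`) give diffuse-like gathering; narrower cones approach glossy reflection, which is why the technique covers a range of material roughness without path-tracing noise.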
-----

I think things don't have to be this way. It should not be a major obstacle for GPUs to provide primitives that make implementing VXGI/VXAO easy and as fast as the GPU can afford. All that's required is a standardized API to feed a scene (without duplication if possible) into a voxel clipmap builder, and a standardized API to query that data structure.

There are already some forays into standardized data structure handling with sparse textures. Would it be expecting too much that GPU vendors get together and start designing a standard API to deal with one of the most flexible and appealing approaches to GI/AO?

From cwa...@ Mon Dec 11 09:38:39 2017
From: cwa...@ (Corentin Wallez)
Date: Mon, 11 Dec 2017 12:38:39 -0500
Subject: [Public WebGL] vxgi/vxao
In-Reply-To: 
References: 
Message-ID: 

The WebGL WG isn't the place where new features like this will be designed, especially if they impact hardware design. Also, standardizing an algorithm like VXGI/AO isn't a thing. Sparse textures / buffers are different because they expose the hardware's virtual memory features in a way that allows doing many different algorithms.

Also, VXGI/AO is extremely expensive and I know of only one game that ships it: "The Tomorrow Children". More efficient techniques that are still good enough have been published recently. One that I like is "Real-Time Global Illumination Using Precomputed Illuminance Composition with Chrominance Compression".

On Sun, Dec 10, 2017 at 4:53 AM, Florian Bösch wrote:
> [full quote of Florian's message snipped]

From pya...@ Mon Dec 11 10:31:40 2017
From: pya...@ (=?UTF-8?Q?Florian_B=C3=B6sch?=)
Date: Mon, 11 Dec 2017 19:31:40 +0100
Subject: [Public WebGL] vxgi/vxao
In-Reply-To: 
References: 
Message-ID: 

On Mon, Dec 11, 2017 at 6:38 PM, Corentin Wallez wrote:
> Also standardizing an algorithm like VXGI/AO isn't a thing.

Why not?

> Also VXGI/AO is extremely expensive and I know of only one game that ships it: "The Tomorrow Children".

It is expensive, but GPUs get faster every year.

> More efficient techniques that are still good enough have been published recently. One that I like is "Real-Time Global Illumination Using Precomputed Illuminance Composition with Chrominance Compression".
I think you mean: "Real-Time Global Illumination Using Precomputed Illuminance Composition with Chrominance Compression".

Like many algorithms in this category, it relies on an extremely heavy precomputation that might take considerable time (minutes, hours, days). It's not something you do in realtime or on-line at all. In addition, common to most approaches that use some spherical harmonic approximation, it struggles with glossy reflections and can usually only express "slightly glossy". And while the algorithm is certainly more efficient than the somewhat "brute force" approach of actually sampling/tracing the surroundings, it does have these limitations: it only applies to mostly static scenes with few if any dynamic interactions in them, and the important aspect of glossy reflection (which makes up a huge part of all materials) usually doesn't work right.

---

You can characterize Global Illumination algorithms on a spectrum:

1. Prebake everything -> for obvious reasons, we're trying not to do that
2. Prebake some things and use a crude transport approximation that brings with it a host of drawbacks -> we are currently here
3. Come up with a formulation that allows you to query the geometry (or a proxy for it) -> we will be there in the future

In the latter category of querying/tracing the scene there are two approaches. The first is relatively classical raytracing, where the actual scene geometry is intersected with rays. There have been people who built hardware acceleration for that. The drawback is that it only gives you perfectly shiny reflections. If you want glossy or diffuse, you'll essentially end up doing pathtracing, which introduces noise, and it's usually too much noise to be pleasant at 120 Hz. The second approach is to come up with some kind of proxy structure that usually doesn't work well for perfectly shiny reflections, but works well across a range from diffuse to pretty glossy.

In that latter category of proxy structures there are, to my knowledge, two distinct flavors:

1. Put the entire scene into a sparse voxel structure and its respective mipmap -> this has been demonstrated to work well, but it's obviously limited in the domain size. In addition, sparse voxel structures are expensive to traverse.
2. Replace the mipmap with a clipmap and get rid of the sparsity. Of course this trades off precision the further away features are, but it does work for large domains.

The one really unsolved problem this has is that it requires rasterizing in 3D. This is seriously expensive. Tracing into the clipmap is simple in concept, but a bit awkward to implement.

---

I'm trying to illustrate that there is a convergence going on: everything is going to converge on "be able to query a more or less realtime representation of the scene in realtime". All more optimized approaches bring with them a lot of drawbacks, and they're not going to be what we'll use 10-20 years from now.

From cwa...@ Mon Dec 11 10:38:26 2017
From: cwa...@ (Corentin Wallez)
Date: Mon, 11 Dec 2017 13:38:26 -0500
Subject: [Public WebGL] vxgi/vxao
In-Reply-To: 
References: 
Message-ID: 

WebGL ML to BCC.

On Mon, Dec 11, 2017 at 1:31 PM, Florian Bösch wrote:
>> Also standardizing an algorithm like VXGI/AO isn't a thing.
>
> Why not?

Graphics APIs standardize hardware features; this is not a hardware feature but something that you build on top of hardware features. It would be ok to standardize building blocks that would make VXGI/AO more efficient, though. Just not VXGI/AO itself.

>> Also VXGI/AO is extremely expensive and I know of only one game that ships it: "The Tomorrow Children".
>
> It is expensive, but GPUs get faster every year.
>
> [rest of Florian's reply snipped]

The role of the WebGL WG isn't to expose new hardware features and be forward-looking. Its role is to provide a summary of existing GPU technologies for the Web. As such it shouldn't look into VXGI/AO. And for that matter, this is the same for the W3C "GPU for the Web" group.

From kai...@ Tue Dec 19 17:38:43 2017
From: kai...@ (Kai Ninomiya)
Date: Wed, 20 Dec 2017 01:38:43 +0000
Subject: [Public WebGL] WEBGL_get_buffer_sub_data_async
Message-ID: 

All,

Our new proposal for WEBGL_get_buffer_sub_data_async is finally ready. Please take a look and send along your comments and suggestions.
Feel free to request comment access if you want to comment on the doc itself.

Note this is a design doc and not a spec, so it will hopefully be easier to read but may not be explicit about every edge case yet.

https://docs.google.com/document/d/1f65cGlfLHbKLOuvRSqTvrakNi60Swk6GCyS54v1ImKo/edit?usp=sharing

-Kai

From khr...@ Wed Dec 20 10:57:54 2017
From: khr...@ (Mark Callow)
Date: Wed, 20 Dec 2017 10:57:54 -0800
Subject: [Public WebGL] WEBGL_get_buffer_sub_data_async
In-Reply-To: 
References: 
Message-ID: <2671E2AD-6F46-4B1D-9C06-84D55B05BDC8@callow.im>

> On Dec 19, 2017, at 17:38, Kai Ninomiya wrote:
> [announcement snipped]

This doc should mention the core reason for this extension: the inability of some WebGL implementations to support glMapBufferRange. And describe how that led to gl.getBufferSubData() and then this proposal. As far as I can see, all the use cases listed would be solved if glMapBufferRange were supported.

Regards

-Mark

From kai...@ Wed Dec 20 11:06:57 2017
From: kai...@ (Kai Ninomiya)
Date: Wed, 20 Dec 2017 19:06:57 +0000
Subject: [Public WebGL] WEBGL_get_buffer_sub_data_async
In-Reply-To: <2671E2AD-6F46-4B1D-9C06-84D55B05BDC8@callow.im>
References: <2671E2AD-6F46-4B1D-9C06-84D55B05BDC8@callow.im>
Message-ID: 

Good point Mark, I'll add that.

On Wed, Dec 20, 2017, 10:57 AM Mark Callow wrote:
> [quoted message snipped]

From jgi...@ Wed Dec 20 16:03:22 2017
From: jgi...@ (Jeff Gilbert)
Date: Wed, 20 Dec 2017 16:03:22 -0800
Subject: [Public WebGL] WEBGL_get_buffer_sub_data_async
In-Reply-To: 
References: <2671E2AD-6F46-4B1D-9C06-84D55B05BDC8@callow.im>
Message-ID: 

With the mechanisms we already have in WebGL 2[1], we can support no-stall polled-async readback from the GPU.
Even in the case of poorly written content, this cannot incur any worse pipeline stalls than we already allow for in WebGL via non-PBO ReadPixels and getBufferSubData. Note also that checking for stall-less behavior is fairly (though not entirely) deterministic, since apps must explicitly poll/wait on a fence before accessing the potentially-in-flight data. This is what I am implementing in Firefox, since it applies to all implementations, regardless of whether the implementation remotes calls.

The key to this is the understanding that buffers with usage=GL_*_READ are directly mappable client-side buffers, into which (primarily) copyBufferSubData and readPixels enqueue writes. After the writes are known to be complete (via FenceSync(GPU_COMMANDS_COMPLETE)), since these are client-side buffers, they may be immediately mapped and read from.

I do think there is room for a more ergonomic helper for handling this behavior, though it's not that complicated for a library to implement it.

There is room to investigate a solution for eliminating a copy. MapBufferRange does this, but the naive implementation does create garbage ArrayBuffer wrappers. Note however, that if you want to copy the data into some existing ArrayBuffer (like the wasm heap), getBufferSubData is already copy-optimal. There is only room for improvement if you want to process the data in-place, which may be able to save a copy with MapBufferRange or similar in both types of implementation.

[1]: Since this is only available in WebGL 2, I have proposed extensions to expose these mechanisms from WebGL 2 to WebGL 1: https://github.com/KhronosGroup/WebGL/pull/2563

On Wed, Dec 20, 2017 at 11:06 AM, Kai Ninomiya wrote:
> [quoted thread snipped]

-----------------------------------------------------------
You are currently subscribed to public_webgl...@
To unsubscribe, send an email to majordomo...@ with
the following command in the body of your email:
unsubscribe public_webgl
-----------------------------------------------------------

From kbr...@ Wed Dec 20 19:32:44 2017
From: kbr...@ (Ken Russell)
Date: Wed, 20 Dec 2017 19:32:44 -0800
Subject: [Public WebGL] WEBGL_get_buffer_sub_data_async
In-Reply-To: 
References: <2671E2AD-6F46-4B1D-9C06-84D55B05BDC8@callow.im>
Message-ID: 

It's true that it is possible to optimize the current synchronous GetBufferSubData API so that in the ideal case it runs much more quickly. Jeff demonstrated on the working group's internal mailing list how this could work:

1) The browser maintains a CPU-side shadow copy of all buffers that were allocated with GL_*_READ usage.

2) The user calls FenceSync after any operation that modifies one of these buffers. For example, ReadPixels calls targeting a pixel buffer object (PBO), or draw calls performing transform feedback into one or more buffers.
3) The browser uses that FenceSync call, combined with the usage parameter of that buffer, as a hint that it should begin asynchronously polling for the completion of that fence. Once completed, it internally calls MapBuffer, memcpys the result into the shadow copy, and then unmaps the buffer. At that point it signals the user-visible fence as completed.

4) The user polls that fence until it's completed. At that point, the user calls GetBufferSubData, which memcpys from the shadow copy to user-visible memory without blocking.

It's an excellent point that it's possible (at all) to make this synchronous, blocking API much faster; Jeff, thanks for showing that it is. The advantage of optimizing the current API is that carefully written code will get good performance while still following the existing OpenGL ES 3.0 APIs with no additions.

There are however some pitfalls with this approach.

A) The user doesn't demonstrate their intent to read back from the buffer until they call GetBufferSubData. In order to make that call fast, the entire contents of these GL_*_READ buffers have to be mirrored back to the CPU any time the buffer is modified and a FenceSync is inserted afterward. Depending on how the user allocates buffers and how much they read back from those buffers, this may significantly increase memory traffic and slow down applications.

B) A lot of tracking has to be added in order to invalidate the shadow copy if the user modifies it between their FenceSync and calling GetBufferSubData. Doing this will at most result in a warning as well as degraded performance. In Kai's extension proposal, it's an error to modify the buffer while an async readback is pending.

C) Because the shadow copy is hidden in the WebGL implementation, it's not possible to bypass it and eliminate one copy. Kai's extension proposal actually supports this, because the asynchronous intent is expressed directly by the user, as is the destination buffer, in the form of a SharedArrayBuffer.
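For concreteness, the application-visible half of this pattern (inserting the fence after the enqueued write, then polling and reading) might look like the following sketch. The wrapper function and the `schedule` callback are my own framing, not part of either proposal; the GL calls themselves are standard WebGL 2.

```javascript
// Constants matching the WebGL 2 IDL values, so the sketch is self-contained.
const GL = {
  PIXEL_PACK_BUFFER: 0x88EB,
  SYNC_GPU_COMMANDS_COMPLETE: 0x9117,
  ALREADY_SIGNALED: 0x911A,
  CONDITION_SATISFIED: 0x911C,
};

// Assumes the caller has already enqueued the GPU-side write into `readBuf`
// (e.g. a readPixels into a PBO allocated with usage STREAM_READ).
function startAsyncReadback(gl, readBuf, byteLength, onReady, schedule) {
  // Insert a fence after the enqueued write.
  const sync = gl.fenceSync(GL.SYNC_GPU_COMMANDS_COMPLETE, 0);
  const poll = () => {
    // Timeout of 0: this never blocks the calling thread.
    const status = gl.clientWaitSync(sync, 0, 0);
    if (status === GL.ALREADY_SIGNALED || status === GL.CONDITION_SATISFIED) {
      // The write is known complete, so this read should not stall.
      const dst = new Uint8Array(byteLength);
      gl.bindBuffer(GL.PIXEL_PACK_BUFFER, readBuf);
      gl.getBufferSubData(GL.PIXEL_PACK_BUFFER, 0, dst);
      gl.deleteSync(sync);
      onReady(dst);
    } else {
      schedule(poll); // e.g. requestAnimationFrame in a page
    }
  };
  schedule(poll);
}
```

Keeping a rotating queue of such readback buffers lets new frames enqueue writes while earlier readbacks are still in flight.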
Fundamentally, readback from the GPU needs to be asynchronous at some level in order to be efficient. I think I speak for the Chrome team in saying that we think it's best to express the asynchronous primitive directly, rather than try to optimize the existing synchronous primitive using asynchronous ones under the hood. We recognize that it'll add complexity to introduce new APIs and are concerned about this, too. Still, Kai's latest proposal is pretty minimal, and directly expresses the user's intent.

On Wed, Dec 20, 2017 at 4:03 PM, Jeff Gilbert wrote:
> [quoted message snipped]

From jgi...@ Wed Dec 20 20:21:17 2017
From: jgi...@ (Jeff Gilbert)
Date: Wed, 20 Dec 2017 20:21:17 -0800
Subject: [Public WebGL] WEBGL_get_buffer_sub_data_async
In-Reply-To: 
References: <2671E2AD-6F46-4B1D-9C06-84D55B05BDC8@callow.im>
Message-ID: 

A) An app demonstrates intent by successfully polling any newer-than-enqueued-write fence, which indicates that the app may want to read back from any now-known-completed write. This only adds memory traffic if the user dispatches copies they later realize they can cancel. This would be uncommon, but if it showed up in the wild, BufferData(null) would serve as an invalidation hint, allowing for cancellation. I don't think I would bother with adding this invalidation unless someone showed up with a symptomatic application. It's also just good practice to use a rotating queue of readback buffers, to allow for pipelining. (That, or "dropping frames" of readback.)

B) It's really not a lot of tracking. The local copy is either valid (past a previous fence) or pending (not yet past a new enough fence). The patch(es) that track and warn about misuse in Firefox are pretty simple: https://bugzilla.mozilla.org/show_bug.cgi?id=1425488 It's an extra u64 for the context, each fence object, and each buffer. Implementing this in a client-server arch just means adding a refcounted SharedArrayBuffer to each buffer, and having the fence manager (similar to what you must already have for query management) handle map+memcpy when a buffer becomes up-to-date with respect to now-past fences.
C) This only applies to a portion of the proposal marked as optional and not blocking discussion of the rest of the proposal. Reviving MapBufferRange also satisfies this, as well as giving us reliable spec language for free, and exposing it in a form largely familiar to existing graphics developers. Our underlying APIs already support all this, so I would really prefer to stick close to our parent specs. On Wed, Dec 20, 2017 at 7:32 PM, Ken Russell wrote: > It's true that it is possible to optimize the current synchronous > GetBufferSubData API so that in the ideal case it runs much more quickly. > Jeff demonstrated on the working group's internal mailing list how this > could work: > > 1) The browser maintains a CPU-side shadow copy of all buffers that were > allocated with GL_*_READ usage. > > 2) The user calls FenceSync after any operation that modifies one of these > buffers. For example, ReadPixels calls targeting a pixel buffer object > (PBO), or draw calls performing transform feedback into one or more buffers. > > 3) The browser uses that FenceSync call, combined with the usage parameter > of that buffer, as a hint that it should begin asynchronously polling for > the completion of that fence. Once completed, it internally calls MapBuffer, > memcpys the result into the shadow copy, and then unmaps the buffer. At that > point it signals the user-visible fence as completed. > > 4) The user polls that fence until it's completed. At that point, the user > calls GetBufferSubData, which memcpys from the shadow copy to user-visible > memory without blocking. > > It's an excellent point that it's possible (at all) to make this > synchronous, blocking API much faster; Jeff, thanks for showing that it is. > The advantage of optimizing the current API is that carefully written code > will get good performance while still following the existing OpenGL ES 3.0 > APIs with no additions. > > There are however some pitfalls with this approach. 
> > A) The user doesn't demonstrate their intent to read back from the buffer > until they call GetBufferSubData. In order to make that call fast, the > entire contents of these GL_*_READ buffers has to be mirrored back to the > CPU any time the buffer is modified and a FenceSync is inserted afterward. > Depending on how the user allocates buffers and how much they read back from > those buffers, this may significantly increase memory traffic, and slow down > applications. > > B) A lot of tracking has to be added in order to invalidate the shadow copy > if the user modifies it between their FenceSync and calling > GetBufferSubData. Doing this will at most result in a warning as well as > degraded performance. In Kai's extension proposal, it's an error to modify > the buffer while an async readback is pending. > > C) Because the shadow copy is hidden in the WebGL implementation, it's not > possible to bypass it and eliminate one copy. Kai's extension proposal > actually supports this, because the asynchronous intent is expressed > directly by the user, as is the destination buffer, in the form of a > SharedArrayBuffer. > > Fundamentally, readback from the GPU needs to be asynchronous at some level > in order to be efficient. I think I speak for the Chrome team in saying that > we think it's best to express the asynchronous primitive directly, rather > than try to optimize the existing synchronous primitive using asynchronous > ones under the hood. We recognize that it'll add complexity to introduce new > APIs and are concerned about this, too. Still, Kai's latest proposal is > pretty minimal, and directly expresses the user's intent. > > > > On Wed, Dec 20, 2017 at 4:03 PM, Jeff Gilbert wrote: >> >> >> With the mechanisms we already have in WebGL 2[1], we can support >> no-stall polled-async readback from the GPU. 
Even in the case of >> poorly written content, this cannot incur any worse of pipeline stalls >> than we already allow for in WebGL via non-PBO ReadPixels and >> getBufferSubData. Note also that checking for stall-less behavior is >> fairly (though not entirely) deterministic, since apps must explicitly >> poll/wait on a fence before accessing the potentially-in-flight data. >> This is what I am implementing in Firefox, since it applies to all >> implementations, regardless of whether the implementation remotes >> calls. >> >> The key to this is the understanding that buffers with usage=GL_*_READ >> are directly mappable client-side buffers, into which (primarily) >> copyBufferSubData and readPixels enqueue writes. After the writes are >> known to be complete (via FenceSync(GPU_COMMANDS_COMPLETE)), since >> these are client-side buffers, they may be immediately mapped and read >> from. >> >> I do think there is room for a more ergonomic helper for handling this >> behavior, though it's not that complicated for a library to implement >> it. >> >> There is room to investigate a solution to eliminating a copy. >> MapBufferRange does this, but the naive implementation does create >> garbage ArrayBuffer wrappers. Note however, that if you want to copy >> the data into some existing ArrayBuffer (like the wasm heap), >> getBufferSubData is already copy-optimal. There is only room for >> improvement if you want to process the data in-place, which may be >> able to save a copy with MapBufferRange or similar in both types of >> implementation. >> >> [1]: Since this is only available in WebGL 2, I have proposed >> extensions to expose these mechanisms from WebGL 2 to WebGL 1: >> https://github.com/KhronosGroup/WebGL/pull/2563 >> >> On Wed, Dec 20, 2017 at 11:06 AM, Kai Ninomiya wrote: >> > Good point Mark, I'll add that. 
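Concretely, the polled-async pattern Jeff describes (enqueue a write into a GL_*_READ pixel-pack buffer, fence it, poll without blocking, then read) might look like the following sketch against a WebGL 2 context `gl`. The helper names (`startReadback`, `pollReadback`) are illustrative, not part of any proposal:

```javascript
// Illustrative sketch of fence-polled readback in WebGL 2.
// The caller has already issued readPixels/copyBufferSubData into `pbo`.
function startReadback(gl, pbo, byteLength) {
  const sync = gl.fenceSync(gl.SYNC_GPU_COMMANDS_COMPLETE, 0);
  gl.flush(); // ensure the fence is actually submitted
  return { pbo, sync, byteLength };
}

// Poll with timeout 0 so we never block; once the fence has signaled,
// getBufferSubData should complete without a pipeline stall.
function pollReadback(gl, rb, dst) {
  const status = gl.clientWaitSync(rb.sync, 0, 0);
  if (status !== gl.ALREADY_SIGNALED && status !== gl.CONDITION_SATISFIED) {
    return false; // write still in flight; try again next frame
  }
  gl.deleteSync(rb.sync);
  gl.bindBuffer(gl.PIXEL_PACK_BUFFER, rb.pbo);
  gl.getBufferSubData(gl.PIXEL_PACK_BUFFER, 0, dst, 0, rb.byteLength);
  gl.bindBuffer(gl.PIXEL_PACK_BUFFER, null);
  return true;
}
```

As Jeff notes above, rotating several such pbo/fence pairs keeps readback pipelined instead of serializing every frame on a single buffer.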
>> >> >> >> Regards >> >> >> >> -Mark From kai...@ Thu Dec 21 14:07:00 2017 From: kai...@ (Kai Ninomiya) Date: Thu, 21 Dec 2017 22:07:00 +0000 Subject: [Public WebGL] WEBGL_get_buffer_sub_data_async In-Reply-To: References: <2671E2AD-6F46-4B1D-9C06-84D55B05BDC8@callow.im> Message-ID: In the WebGL working group meeting today, we were able to agree to move forward on prototyping Jeff's proposal. Some of the performance concerns have been tentatively addressed: * We should be able to eventually eliminate a memcpy by, in (a) multi-process browsers, using SharedArrayBuffers backed by IPC shared memory, and in (b) single-process browsers, using SharedArrayBuffers backed by glMapBuffer memory. It might essentially be exposed to WebGL like a read-only MapBuffer operation, maybe for GL_*_READ buffers only. Ideally in both cases, the SharedArrayBuffer would be read-only, which is not a concept that currently exists. * The WEBGL_promises proposal can still be used in the same way as before, with the resulting Sync objects. On Wed, Dec 20, 2017 at 8:21 PM Jeff Gilbert wrote: > A) An app demonstrates intent by successfully polling any > newer-than-enqueued-write fence, which indicates that the app may want > to read back from any now-known-completed write. This only adds memory > traffic if the user dispatches copies they later realize they can > cancel.
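As an aside, the per-object u64 tracking from Jeff's point (B) is small enough to sketch as pure bookkeeping, with no GL calls. The context keeps one monotonically increasing write generation; each buffer remembers the generation of its last enqueued write, and each fence the generation at which it was inserted. A buffer's CPU-side shadow copy is valid once a completed fence is at least as new as the buffer's last write. This is an illustrative sketch, not the actual Firefox patch:

```javascript
// Generation-counter validity tracking for shadow copies (illustrative).
class ReadbackTracker {
  constructor() {
    this.generation = 0n; // latest enqueued write (the "u64" per context)
    this.completed = 0n;  // newest generation known complete via a fence
  }
  recordWrite(buffer) {
    buffer.lastWrite = ++this.generation; // the "u64" per buffer
  }
  insertFence() {
    return { generation: this.generation }; // the "u64" per fence object
  }
  fenceCompleted(fence) {
    if (fence.generation > this.completed) this.completed = fence.generation;
  }
  shadowIsValid(buffer) {
    return buffer.lastWrite <= this.completed; // valid vs. pending
  }
}
```

Any new write bumps the context generation past all completed fences, so the buffer automatically flips back to pending until a newer fence completes, which is the invalidation behavior described above.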
From pya...@ Fri Dec 22 04:30:59 2017 From: pya...@ (=?UTF-8?Q?Florian_B=C3=B6sch?=) Date: Fri, 22 Dec 2017 13:30:59 +0100 Subject: [Public WebGL] WEBGL_get_buffer_sub_data_async In-Reply-To: References: <2671E2AD-6F46-4B1D-9C06-84D55B05BDC8@callow.im> Message-ID: On Thu, Dec 21, 2017 at 11:07 PM, Kai Ninomiya wrote: > > * We should be able to eventually eliminate a memcpy by, in (a) > multi-process browsers, using SharedArrayBuffers backed by IPC shared > memory, and in (b) single-process browsers, using SharedArrayBuffers backed > by glMapBuffer memory. It might essentially be exposed to WebGL like a > read-only MapBuffer operation, maybe for GL_*_READ buffers only. Ideally in > both cases, the SharedArrayBuffer would be read-only, which is not a > concept that currently exists.
> Afaik you cannot simultaneously create IPC shared memory and have it be the memory region mapped by glMapBuffer. From kai...@ Fri Dec 22 10:49:58 2017 From: kai...@ (Kai Ninomiya) Date: Fri, 22 Dec 2017 18:49:58 +0000 Subject: [Public WebGL] WEBGL_get_buffer_sub_data_async In-Reply-To: References: <2671E2AD-6F46-4B1D-9C06-84D55B05BDC8@callow.im> Message-ID: Sorry, I think I spoke inaccurately. To clarify: Without special SABs, data is fetched the usual way with getBufferSubData, which does a memcpy from the CPU-side shadow copy of the readback buffer into the ArrayBuffer. In multi-process, the shadow copy would be allocated in IPC shared memory. (2 copies for both single- and multi-process.) With the special mapped-buffer-SAB, the CPU-side shadow copy can actually be a SharedArrayBuffer. That SAB is fetched by gl.MapBuffer/Range. (1 copy for both single- and multi-process.) Note, the SAB is persistent, so it doesn't generate garbage. ArrayBufferView garbage might be generated in some cases by a MapBufferRange, but I think it could be avoided with a MapBuffer. Finally, single-process browsers might be able to back the mapped-buffer-SAB with an actual glMapBufferRange pointer. (0 copies for single-process.) However, it's not clear yet how unmapping would behave with SABs - SABs cannot be neutered and their backing store cannot be swapped out. On Fri, Dec 22, 2017 at 4:31 AM Florian Bösch wrote: > On Thu, Dec 21, 2017 at 11:07 PM, Kai Ninomiya wrote: >> >> * We should be able to eventually eliminate a memcpy by, in (a) >> multi-process browsers, using SharedArrayBuffers backed by IPC shared >> memory, and in (b) single-process browsers, using SharedArrayBuffers backed >> by glMapBuffer memory. It might essentially be exposed to WebGL like a >> read-only MapBuffer operation, maybe for GL_*_READ buffers only.
Ideally in >> both cases, the SharedArrayBuffer would be read-only, which is not a >> concept that currently exists. >> > > Afaik you cannot simultaneously create IPC shared memory and have it be > the memory region mapped by glMapBuffer. From pya...@ Sun Dec 24 02:38:52 2017 From: pya...@ (=?UTF-8?Q?Florian_B=C3=B6sch?=) Date: Sun, 24 Dec 2017 11:38:52 +0100 Subject: [Public WebGL] WEBGL_get_buffer_sub_data_async In-Reply-To: References: <2671E2AD-6F46-4B1D-9C06-84D55B05BDC8@callow.im> Message-ID: glMapBufferRange returns a memory pointer you can treat as usual memory into the buffer. IPC memory sharing in any of its flavors requires you to use some functionality to specially allocate that memory (mmap, shmat, etc.). That means that you can never have a mapped buffer range and an IPC shared memory pointing to the same location. There are some workarounds you can try, but they will not work (no shared memory is established, some of the functions crap out with an error code, or segfault). I don't see fundamentally why it shouldn't work, because all of these just use the memory manager to do their bidding. It just seems to be the case that each implementor of the respective functions never thought of a case where they might be used in conjunction. On Fri, Dec 22, 2017 at 7:49 PM, Kai Ninomiya wrote: > Sorry, I think I spoke inaccurately. To clarify: > > Without special SABs, data is fetched the usual way with getBufferSubData, > which does a memcpy from the CPU-side shadow copy of the readback > buffer into the ArrayBuffer. In multi-process, the shadow copy would be > allocated in IPC shared memory. (2 copies for both single- and > multi-process.)
From kai...@ Sun Dec 24 10:53:46 2017 From: kai...@ (Kai Ninomiya) Date: Sun, 24 Dec 2017 18:53:46 +0000 Subject: [Public WebGL] WEBGL_get_buffer_sub_data_async In-Reply-To: References: <2671E2AD-6F46-4B1D-9C06-84D55B05BDC8@callow.im> Message-ID: I agree, I'm saying we would do a memcpy from the mapped range to the IPC shmem, which would be two different buffers.
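On the application side, note that even in the "2 copies" path, getBufferSubData writes directly into whatever view the caller passes, so reading into a view over an existing allocation (e.g. the wasm heap, per Jeff's earlier note that this is already copy-optimal) adds no further app-side copy. A sketch against a WebGL 2 context `gl`; the helper name is illustrative:

```javascript
// Read `byteLength` bytes from `pbo` straight into a caller-owned
// ArrayBuffer (e.g. WebAssembly.Memory.buffer) at `byteOffset`.
// No intermediate ArrayBuffer is allocated on the application side.
function readIntoHeap(gl, pbo, heap, byteOffset, byteLength) {
  const dst = new Uint8Array(heap, byteOffset, byteLength);
  gl.bindBuffer(gl.PIXEL_PACK_BUFFER, pbo);
  gl.getBufferSubData(gl.PIXEL_PACK_BUFFER, 0, dst, 0, byteLength);
  gl.bindBuffer(gl.PIXEL_PACK_BUFFER, null);
  return dst;
}
```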
From pya...@ Sun Dec 24 11:11:02 2017 From: pya...@ (=?UTF-8?Q?Florian_B=C3=B6sch?=) Date: Sun, 24 Dec 2017 20:11:02 +0100 Subject: [Public WebGL] WEBGL_get_buffer_sub_data_async In-Reply-To: References: <2671E2AD-6F46-4B1D-9C06-84D55B05BDC8@callow.im> Message-ID: That will work, but it's still an extra copy. Better than more copies, of course. It might be worth investigating if the kernel and/or driver could be patched to make it possible for mapBuffer[Range] to work with regular IPC shared memory. On Android that would be of immediate use with the next Android release, on desktop Linux it might be handy to get it into upstream so it eventually lands on most people's desktops, and I don't mind if Windows simply sucks because you can't patch their OS. On Sun, Dec 24, 2017 at 7:53 PM, Kai Ninomiya wrote: > I agree, I'm saying we would do a memcpy from the mapped range to the IPC > shmem, which would be two different buffers.