If we have a lot of draws, we end up traversing the linked list
looking for the shader on each draw call, even when it is the same
one we used the last time.
This removes a chunk of CPU overhead in the draw path.
This seems to do better in xonotic traces; at least we don't
traverse as much of the list to pick up the shaders.
We should really be using a hash table here.
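A minimal sketch of the idea, with made-up structure and function names
rather than the real vrend ones: remember the entry that matched on the
previous draw and check it before walking the list.

    #include <string.h>

    struct shader { int prog; };           /* stand-in for the real shader object */
    struct shader_key { unsigned id; };    /* stand-in for the real lookup key    */

    struct shader_entry {
       struct shader_entry *next;
       struct shader_key key;
       struct shader *shader;
    };

    static struct shader_entry *last_used; /* entry that matched on the last draw */

    static struct shader *lookup_shader(struct shader_entry *head,
                                        const struct shader_key *key)
    {
       /* fast path: same shader as the previous draw call, skip the walk */
       if (last_used && memcmp(&last_used->key, key, sizeof(*key)) == 0)
          return last_used->shader;

       for (struct shader_entry *e = head; e; e = e->next) {
          if (memcmp(&e->key, key, sizeof(*key)) == 0) {
             last_used = e;
             return e->shader;
          }
       }
       return NULL;
    }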
We are seeing shaders with inputs 0 and 2 but no input 1, so we need
to handle gaps properly.
This fixes some regressions in drawpixels after some mesa changes
on the guest.
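As a rough illustration (names are invented, not the actual vrend code),
the input walk has to key off the declared index and simply skip missing
slots instead of assuming indices 0..n-1 are all present:

    #define MAX_INPUTS 32

    struct input { int used; int glsl_location; };

    /* assign locations only to the inputs that are actually declared,
     * tolerating holes such as inputs 0 and 2 with no input 1 */
    static void assign_locations(struct input inputs[MAX_INPUTS])
    {
       int next_loc = 0;
       for (int i = 0; i < MAX_INPUTS; i++) {
          if (!inputs[i].used)
             continue;                      /* gap in the input indices */
          inputs[i].glsl_location = next_loc++;
       }
    }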
It's unfortunate, but the i965 driver deliberately crashes on such a simple
test case. Fortunately, it seems fairly straightforward to avoid it in virgl.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
If the current context is one of the subcontext to be destroyed,
vrend_destroy_sub_context() will fail after the last context is
destroyed.
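A hedged sketch of the kind of guard this implies (the helper name is
hypothetical, not an existing vrend function): switch away from the
sub-context before tearing it down if it happens to be the current one.

    /* if the sub-context being destroyed is the one currently bound,
     * switch back to a safe default first (helper name is made up) */
    if (ctx->sub == sub)
       switch_to_default_sub_context(ctx);
    vrend_destroy_sub_context(sub);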
Found thanks to american fuzzy lop.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Some american fuzzy lop tests managed to replace resources, but the
old values got leaked (I am not sure this should be allowed).
In any case, introduce a destroy callback on the hashtable, used when a
value is removed, replaced or the table is cleared. This simplifies
resource management a bit and helps avoid potential errors by simply
using refcounts.
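A minimal, self-contained sketch of the callback scheme (not the actual
util hashtable code): the table stores a destroy function and calls it
whenever a value is dropped, replaced or cleared.

    #include <stdlib.h>

    typedef void (*destroy_func)(void *value);

    struct entry { unsigned key; void *value; struct entry *next; };

    struct table {
       struct entry *head;
       destroy_func destroy;    /* invoked for every value the table drops */
    };

    static void table_set(struct table *t, unsigned key, void *value)
    {
       for (struct entry *e = t->head; e; e = e->next) {
          if (e->key == key) {
             t->destroy(e->value);    /* old value is released, not leaked */
             e->value = value;
             return;
          }
       }
       struct entry *e = malloc(sizeof(*e));
       e->key = key;
       e->value = value;
       e->next = t->head;
       t->head = e;
    }

    static void table_clear(struct table *t)
    {
       while (t->head) {
          struct entry *e = t->head;
          t->head = e->next;
          t->destroy(e->value);
          free(e);
       }
    }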
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Resources are reference-counted: if another object holds a reference to
a resource, it should not be freed outright. Change the resource hashtable
destroy callback to unref.
Currently, this doesn't happen, since the resource hashtable is kept
until the renderer is reset. However, in the next commit, the destroy
callback will be used if a resource is replaced in the hashtable, and we
want to keep the original resource. Furthermore, even if a replace
shouldn't happen, relying only on the reference-counting mechanism
avoids confusion and future errors.
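A short sketch of the convention, with illustrative names: the hashtable's
reference is dropped with an unref, and only the last holder actually frees.

    #include <stdlib.h>

    struct resource {
       int refcount;
       /* ... */
    };

    static void resource_unref(struct resource *res)
    {
       if (--res->refcount == 0)
          free(res);                 /* only the last holder frees */
    }

    /* destroy callback installed in the resource hashtable */
    static void resource_destroy_cb(void *value)
    {
       resource_unref(value);
    }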
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Since the memcpy() is done in multiples of 4 bytes, over-allocate the
destination buffer to fit the shader length rounded up to a multiple of 4.
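In pseudo-C, the padding looks roughly like this (variable and function
names are made up, not the actual decoder code):

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* size the destination for the shader length rounded up to the next
     * multiple of 4, since the copy moves whole 32-bit words */
    static char *copy_shader_text(const uint32_t *words, uint32_t shader_len)
    {
       uint32_t alloc_len = (shader_len + 3) & ~3u;   /* multiple-of-4 padding */
       char *text = calloc(1, alloc_len + 1);         /* room for a NUL too    */
       if (text)
          memcpy(text, words, alloc_len);
       return text;
    }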
Fix found thanks to american fuzzy lop.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Otherwise we could sit for a very long time in some of the later loops.
Fix found thanks to american fuzzy lop.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
If we didn't run successfully, sel->sinfo.so_names might be NULL in
vrend_destroy_shader_selector().
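The guard amounts to something like this (field names other than so_names
are assumptions):

    if (sel->sinfo.so_names) {        /* may be NULL after a failed creation */
       for (unsigned i = 0; i < sel->sinfo.so_info.num_outputs; i++)
          free(sel->sinfo.so_names[i]);
       free(sel->sinfo.so_names);
       sel->sinfo.so_names = NULL;
    }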
Fix found thanks to american fuzzy lop.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
That way an if (type > PIPE_SHADER_GEOMETRY) guard will actually
work for all values.
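For illustration, with a hypothetical decode helper: reading the type as
unsigned means any out-of-range value, including what would be negative
as a signed int, hits the single upper-bound check.

    uint32_t type = decode_shader_type(cmd);   /* hypothetical decoder */
    if (type > PIPE_SHADER_GEOMETRY)
       return EINVAL;                          /* reject bogus guest values */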
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
That would later crash in util_format_description() or other helpers.
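The check is along these lines (the exact limit constant is an assumption):

    if (format >= VIRGL_FORMAT_MAX)    /* assumed upper bound on the enum */
       return EINVAL;                  /* reject before any format lookup */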
Fix found thanks to american fuzzy lop.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Instead of polling the fences regularly, have a thread
that blocks for a single fence using a separate shared
context, then uses eventfd to wake up the main thread
when something happens.
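A rough sketch of the scheme, not the actual vrend code: a worker thread
has a GL context shared with the renderer's made current, blocks on one
fence at a time with glClientWaitSync(), and wakes the main loop by
writing to an eventfd the main loop polls.

    #include <stdint.h>
    #include <unistd.h>
    #include <pthread.h>
    #include <sys/eventfd.h>
    #include <epoxy/gl.h>

    struct fence_queue {
       pthread_mutex_t lock;
       pthread_cond_t  cond;
       GLsync          pending;   /* single outstanding fence, simplified */
       int             efd;       /* from eventfd(0, 0), polled by main   */
    };

    static void *fence_wait_thread(void *data)
    {
       struct fence_queue *q = data;
       for (;;) {
          pthread_mutex_lock(&q->lock);
          while (!q->pending)
             pthread_cond_wait(&q->cond, &q->lock);
          GLsync fence = q->pending;
          q->pending = NULL;
          pthread_mutex_unlock(&q->lock);

          /* block until the GPU reaches the fence, in 1s slices */
          while (glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT,
                                  1000000000ull) == GL_TIMEOUT_EXPIRED)
             ;
          glDeleteSync(fence);

          uint64_t one = 1;
          write(q->efd, &one, sizeof(one));   /* wake the main thread */
       }
       return NULL;
    }

On the main thread side, q->efd would be added to the existing poll loop;
after queuing GL work and creating a fence with glFenceSync(), the main
thread stores it in q->pending under the lock and signals q->cond.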
Inside the guest, glmark2 typically runs twice as fast with the thread
sync, although in general the performance gain seems to be about +30%. The
benefit is mostly for CPU-bound tasks (when the main thread hits 100%).
A naive perf stat of the vtest renderer with the glmark2 "build" test and a
fixed number of frames (500) gives the following stats
(do not put much weight on the timing-related numbers, since the renderer
is started and stopped manually):
without thread:
3032.282265 task-clock (msec) # 0.420 CPUs utilized
4,277 context-switches # 0.001 M/sec
102 cpu-migrations # 0.034 K/sec
9,020 page-faults # 0.003 M/sec
7,884,098,254 cycles # 2.600 GHz
4,440,126,451 stalled-cycles-frontend # 56.32% frontend cycles idle
<not supported> stalled-cycles-backend
11,024,091,578 instructions # 1.40 insns per cycle
# 0.40 stalled
# cycles per insn
1,091,831,588 branches # 360.069 M/sec
5,426,846 branch-misses # 0.50% of all branches
with thread:
3403.592921 task-clock (msec) # 0.452 CPUs utilized
7,145 context-switches # 0.002 M/sec
410 cpu-migrations # 0.120 K/sec
6,191 page-faults # 0.002 M/sec
7,475,038,064 cycles # 2.196 GHz
4,487,043,071 stalled-cycles-frontend # 60.03% frontend cycles idle
<not supported> stalled-cycles-backend
9,925,205,494 instructions # 1.33 insns per cycle
# 0.45 stalled
# cycles per insn
834,375,503 branches # 245.146 M/sec
4,919,995 branch-misses # 0.59% of all branches
Signed-off-by: Marc-André Lureau <marcandre.lureau@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>