Recent questions tagged cuda

0 votes

842 views

1 answer

cuda - Reading from an unaligned uint8_t recast as a uint32_t array - not getting all values

I am trying to cast a uint8_t array to a uint32_t array. However, when I try to do this, I can't seem to be able ... any way that I can do this?

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

0 votes

1.1k views

1 answer

cuda - nvcc.exe linking error: Microsoft Visual Studio configuration file 'vcvars64.bat' could not be found

I want to use nvcc -ptx from the Windows command line, but I always get this error message: nvcc : fatal error ... . What can be the solution?

0 votes

611 views

1 answer

cuda - CUDA_ERROR_INVALID_IMAGE during cuModuleLoad

I've created a very simple kernel (can be found here) which I successfully compile using "C:Program ... valid and compiles without issues.

0 votes

638 views

1 answer

cuda - How to start debug version of project in nsight with optirun command?

I've been writing some simple CUDA programs (I'm a student, so I need to practice), and the thing is I can ... for helping in advance folks. :)

0 votes

632 views

1 answer

cuda - Do I have to use the MPS (MULTI-PROCESS SERVICE) when using CUDA6.5 + MPI?

The following link states: https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf 1.1. AT A GLANCE ... will stay the same?

0 votes

707 views

1 answer

cuda - How are 2D thread blocks padded for warp scheduling?

I understand that for a 1D thread block with 31 threads, it will be padded to 32 threads for warp execution. What ... (31*31=961; 961%32=1)?
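The arithmetic in the question above can be checked with a small host-side sketch. It assumes a warp size of 32 and relies on the CUDA rule that threads in a block are linearized with x varying fastest, with warps formed from consecutive linear thread IDs. The function names are hypothetical, chosen only for illustration.

```cpp
// Assumption: warp size of 32, as on current NVIDIA GPUs.
constexpr unsigned kWarpSize = 32;

// Linear thread ID inside a block: x varies fastest, then y, then z.
constexpr unsigned linear_thread_id(unsigned x, unsigned y, unsigned z,
                                    unsigned dim_x, unsigned dim_y) {
    return x + y * dim_x + z * dim_x * dim_y;
}

// Number of warps a block occupies: total thread count rounded up to whole warps.
constexpr unsigned warps_per_block(unsigned dim_x, unsigned dim_y, unsigned dim_z) {
    return (dim_x * dim_y * dim_z + kWarpSize - 1) / kWarpSize;
}

// Number of real (non-padding) threads in the last warp of the block.
constexpr unsigned threads_in_last_warp(unsigned dim_x, unsigned dim_y, unsigned dim_z) {
    const unsigned remainder = (dim_x * dim_y * dim_z) % kWarpSize;
    return remainder == 0 ? kWarpSize : remainder;
}

// Compile-time checks for the 31x31 block from the question.
static_assert(warps_per_block(31, 31, 1) == 31, "961 threads round up to 31 warps");
static_assert(threads_in_last_warp(31, 31, 1) == 1, "961 % 32 == 1");
```

Under these assumptions, padding applies to the linearized block rather than to each row: a 31x31 block fills 30 full warps, and the 31st warp holds a single real thread.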

0 votes

692 views

1 answer

cuda - thrust::sequence - how to increase the step after each N elements

I am using thrust::sequence(myvector.begin(), myvector.end(), 0, 1) and get a nicely ordered list like: 0, 1, ... or am I missing a simple way?

0 votes

453 views

1 answer

cuda - CUDA kernels not executing concurrently

I'm trying to explore the concurrent kernel execution property of my NVIDIA Quadro 4000, which has 2.0 ... CHK_ERR(cudaDeviceReset()); }

0 votes

621 views

1 answer

cuda - Should I check the number of threads in kernel code?

I am a beginner with CUDA, and my coworkers always design kernels with the following wrapping: __global__ ... specified block/grid dimensions?

0 votes

522 views

1 answer

cuda - JIT in JCuda, loading multiple ptx modules

I said in this question that I had some problems loading PTX modules in JCuda and after @talonmies's idea, I ... variable by reference in JCuda?

0 votes

613 views

1 answer

cuda - Caffe compilation fails due to unsupported gcc compiler version

I am struggling with Caffe compilation. Unfortunately, I failed to compile it. Steps I followed: git clone https://github.com/ ... .9 - what to do?

0 votes

870 views

1 answer

cuda - CURAND Library - Compiling Error - Undefined reference to functions

I have the following code which I am trying to compile using nvcc. Code: #include <stdio.h> #include <stdlib.h ... to solve my problem. Thanks!

0 votes

711 views

1 answer

cuda - Performance of atomic operations on shared memory

How do atomic operations perform when the address they are provided with resides in block shared memory? During ... atomic operation is done?

0 votes

828 views

1 answer

cuda - thrust reduction result on device memory

Is it possible to leave the return value of a thrust::reduce operation in device-allocated memory? In case it is ... I use a thrust::device_ptr?

0 votes

632 views

1 answer

cuda - What are "Other" Issue Stall Reasons displayed by the Nsight profiler?

I have a kernel that is performing poorly on CC 3.0 (Kepler) as opposed to CC 2.0 (Fermi). In the Nsight profiler, ... Nsight 3.0. RC / CC 3.0.

0 votes

526 views

1 answer

cuda - Does 'code=sm_X' embed only binary (cubin) code, or also PTX code, or both?

I am a little bit confused about the 'code=sm_X' option within the '-gencode' statement. An example: What does ... is conflicting in my opinion.

0 votes

612 views

1 answer

cuda - Amdahl's law and GPU

I have a couple of doubts regarding the application of Amdahl's law with respect to GPUs. For instance, I ... for the parallel code? Thanks

0 votes

896 views

1 answer

cuda - How to use GPUDirect RDMA with Infiniband

I have two machines. There are multiple Tesla cards on each machine. There is also an InfiniBand card on each ... dealing with this in OpenMPI.

0 votes

377 views

1 answer

cuda - __activemask() vs __ballot_sync()

After reading this post on the CUDA Developer Blog, I am struggling to understand when it is safe/correct to use __activemask ... the function interface.

0 votes

818 views

1 answer

cuda - How do we use cuPrintf()?

What do we have to do to use cuPrintf()? (device compute capability 1.2, Ubuntu 12) I couldn't find " ... "hello_kernel") is not allowed Thanks!

0 votes

924 views

1 answer

cuda - Set default host compiler for nvcc

I have just installed Debian Stretch (9) and CUDA 8 on a new GPU server. Stretch does not come with ... cuda config or an environment variable?

0 votes

831 views

1 answer

cuda - What is the difference between cudaMemcpy() and cudaMemcpyPeer() for P2P-copy?

I want to copy data from GPU0-DDR to GPU1-DDR directly, without going through CPU-RAM. As stated here on page 15: http: ... any advantage, why is it needed?

0 votes

852 views

1 answer

cuda - Equivalent of cudaGetErrorString for cuBLAS?

CUDA runtime has a convenience function cudaGetErrorString(cudaError_t error) that translates an error enum into a ... function like this?

0 votes

1.0k views

1 answer

cuda - Branch and predicated instructions

Section 5.4.2 of the CUDA C Programming Guide states that branch divergence is handled either by "branch ... set the predicate". Why?

0 votes

447 views

1 answer

cuda - How to compile PTX code

I need to modify the PTX code and compile it directly. The reason is that I want to have some specific instructions ... cubin) to "X.o" file.
