Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

Recent questions tagged cuda

0 votes
842 views
1 answer
    I am trying to cast a uint8_t array to a uint32_t array. However, when I try to do this, I can't seem to be able ... any way that I can do this?
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
1.1k views
1 answer
    I want to use nvcc -ptx from the Windows command line, but I always get this error message: nvcc : fatal error ... . What can be the solution?
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
611 views
1 answer
    I've created a very simple kernel (can be found here) which I successfully compile using "C:Program ... valid and compiles without issues.
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
638 views
1 answer
    I've been writing some simple CUDA programs (I'm a student, so I need to practice), and the thing is I can ... for helping in advance folks. :)
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
632 views
1 answer
    The document at this link states: https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf 1.1. AT A GLANCE ... will stay the same?
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
707 views
1 answer
    I understand that for a 1D thread block with 31 threads, it will be padded to 32 threads for warp execution. What ... (31*31=961; 961%32=1)?
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
692 views
1 answer
    I am using thrust::sequence(myvector.begin(), myvector.end(), 0, 1) and get a nicely ordered list like: 0, 1, ... or am I missing a simple way?
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
453 views
1 answer
    I'm trying to explore the concurrent kernel execution property of my Nvidia Quadro 4000, which has 2.0 ... CHK_ERR(cudaDeviceReset()); }
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
621 views
1 answer
    I am a beginner with CUDA, and my coworkers always design kernels with the following wrapping: __global__ ... specified block/grid dimensions?
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
522 views
1 answer
    I said in this question that I had some problems loading PTX modules in JCuda and after @talonmies's idea, I ... variable by reference in JCuda?
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
613 views
1 answer
    I am struggling with Caffe compilation and have so far failed to compile it. Steps I followed: git clone https://github.com/ ... .9 - what to do?
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
870 views
1 answer
    I have the following code which I am trying to compile using nvcc. Code: #include <stdio.h> #include <stdlib.h ... to solve my problem. Thanks!
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
711 views
1 answer
    How do atomic operations perform when the address they are given resides in block shared memory? During ... atomic operation is done?
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
828 views
1 answer
    Is it possible to leave the return value of a thrust::reduce operation in device-allocated memory? In case it is ... I use a thrust::device_ptr?
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
632 views
1 answer
    I have a kernel that is performing poorly on CC 3.0 (Kepler) as opposed to CC 2.0 (Fermi). In the Nsight profiler, ... Nsight 3.0. RC / CC 3.0.
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
526 views
1 answer
    I am a little bit confused about the 'code=sm_X' option within the '-gencode' statement. An example: What does ... is conflicting in my opinion.
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
612 views
1 answer
    I have a couple of doubts regarding the application of Amdahl's law with respect to GPUs. For instance, I ... for the parallel code? Thanks
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
896 views
1 answer
    I have two machines. There are multiple Tesla cards on each machine. There is also an InfiniBand card on each ... dealing with this in OpenMPI.
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
377 views
1 answer
    After reading this post on the CUDA Developer Blog, I am struggling to understand when it is safe/correct to use __activemask ... the function interface.
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
818 views
1 answer
    What do we have to do to use cuPrintf()? (device compute capability 1.2, Ubuntu 12) I couldn't find " ... "hello_kernel") is not allowed Thanks!
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
924 views
1 answer
    I have just installed Debian Stretch (9) and CUDA 8 on a new GPU server. Stretch does not come with ... cuda config or an environment variable?
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
831 views
1 answer
    I want to copy data from GPU0-DDR to GPU1-DDR directly without CPU-RAM. As said here on the page-15: http: ... any advantage, why it is needed?
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
852 views
1 answer
    CUDA runtime has a convenience function cudaGetErrorString(cudaError_t error) that translates an error enum into a ... function like this?
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
1.0k views
1 answer
    Section 5.4.2 of the CUDA C Programming Guide states that branch divergence is handled either by "branch ... set the predicate". Why?
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)
0 votes
447 views
1 answer
    I need to modify the PTX code and compile it directly. The reason is that I want to have some specific instructions ... cubin) to "X.o" file.
asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)