Python is a reasonable choice of language for writing scripts that will run on many-node, many-core clusters, as long as one avoids using threads. CPython (the standard implementation of Python) has a Global Interpreter Lock (GIL), which lets only one thread execute Python bytecode at a time, so CPU-bound threads cannot run in parallel across cores. (Also, I don’t care for threads, having been bitten badly by how easily they share state.)
Instead, one can use the multiprocessing module, whose API closely mirrors that of the threading module. It spawns new processes instead of threads, and those processes share no state with one another. Each process has its own Python interpreter, which significantly increases the overhead (making processes impractical to spawn for very short tasks), but processes also map efficiently onto the individual cores of the machine being used.
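To illustrate how close the two APIs are, here is a small sketch that starts one thread and one process with the same target function. The `report` function and the labels are made up for this illustration; the only point is that `threading.Thread` and `multiprocessing.Process` are constructed and driven the same way, while the process runs under a different PID.

```python
import multiprocessing
import os
import threading


def report(label, results_queue):
    """Send back the worker's label and the PID it ran under."""
    results_queue.put((label, os.getpid()))


if __name__ == "__main__":
    results_queue = multiprocessing.Queue()

    # Same constructor arguments, same start()/join() methods.
    workers = [
        threading.Thread(target=report, args=("thread", results_queue)),
        multiprocessing.Process(target=report, args=("process", results_queue)),
    ]
    for worker in workers:
        worker.start()

    # Drain the queue before joining so the child is never blocked on a full pipe.
    results = dict(results_queue.get() for _ in workers)
    for worker in workers:
        worker.join()

    print("thread ran in parent process: ", results["thread"] == os.getpid())
    print("process ran in parent process:", results["process"] == os.getpid())
```

The `if __name__ == "__main__":` guard matters here: on platforms where child processes are started by re-importing the script, unguarded code would run again in every child.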
For instance, if I’m using an 8-core node, I can simply spawn 7 processes (leaving one core for the parent process) and know that, most likely, the underlying OS will schedule each of them on its own core. The multiprocessing library lets me spawn a process for a single function call, rather than writing standalone scripts that are launched by a shell script. Here is an example that spawns a process for each chunk of data to be analyzed. Each process then saves its results in a SQLite database (you may want to use a DB that handles concurrency better though):
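A minimal sketch of this pattern follows. The function names, the `results` table, and the summing "analysis" are placeholders assumed for illustration, and the database is written to a temporary directory so the sketch can be run as-is.

```python
import multiprocessing
import os
import sqlite3
import tempfile


def init_db(db_path):
    """Create the (hypothetical) results table if it does not exist yet."""
    conn = sqlite3.connect(db_path)
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS results "
            "(chunk_id INTEGER PRIMARY KEY, total INTEGER)"
        )
    conn.close()


def split_into_chunks(data, n):
    """Deal the items of data into n interleaved chunks."""
    return [data[i::n] for i in range(n)]


def analyze_chunk(db_path, chunk_id, chunk):
    """Analyze one chunk and store its result. Runs in a child process."""
    total = sum(chunk)  # stand-in for the real analysis
    # The timeout makes a writer wait for SQLite's file lock instead of failing at once.
    conn = sqlite3.connect(db_path, timeout=30)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO results (chunk_id, total) VALUES (?, ?)",
                (chunk_id, total),
            )
    finally:
        conn.close()


if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "results.db")
    init_db(db_path)

    # One worker per core, minus one core left for the parent process.
    n_workers = max(1, (os.cpu_count() or 2) - 1)
    chunks = split_into_chunks(list(range(1000)), n_workers)

    workers = [
        multiprocessing.Process(target=analyze_chunk, args=(db_path, i, chunk))
        for i, chunk in enumerate(chunks)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()

    conn = sqlite3.connect(db_path)
    print(conn.execute("SELECT COUNT(*), SUM(total) FROM results").fetchone())
    conn.close()
```

Each child opens its own connection rather than inheriting one from the parent, since SQLite connections should not be shared across processes.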
Of course, this is only per-machine. To really take advantage of HPC, one should be able to spread the work across multiple nodes. Without resorting to a library like MPI, you can simply separate the layers of computation. For instance, my scripts work by splitting the analysis of many gene sets amongst one node’s cores, and splitting the datasets amongst multiple nodes. Since all the processes write back to a common database, there is no real need for any interprocess or internode communication, obviating the need for MPI. Postprocessing the raw data in the database can be done afterwards.
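The two layers can be sketched roughly as follows. This is only an illustration under assumptions: the `NODE_INDEX` and `NODE_COUNT` environment variables are hypothetical names for however your cluster tells each node which one it is, the dataset and gene set names are invented, and `analyze` just returns a string where the real scripts would write to the common database.

```python
import multiprocessing
import os


def datasets_for_node(datasets, node_index, node_count):
    """Outer layer: give each node every node_count-th dataset."""
    return datasets[node_index::node_count]


def analyze(task):
    """Inner layer: analyze one (dataset, gene set) pair on one core."""
    dataset, gene_set = task
    # Stand-in for the real analysis and the write to the common database.
    return f"{dataset}:{gene_set}"


if __name__ == "__main__":
    # Hypothetical variables; substitute whatever your scheduler provides.
    node_index = int(os.environ.get("NODE_INDEX", "0"))
    node_count = int(os.environ.get("NODE_COUNT", "1"))

    datasets = ["dataset_a", "dataset_b", "dataset_c", "dataset_d"]
    gene_sets = ["gene_set_1", "gene_set_2", "gene_set_3"]

    # This node only ever sees its own share of the datasets...
    mine = datasets_for_node(datasets, node_index, node_count)
    tasks = [(d, g) for d in mine for g in gene_sets]

    # ...and spreads the gene sets for those datasets over its cores.
    with multiprocessing.Pool() as pool:
        for result in pool.map(analyze, tasks):
            print(result)
```

Because every node derives its share from its own index alone, the nodes never need to talk to each other; the same script is launched on each node with a different index.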