1. A method for implementing a thread-level speculation system on a chip multiprocessor having multiple processing units and an associated cache memory hierarchy, wherein programming threads in speculative and non-speculative modes are assigned to said multiple processing units for parallel execution, said method comprising:
providing an additional private local cache dedicated to a respective processing unit for use only in thread-level speculation mode, each said private local cache for storing speculative results and state associated with an executing thread;
providing a dedicated bus interconnecting said dedicated private local caches for enabling data forwarding between parallel executing threads to detect data dependency conditions;
receiving, at a processing unit, a thread identifier for executing a speculative or non-speculative programming thread;
starting, at said processing unit, a current thread execution in response to receiving a thread identifier;
implementing, at a processing unit, logic for performing:
identifying a start of a new speculative thread;
tracking state information including speculative memory state associated with said new speculative thread;
storing speculative memory state information associated with said new speculative thread in said dedicated local cache;
said logic further implementing one or more of:
determining whether a data dependency violation condition has occurred in one of: a current or an older, less speculative thread, and in response to a detected data dependency violation condition performing one or more of:
stopping execution of a next speculative thread; and
discarding the stored speculative memory state information associated with said current or older, less speculative thread in said dedicated local cache; and
stopping the execution of the current thread or older, less speculative thread; and,
said logic determining whether a current or older, less speculative thread has become non-speculative and performing one or more of:
committing data results of the non-speculative thread execution to a lower memory level of said cache memory hierarchy when said speculative thread becomes non-speculative; and,
initiating a new speculative thread processing request and, after executing remaining instructions in said executing non-speculative thread, promoting a next speculative thread into a non-speculative thread; and,
restarting thread execution when a thread has been aborted in response to a data dependency violation condition;
wherein said bus interconnected dedicated private local caches and implemented logic for tracking speculative access between multiple processing units enables coherent speculative multithreading without modification to a processing unit core.
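For illustration only, the steps recited above can be sketched as a small software model. This is a simplified sketch under stated assumptions, not the claimed hardware: the names `SpecCache`, `TLSModel`, `load`, `store`, `restart`, `commit_head` and `demo` are hypothetical, thread order is modeled as a list index (0 being the oldest), the dedicated bus is modeled as a loop over the other threads' private caches, and the lower memory level is a plain dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class SpecCache:
    """Hypothetical private per-unit cache holding speculative state only."""
    writes: dict = field(default_factory=dict)  # addr -> speculative value
    reads: set = field(default_factory=set)     # addrs read before any own write

    def discard(self):
        self.writes.clear()
        self.reads.clear()


class TLSModel:
    """Toy model: thread index is age order; `head` is the non-speculative thread."""

    def __init__(self, n_threads, memory):
        self.memory = memory                     # stands in for the lower memory level
        self.caches = [SpecCache() for _ in range(n_threads)]
        self.violated = [False] * n_threads
        self.head = 0

    def load(self, tid, addr):
        own = self.caches[tid]
        if addr in own.writes:
            return own.writes[addr]
        if tid != self.head:
            own.reads.add(addr)                  # track the speculative read
        # Forward from the nearest older thread that wrote addr (the "bus").
        for older in range(tid - 1, self.head - 1, -1):
            if addr in self.caches[older].writes:
                return self.caches[older].writes[addr]
        return self.memory.get(addr, 0)

    def store(self, tid, addr, value):
        self.caches[tid].writes[addr] = value
        # A younger thread that already read addr consumed a stale value.
        for younger in range(tid + 1, len(self.caches)):
            if addr in self.caches[younger].reads:
                self._squash_from(younger)
                break
            if addr in self.caches[younger].writes:
                break                            # its own write shields later threads

    def _squash_from(self, tid):
        # Stop the violating thread and all more speculative ones; drop their state.
        for t in range(tid, len(self.caches)):
            self.caches[t].discard()
            self.violated[t] = True

    def restart(self, tid):
        self.violated[tid] = False

    def commit_head(self):
        # Commit the non-speculative thread's results, then promote the next thread.
        self.memory.update(self.caches[self.head].writes)
        self.caches[self.head].discard()
        self.head += 1


def demo():
    tls = TLSModel(n_threads=2, memory={"x": 1})
    first = tls.load(1, "x")       # thread 1 speculatively reads x too early
    tls.store(0, "x", 5)           # thread 0 writes x: dependency violation
    was_violated = tls.violated[1]
    tls.restart(1)                 # aborted thread is restarted
    second = tls.load(1, "x")      # value now forwarded from thread 0
    tls.store(1, "y", second + 1)
    tls.commit_head()              # thread 0 commits; thread 1 is promoted
    tls.commit_head()              # thread 1 commits
    return first, was_violated, second, tls.memory
```

In this model, calling `demo()` walks through one violation, one restart, and two commits. All speculation tracking lives in the cache objects and the model class, loosely mirroring the recited arrangement in which the processing unit core itself is left unmodified.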