Deterministic Failure Injection
Although injecting a failure by calling a method in the middle of a test case is suitable for many scenarios, there are scenarios where a failure must be injected at a very specific moment. With Failify, for a few supported languages, it is possible to inject a failure right before or after a method call when a specific stack trace is present. This is done by defining a set of named internal and test case events, ordering those events in a run sequence string, and letting Failify's runtime engine enforce the specified order between the nodes.
Internal events are the ones that happen inside a node. Realizing internal events requires binary instrumentation and, as such, is only supported for a few programming languages. You can find more information on the Run Sequence Instrumentation Engine page. The available internal events are:
- Scheduling Event: This event performs either a BLOCK or an UNBLOCK operation and can happen before or after a specific stack trace, which must come from a stack trace event definition. Scheduling events should be defined as pairs of blocking and unblocking events: make sure that everything that has been blocked is eventually unblocked. This event is useful when all of the threads with a specific stack trace need to be blocked while other actions are performed or other threads make progress, and then unblocked.
.withNode("n1", "service1")
    .withSchedulingEvent("bast1")
        .after("st1") // Name of a stack trace event (defined in the next example)
        .operation(SchedulingOperation.BLOCK)
        .and()
    .withSchedulingEvent("ubast1")
        .after("st1")
        .operation(SchedulingOperation.UNBLOCK)
        .and()
    // Alternatively (instead of the two definitions above), using shortcut methods
    .blockAfter("bast1", "st1")
    .unblockAfter("ubast1", "st1")
    .and()
- Stack Trace Event: This event is similar to a scheduling event, except that nothing happens between blocking and unblocking. All of the threads with the defined stack trace are blocked until the dependencies of the event are satisfied (based on the defined run sequence). The blocking can happen before or after the method call. This event can also act as an indicator that the program has reached a specific method with a specific stack trace.
.withNode("n1", "service1")
    .withStackTraceEvent("st1")
        .trace("io.failify.Hello.worldCaller")
        .trace("io.failify.Hello.world")
        .blockAfter().and()
    // Alternatively (instead of the definition above), using a shortcut method
    .stackTrace("st1", "io.failify.Hello.worldCaller,io.failify.Hello.world", true)
    .and()
- Garbage Collection Event: This event invokes the garbage collector in supported languages, e.g., Java.
.withNode("n1", "service1")
    .withGarbageCollectionEvent("gc1").and()
    .and()
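To make the pairing of the scheduling events above concrete, here is a minimal, self-contained Java sketch of the idea behind a BLOCK/UNBLOCK pair: a gate that holds every thread reaching a given point while it is blocked and releases them all when it is unblocked. This is an illustration under stated assumptions, not Failify's implementation. `SchedulingGate`, `block()`, `unblock()` and `arrive()` are hypothetical names, and in Failify the equivalent of `arrive()` is inserted by the instrumentation engine at the matching stack trace rather than called by hand.

```java
// Hypothetical illustration of a BLOCK/UNBLOCK scheduling-event pair.
// Not Failify's implementation: the real engine reaches the equivalent of
// arrive() through binary instrumentation at the matching stack trace.
class SchedulingGate {
    private boolean blocked = false;
    private int waiting = 0;

    // Corresponds to a BLOCK event: threads arriving from now on are held.
    synchronized void block() {
        blocked = true;
    }

    // Corresponds to the paired UNBLOCK event: release every held thread.
    synchronized void unblock() {
        blocked = false;
        notifyAll();
    }

    // Called by a thread when it reaches the instrumented point.
    synchronized void arrive() throws InterruptedException {
        waiting++;
        try {
            while (blocked) {
                wait(); // releases the monitor until unblock() calls notifyAll()
            }
        } finally {
            waiting--;
        }
    }

    // Number of threads currently held at the gate.
    synchronized int waitingCount() {
        return waiting;
    }
}
```

A caller would invoke `block()`, let threads accumulate in `arrive()`, do other work, and then call `unblock()`. If `unblock()` is never called, the held threads wait forever, which is why blocking and unblocking events should always be defined in pairs.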
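The stack trace event above identifies a program point by a chain of methods, from the outermost caller (`io.failify.Hello.worldCaller`) to the innermost callee (`io.failify.Hello.world`). The following Java sketch shows one plausible way to test such a chain against a thread's stack. It assumes that the listed methods must appear as consecutive frames in that order; Failify's actual matching rules may differ, and `StackTraceMatcher` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of matching a stack trace event against a thread's stack.
// Assumption: methods are listed from the outermost caller to the innermost
// callee and must appear as consecutive frames in that order.
class StackTraceMatcher {

    // Splits "io.failify.Hello.worldCaller,io.failify.Hello.world" into parts.
    static List<String> parse(String commaSeparated) {
        return Arrays.stream(commaSeparated.split(","))
                .map(String::trim)
                .collect(Collectors.toList());
    }

    // `stack` is ordered like Thread.getStackTrace(): innermost frame first.
    static boolean matches(List<String> expected, StackTraceElement[] stack) {
        int n = expected.size();
        if (n == 0) {
            return true;
        }
        // Try each frame as the outermost method of the chain, walking inwards.
        for (int start = stack.length - 1; start >= n - 1; start--) {
            boolean ok = true;
            for (int k = 0; k < n && ok; k++) {
                StackTraceElement frame = stack[start - k];
                String method = frame.getClassName() + "." + frame.getMethodName();
                ok = method.equals(expected.get(k));
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    // Convenience check against the calling thread's current stack.
    static boolean currentThreadMatches(List<String> expected) {
        return matches(expected, Thread.currentThread().getStackTrace());
    }
}
```

With this rule, the chain `worldCaller` then `world` matches a stack where `world` was called directly from `worldCaller`, but not the reversed order and not a chain with an unrelated frame in between.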
Test Case Events
Test case events are the connection point between the test case and Failify's runtime engine. The order of internal events is enforced by the runtime engine, but enforcing test case events that are included in the run sequence is the test case's responsibility.
new Deployment.Builder("sample")
    .testCaseEvents("tc1", "tc2")
The Run Sequence
Finally, after defining all the necessary events, tie them together in the run sequence using event names as operands, * and | as operators, and parentheses for grouping. * indicates sequential execution and | indicates parallel execution.
new Deployment.Builder("sample")
    .runSequence("bast1 * tc1 * ubast1 * (gc1 | x1)")
This run sequence blocks all the threads in node n1 that have the stack trace of event st1 (bast1), waits for the test case to enforce tc1, unblocks the blocked threads in node n1 (ubast1), and finally, in parallel, performs a garbage collection in node n1 (gc1) and kills a node (x1).
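One way to read a run sequence is as a set of ordering constraints: each event depends on the events that must be satisfied before it. The Java sketch below parses a run sequence and computes, for each event, the full set of events that must precede it. The grammar, and in particular the choice that `|` binds more tightly than `*`, is an assumption of this sketch (parentheses remove any ambiguity), and `RunSequence.dependencies` is a hypothetical helper, not part of Failify.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical run-sequence parser. Assumed grammar (`|` binds tighter than `*`):
//   seq  := par ('*' par)*
//   par  := atom ('|' atom)*
//   atom := NAME | '(' seq ')'
// For every event it records the set of events that must be satisfied first.
class RunSequence {
    private final String src;
    private int pos = 0;
    private final Map<String, Set<String>> deps = new LinkedHashMap<>();

    private RunSequence(String src) {
        this.src = src;
    }

    static Map<String, Set<String>> dependencies(String sequence) {
        RunSequence parser = new RunSequence(sequence);
        parser.seq(new LinkedHashSet<>());
        if (parser.peek() != '\0') {
            throw new IllegalArgumentException("Unexpected input at index " + parser.pos);
        }
        return parser.deps;
    }

    // a * b: every event in b depends on every event in a (and on `before`).
    private Set<String> seq(Set<String> before) {
        Set<String> acc = new LinkedHashSet<>(before);
        Set<String> events = new LinkedHashSet<>();
        do {
            Set<String> part = par(acc);
            acc.addAll(part);
            events.addAll(part);
        } while (consume('*'));
        return events;
    }

    // a | b: both branches share the same predecessors; no order between them.
    private Set<String> par(Set<String> before) {
        Set<String> events = new LinkedHashSet<>();
        do {
            events.addAll(atom(before));
        } while (consume('|'));
        return events;
    }

    private Set<String> atom(Set<String> before) {
        if (consume('(')) {
            Set<String> inner = seq(before);
            if (!consume(')')) {
                throw new IllegalArgumentException("Missing ')' at index " + pos);
            }
            return inner;
        }
        skipSpaces();
        int start = pos;
        while (pos < src.length()
                && (Character.isLetterOrDigit(src.charAt(pos)) || src.charAt(pos) == '_')) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected an event name at index " + pos);
        }
        String name = src.substring(start, pos);
        // Copy `before` so later additions to the caller's set do not leak in.
        if (deps.putIfAbsent(name, new LinkedHashSet<>(before)) != null) {
            throw new IllegalArgumentException("Event used twice: " + name);
        }
        return Set.of(name);
    }

    private boolean consume(char expected) {
        if (peek() == expected) {
            pos++;
            return true;
        }
        return false;
    }

    private char peek() {
        skipSpaces();
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) {
            pos++;
        }
    }
}
```

For `bast1 * tc1 * ubast1 * (gc1 | x1)`, both `gc1` and `x1` depend on `bast1`, `tc1` and `ubast1`, but not on each other, which is what allows them to run in parallel. The sketch records transitive predecessors for simplicity; a real engine may track only direct ones.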
At any point, a test can use the FailifyRunner object to enforce the order of a test case event. A test case event only needs to be enforced in the test case if some action must be taken once the event's dependencies are satisfied, e.g., injecting a failure.
runner.runtime().enforceOrder("tc1", 10, () -> runner.runtime().clockDrift("n1", -100));
Here, when the dependencies of event tc1 are satisfied, a clock drift of -100 ms is applied to node n1 and the tc1 event is marked as satisfied. If the dependencies of tc1 are not satisfied within 10 seconds, a TimeoutException is thrown. If the test case only needs to wait for an event or its dependencies to be satisfied, the waitFor method can be used.
Here again, if the event's dependencies are not satisfied within 10 seconds, a TimeoutException is thrown.
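The `enforceOrder` and `waitFor` behavior described above can be modeled with one latch per event. The following Java sketch is a simplified, hypothetical model: `ToyOrderEnforcer`, `markSatisfied` and `isSatisfied` are illustrative names, the action is a plain `Runnable`, and treating `waitFor` as `enforceOrder` with an empty action is an assumption of the sketch rather than documented Failify behavior. It waits for every dependency of an event within one shared deadline, runs the action, marks the event as satisfied, and throws a `TimeoutException` if the deadline passes first.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model of enforceOrder/waitFor semantics; not Failify's code.
class ToyOrderEnforcer {
    private final Map<String, Set<String>> dependencies;
    private final Map<String, CountDownLatch> satisfied = new ConcurrentHashMap<>();

    ToyOrderEnforcer(Map<String, Set<String>> dependencies) {
        this.dependencies = dependencies;
    }

    // One latch per event; a count of zero means "satisfied".
    private CountDownLatch latch(String event) {
        return satisfied.computeIfAbsent(event, e -> new CountDownLatch(1));
    }

    // Marks an event as satisfied (e.g. the engine observed an internal event).
    void markSatisfied(String event) {
        latch(event).countDown();
    }

    boolean isSatisfied(String event) {
        return latch(event).getCount() == 0;
    }

    // Waits for all dependencies of `event` within timeoutSeconds,
    // runs `action`, then marks `event` as satisfied.
    void enforceOrder(String event, long timeoutSeconds, Runnable action)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(timeoutSeconds);
        for (String dep : dependencies.getOrDefault(event, Set.of())) {
            long remaining = deadline - System.nanoTime();
            if (!latch(dep).await(remaining, TimeUnit.NANOSECONDS)) {
                throw new TimeoutException("Dependencies of " + event + " not satisfied");
            }
        }
        action.run();
        markSatisfied(event);
    }

    // Assumption of this sketch: waitFor is enforceOrder without an action.
    void waitFor(String event, long timeoutSeconds)
            throws InterruptedException, TimeoutException {
        enforceOrder(event, timeoutSeconds, () -> { });
    }
}
```

Using a single deadline for all dependencies keeps the total wait bounded by the requested timeout, matching the "within 10 seconds" wording, rather than allowing 10 seconds per dependency.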