Friday, September 19, 2008

Clustered Scheduling with Spring and Quartz

I initially cut my teeth as a Java programmer writing batch JDBC programs to update various sets of data. We were deployed in a Unix environment, so we traditionally wrapped each JDBC program in a bash shell script and kicked it off via a cron entry. This setup worked for the most part, but there were issues. If our batch machine was taken offline due to upgrades, a failed disk, or a coffee spill, then our batch process simply did not run. If we wanted it to run after the fact, we had to create another one-time cron entry to kick it off. This typically took about 2 hours, because the first hour and a half was spent with me trying to decipher the cron syntax (minutes first or seconds? need a question mark here but star there? comma or dash between my minutes?).

Fast forward to 2008 and I have just completed a week-long adventure with a coworker, finally getting Spring and Quartz up and running to kick off some batch programs in a clustered environment. Failover is automatic, ad hoc runs are possible, and I still wind up spending an hour and a half each time I have to add a cron entry. I thought I would post some code snippets with explanation so that you, kind reader, could maybe trim this process down to about a day.

1.) Download Spring 2.5.5 and use only the Quartz 1.6.1 RC1 jar that is bundled in its lib directory. DO NOT USE A PREVIOUS VERSION OF QUARTZ.

2.) Execute the script that matches your database. The scripts can be found in the Quartz distro under the docs\dbTables subdirectory. Make sure that indexes are set up as outlined here.

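3.) Wrap a batch processing service in a Quartz job. Here's the wrapper we used:

    import org.quartz.JobExecutionContext;
    import org.quartz.JobExecutionException;
    import org.quartz.SchedulerContext;
    import org.springframework.context.ApplicationContext;
    import org.springframework.scheduling.quartz.QuartzJobBean;

    public class GenericQuartzJob extends QuartzJobBean
    {
        // Logger and Constants are application-specific classes (not shown)
        protected Logger logger = new Logger(getClass(), Constants.LOGGER_APP_NAME);

        // Bean name of the batch processor to run; set from the job data map (see step 4)
        private String batchProcessorName;

        public String getBatchProcessorName() {
            return batchProcessorName;
        }

        public void setBatchProcessorName(String name) {
            this.batchProcessorName = name;
        }

        protected void executeInternal(JobExecutionContext jobCtx) throws JobExecutionException
        {
            try {
                // The scheduler context holds the Spring app context (see step 6)
                SchedulerContext schedCtx = jobCtx.getScheduler().getContext();
                ApplicationContext appCtx =
                    (ApplicationContext) schedCtx.get("applicationContext");
                // Look up the fully wired batch processor by name and run it
                IBatchProcessor proc =
                    (IBatchProcessor) appCtx.getBean(batchProcessorName);
                proc.invoke();
            }
            catch (Exception ex) {
                logger.error("Unable to complete execution of " + batchProcessorName, ex);
                throw new JobExecutionException("Unable to execute batch job: " + batchProcessorName, ex);
            }
        }
    }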

With that wrapper, you can execute any batch processor that is wired up in Spring and implements the homegrown IBatchProcessor interface (which typically has only some variant of an execute or invoke method). You don't need to manage the dependencies of those batch processors, as they are just beans defined in a Spring app context somewhere. Additionally, the wrapper jumps through the various contexts that you must navigate to get a properly wired bean from the Spring factory.
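For reference, here is a minimal, dependency-free sketch of the lookup-then-invoke pattern the wrapper relies on. The IBatchProcessor interface itself is not shown in this post, so the single invoke() method below is an assumption based on how the wrapper calls it. The Map is only a stand-in for the Spring app context, and BatchLookupSketch and NightlyCleanupProcessor are hypothetical names made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class BatchLookupSketch {
    // Stand-in for the Spring app context: bean name -> bean instance.
    static final Map<String, IBatchProcessor> BEANS = new HashMap<>();

    // Mirrors the wrapper's executeInternal: look up the bean by name, then invoke it.
    static void runByName(String batchProcessorName) {
        IBatchProcessor proc = BEANS.get(batchProcessorName);
        if (proc == null) {
            throw new IllegalArgumentException("No batch processor named " + batchProcessorName);
        }
        try {
            proc.invoke();
        } catch (Exception ex) {
            // The real wrapper rethrows as JobExecutionException; this sketch has no Quartz jar.
            throw new IllegalStateException("Unable to execute batch job: " + batchProcessorName, ex);
        }
    }

    public static void main(String[] args) {
        NightlyCleanupProcessor cleanup = new NightlyCleanupProcessor();
        BEANS.put("SomeJobBean", cleanup);
        runByName("SomeJobBean");
        System.out.println("runs: " + cleanup.runCount);
    }
}

// Assumed shape of the homegrown interface: the wrapper only ever calls invoke().
interface IBatchProcessor {
    void invoke() throws Exception;
}

// Hypothetical processor; a real one would have its dependencies injected by Spring.
class NightlyCleanupProcessor implements IBatchProcessor {
    int runCount = 0;

    @Override
    public void invoke() {
        runCount++;
    }
}
```

The point of keeping the interface this small is that the Quartz side never needs to know what a processor does, only what it is called.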

4.) Configure the Job Detail in your Spring app context with an xml snippet resembling this:

    <bean id="someJobDetail" class="org.springframework.scheduling.quartz.JobDetailBean">
        <property name="jobClass" value="com.somecompany.batch.GenericQuartzJob" />
        <property name="jobDataAsMap">
            <map>
                <entry key="batchProcessorName" value="SomeJobBean" />
            </map>
        </property>
    </bean>

This definition registers GenericQuartzJob as the job class (as a QuartzJobBean it fulfills the contract required by JobDetailBean), and plugs in the bean name of a batch processor defined somewhere else in the app context with all of its necessary dependencies.
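How does the batchProcessorName map entry end up in the job? QuartzJobBean applies the job data map entries to matching bean properties on the job instance before executeInternal runs. The sketch below imitates that idea with plain reflection so it can run without any jars. It is a simplified stand-in, not Spring's actual code, and JobDataInjectionSketch, DemoJob, and applyProperties are hypothetical names.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

class JobDataInjectionSketch {
    // For each map entry, call the matching single-argument setter if one exists.
    // Keys without a matching setter are silently ignored.
    static void applyProperties(Object target, Map<String, Object> jobData) {
        for (Map.Entry<String, Object> entry : jobData.entrySet()) {
            String key = entry.getKey();
            if (key.isEmpty()) {
                continue;
            }
            String setter = "set" + Character.toUpperCase(key.charAt(0)) + key.substring(1);
            for (Method m : target.getClass().getMethods()) {
                boolean matches = m.getName().equals(setter)
                        && m.getParameterTypes().length == 1
                        && m.getParameterTypes()[0].isInstance(entry.getValue());
                if (matches) {
                    try {
                        m.invoke(target, entry.getValue());
                    } catch (Exception ex) {
                        throw new IllegalStateException("Could not set property " + key, ex);
                    }
                }
            }
        }
    }

    // Hypothetical job with the same property as GenericQuartzJob.
    public static class DemoJob {
        private String batchProcessorName;

        public String getBatchProcessorName() {
            return batchProcessorName;
        }

        public void setBatchProcessorName(String name) {
            this.batchProcessorName = name;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> jobData = new HashMap<>();
        jobData.put("batchProcessorName", "SomeJobBean");
        DemoJob job = new DemoJob();
        applyProperties(job, jobData);
        System.out.println(job.getBatchProcessorName());
    }
}
```

This is why the wrapper needs a public setter whose name matches the map key exactly.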

5.) Configure the Trigger in your Spring app context. A simple cron-based version would look like this:

    <bean id="someCronTrigger" class="org.springframework.scheduling.quartz.CronTriggerBean">
        <property name="jobDetail" ref="someJobDetail" />
        <!-- Cron expression runs at 1am and 1pm -->
        <property name="cronExpression" value="0 0 1,13 * * ?"/>
    </bean>

And yes, it did take me an hour and a half to get that cron syntax working correctly. Some habits die hard.
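Part of what tripped me up is that Quartz cron expressions are not Unix cron expressions: they start with a seconds field, and they use ? to mean "no specific value" in one of the two day fields. As a memory aid, here is a small sketch that just labels the fields of an expression in Quartz order. It does no validation of the values, and QuartzCronFieldsSketch and label are hypothetical names, not part of Quartz.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class QuartzCronFieldsSketch {
    // Quartz field order; the seventh field (year) is optional.
    static final String[] FIELD_NAMES = {
        "seconds", "minutes", "hours", "dayOfMonth", "month", "dayOfWeek", "year"
    };

    // Splits an expression on whitespace and pairs each part with its field name.
    // This only labels the fields; it does not check that the values are legal.
    static Map<String, String> label(String cronExpression) {
        String[] parts = cronExpression.trim().split("\\s+");
        if (parts.length < 6 || parts.length > 7) {
            throw new IllegalArgumentException(
                "Expected 6 or 7 fields but found " + parts.length);
        }
        Map<String, String> labeled = new LinkedHashMap<>();
        for (int i = 0; i < parts.length; i++) {
            labeled.put(FIELD_NAMES[i], parts[i]);
        }
        return labeled;
    }

    public static void main(String[] args) {
        // The expression from the trigger definition: 1am and 1pm every day.
        System.out.println(label("0 0 1,13 * * ?"));
    }
}
```

Read this way, the trigger's expression says: second 0, minute 0, hours 1 and 13, any day of the month, any month, and no specific day of the week.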

6.) Configure the SchedulerFactoryBean

    <bean id="scheduler" class="org.springframework.scheduling.quartz.SchedulerFactoryBean" lazy-init="false">
        <property name="applicationContextSchedulerContextKey" value="applicationContext" />
        <property name="dataSource" ref="qtzTxDataSource"/>
        <property name="transactionManager" ref="transactionManager"/>
        <property name="overwriteExistingJobs" value="true"/>
        <property name="autoStartup" value="true" />
        <property name="triggers">
            <list>
                <ref bean="someCronTrigger" />
            </list>
        </property>
        <property name="quartzProperties">
            <props>
                <prop key="org.quartz.scheduler.instanceName">SomeBatchScheduler</prop>
                <prop key="org.quartz.scheduler.instanceId">AUTO</prop>
                <prop key="org.quartz.jobStore.misfireThreshold">60000</prop>
                <prop key="org.quartz.jobStore.class">org.quartz.impl.jdbcjobstore.JobStoreTX</prop>
                <prop key="org.quartz.jobStore.driverDelegateClass">org.quartz.impl.jdbcjobstore.oracle.weblogic.WebLogicOracleDelegate</prop>
                <prop key="org.quartz.jobStore.tablePrefix">qrtz_</prop>
                <prop key="org.quartz.jobStore.isClustered">true</prop>
                <prop key="org.quartz.threadPool.class">org.quartz.simpl.SimpleThreadPool</prop>
                <prop key="org.quartz.threadPool.threadCount">25</prop>
                <prop key="org.quartz.threadPool.threadPriority">5</prop>
            </props>
        </property>
    </bean>

That configuration, while lengthy, does a few very important things. Let's go through them. The applicationContextSchedulerContextKey property ensures that the app context is available to every instance of the GenericQuartzJob wrapper, under the same "applicationContext" key that the wrapper looks up. The dataSource and transactionManager are necessary to ensure that the database is updated in a safe manner (many server instances may be updating the database at once). Their bean definitions are pretty much typical for Spring; check out the Spring reference docs if you need more info there. OverwriteExistingJobs makes sure that every time the scheduler is started, the jobs and triggers defined in the Spring config overwrite any existing copies in the database that may have been changed. AutoStartup makes sense, but it already defaults to true, so setting it explicitly is optional.

The Quartz properties can be maintained in a separate file, but I prefer to keep them inline with the scheduler definition. They are all pretty much self-explanatory, and the Quartz docs explain them in greater detail, although that detail is nested deeply.
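If you do prefer the separate file, the same ten settings translate directly into standard Java properties format, and SchedulerFactoryBean has a configLocation property for pointing at such a file. The sketch below only shows what that file would contain and checks that it parses with java.util.Properties. It does not start a scheduler, and QuartzPropertiesSketch, FILE_CONTENTS, and load are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class QuartzPropertiesSketch {
    // What a standalone properties file would contain, one setting per line.
    static final String FILE_CONTENTS = String.join("\n",
        "org.quartz.scheduler.instanceName=SomeBatchScheduler",
        "org.quartz.scheduler.instanceId=AUTO",
        "org.quartz.jobStore.misfireThreshold=60000",
        "org.quartz.jobStore.class=org.quartz.impl.jdbcjobstore.JobStoreTX",
        "org.quartz.jobStore.driverDelegateClass="
            + "org.quartz.impl.jdbcjobstore.oracle.weblogic.WebLogicOracleDelegate",
        "org.quartz.jobStore.tablePrefix=qrtz_",
        "org.quartz.jobStore.isClustered=true",
        "org.quartz.threadPool.class=org.quartz.simpl.SimpleThreadPool",
        "org.quartz.threadPool.threadCount=25",
        "org.quartz.threadPool.threadPriority=5");

    // Parses properties-format text the same way a file on disk would be parsed.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load(FILE_CONTENTS);
        System.out.println("clustered: " + props.getProperty("org.quartz.jobStore.isClustered"));
        System.out.println("threads: " + props.getProperty("org.quartz.threadPool.threadCount"));
    }
}
```

Whichever form you choose, every node in the cluster should run with the same settings.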

Hopefully this article will get you to the point of configuring a clustered, database-backed scheduling system within hours instead of days. This configuration gives you a set of Spring-enabled batch processes that have their dependencies wired as normal, with the added benefit that their schedule is persisted and fault tolerant across nodes in the cluster. It also leaves the door open for a couple of methods of doing ad hoc runs of the jobs. We'll cover that in a part 2 article shortly.