Wednesday 16 December 2015

Solr Master - Slave Configuration with DataImportHandler & Scheduling

In this post we will see how to set up Solr master-slave replication.


For simplicity, let's assume that we have two nodes, Node1 and Node2. Node1 is the master node and Node2 is the slave node.

1. Install solr-5.3.1 on both Node1 (master) and Node2 (slave).
2. Create a Solr core on both Node1 and Node2 using the command
    $> bin/solr create [-c name] [-d confdir] [-n configName] [-shards #] [-replicationFactor #] [-p port]

Let's assume the name of the core is test_core.
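
With that name, the generic command above reduces to a single call. A minimal sketch, assuming it is run from the Solr install directory on each node; the CORE_NAME and SOLR_BIN variables are only illustrative:

```shell
# Assumption: the current directory is the Solr install directory.
CORE_NAME="test_core"   # core name used throughout this post
SOLR_BIN="bin/solr"

if [ -x "$SOLR_BIN" ]; then
  # Create the core with the default configuration set
  "$SOLR_BIN" create -c "$CORE_NAME"
else
  # Not inside a Solr install: just show what would be run
  echo "would run: $SOLR_BIN create -c $CORE_NAME"
fi
```

The optional flags (-d, -n, -p and so on) can be added to the same call if the defaults do not fit.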

On both instances, if we go to ${SOLR_HOME}/server/solr we will see a test_core directory, which contains a conf directory, a core.properties file and a data directory.

Now let's start with the master-slave configuration.

Master Setup 

If we navigate to the conf directory within the test_core directory under /server/solr we will see the solrconfig.xml file.

Edit the file and add

<requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
        <str name="enable">${master.replication.enabled:false}</str>
        <str name="replicateAfter">commit</str>
        <str name="replicateAfter">optimize</str>
        <str name="replicateAfter">startup</str>
    </lst>
</requestHandler>

Add master.replication.enabled=true to the core.properties file located in the test_core directory under /server/solr.


Slave Setup

If we navigate to the conf directory within the test_core directory under /server/solr we will see the solrconfig.xml file.

Edit the file and add

<requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
        <str name="enable">${slave.replication.enabled:false}</str>
        <str name="masterUrl">http://${masterserver}/solr/${solr.core.name}/replication</str>
        <str name="pollInterval">00:05:00</str>
    </lst>
</requestHandler>

Add

slave.replication.enabled=true
masterserver=52.33.134.44:8983
solr.core.name=<core_name> (test_core)

to the core.properties file located in the test_core directory under /server/solr.


That's it, we are done with the master-slave configuration.
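
To check that the slave is actually pulling the index, the ReplicationHandler can be queried over HTTP. A rough sketch, assuming curl is installed and that the commands are run on the slave node (the SLAVE address is a guess for illustration); details and fetchindex are standard ReplicationHandler commands, while the helper function names are made up here:

```shell
MASTER="52.33.134.44:8983"   # master address from core.properties above
SLAVE="localhost:8983"       # assumption: run on the slave node itself
CORE="test_core"

# Build the replication URL for a given host and command
replication_url() {
  printf 'http://%s/solr/%s/replication?command=%s' "$1" "$CORE" "$2"
}

# Print replication status for a host (index version, generation, ...)
replication_details() {
  curl -s "$(replication_url "$1" details)&wt=json"
}

# Ask the slave to pull from the master now instead of waiting for pollInterval
force_fetch() {
  curl -s "$(replication_url "$SLAVE" fetchindex)"
}
```

Calling replication_details "$MASTER" and replication_details "$SLAVE" and comparing the reported index versions shows whether the slave has caught up; force_fetch triggers an immediate pull.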

DataImportHandler

Using the Solr DataImportHandler we can build indexes in Solr directly from data stores such as MySQL, Oracle, PostgreSQL, etc.

Let's continue with the previous example and configure a data import handler.
1. Edit the solrconfig.xml file under the conf directory of your core and add -

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
      <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

2. Create a data-config.xml file within the conf directory with the following content -

<dataConfig>
    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="" user="" password=""/>
    <document name="">
        <entity name="" query=""
                deltaQuery="<some_date_condition> &gt; '${recommendation.last_index_time}';">
            <field column="" name="" />
            <!-- ... remaining field mappings ... -->
            <field column="allcash_total_annualized_return_growth" name="Allcash_total_annualized_return_growth" />
        </entity>
    </document>
</dataConfig>

3. Create the corresponding field mappings in the managed-schema file for index creation.

4. Make sure the jar file for the Driver class is available in the lib directory or any other directory, and that the directory is referenced in the solrconfig.xml file, like

<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />

We are done with DataImportHandler configuration.
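
With the handler in place, imports are triggered over HTTP. A small sketch, assuming curl is available and Solr is reachable at the hypothetical localhost:8983; full-import, delta-import and status are DataImportHandler commands, and the function names are only illustrative:

```shell
SOLR="localhost:8983"   # assumption: adjust to your master node
CORE="test_core"

# Build the /dataimport URL for a given command
dih_url() {
  printf 'http://%s/solr/%s/dataimport?command=%s' "$SOLR" "$CORE" "$1"
}

# Rebuild the whole index from the data store
full_import()   { curl -s "$(dih_url full-import)&clean=true&commit=true"; }

# Import only the rows picked up by deltaQuery
delta_import()  { curl -s "$(dih_url delta-import)&clean=false&commit=true"; }

# Show whether an import is running and what the last one did
import_status() { curl -s "$(dih_url status)"; }
```

Running the import on the master with commit=true is what lets the replicateAfter commit setting publish the new index to the slave.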

Scheduling: 

Solr by default doesn't support scheduling for delta imports.
Clone either of

1. https://github.com/badalb/solr-data-import-scheduler.git
2. https://github.com/mbonaci/solr-data-import-scheduler.git

Build a jar file and put it in the {SOLR_HOME}/server/solr-webapp/webapp/WEB-INF/lib directory.

3. Make sure, regardless of whether you have single or multi-core Solr, that you create dataimport.properties in your solr.home/conf (NOT solr.home/core/conf) with content like

#  to sync or not to sync
#  1 - active; anything else - inactive
syncEnabled=1

#  which cores to schedule
#  in a multi-core environment you can decide which cores you want synchronized
#  leave empty or comment it out if using single-core deployment
syncCores=coreHr,coreEn

#  solr server name or IP address
#  [defaults to localhost if empty]
server=localhost

#  solr server port
#  [defaults to 80 if empty]
port=8080

#  application name/context
#  [defaults to current ServletContextListener's context (app) name]
webapp=solrTest_WEB

#  URL params [mandatory]
#  remainder of URL
params=/select?qt=/dataimport&command=delta-import&clean=false&commit=true

#  schedule interval
#  number of minutes between two runs
#  [defaults to 30 if empty]
interval=10

4. Add the application listener to the web.xml of the Solr web app ({SOLR_HOME}/server/solr-webapp/webapp/WEB-INF/web.xml)

<listener>
  <listener-class>org.apache.solr.handler.dataimport.scheduler.ApplicationListener</listener-class>
</listener>

Restart Solr so that changes are reflected.
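
Before relying on the scheduler, it can help to print the URL it is expected to call and try that URL by hand. The sketch below assumes the scheduler joins the properties as http://server:port/webapp/core followed by params; that layout is inferred from the property comments above and has not been checked against the scheduler source. The values (port 8983, context solr, core test_core) are assumptions matching the setup in this post, and scheduler_url is a hypothetical helper:

```shell
# Assumed values for the setup in this post
server="localhost"
port="8983"
webapp="solr"
params="/select?qt=/dataimport&command=delta-import&clean=false&commit=true"

# Assemble the URL for one core from the properties above
scheduler_url() {
  printf 'http://%s:%s/%s/%s%s' "$server" "$port" "$webapp" "$1" "$params"
}

# Print the URL for test_core so it can be tried manually, e.g. with curl
scheduler_url test_core; echo
```

If the printed URL does not trigger a delta import when requested manually, the sample values in dataimport.properties (port 8080, webapp solrTest_WEB, cores coreHr and coreEn) most likely still need to be adapted to your installation.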

Happy searching .....
