  ElasticSearch basic usage and cluster structures
  Add Date : 2017-08-31      
I. Introduction

Both ElasticSearch and Solr are built on the Lucene search engine, but ElasticSearch supports distribution natively, while Solr only became distributed with SolrCloud as of version 4.0, and its distributed mode depends on ZooKeeper.

Here is a detailed comparison of ElasticSearch and Solr: http://solr-vs-elasticsearch.com/

II. Basic Usage

An Elasticsearch cluster can contain multiple indexes (indices), each index can contain multiple types, each type contains multiple documents, and each document contains multiple fields. This kind of document-oriented storage can be considered a form of NoSQL.

Compared with a traditional relational database, the ES concepts map as follows:
Relational DB -> Databases -> Tables -> Rows -> Columns
Elasticsearch -> Indices -> Types -> Documents -> Fields
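To make the mapping concrete, a single ES document of the type defined later in this article could look like the JSON below. This is an illustrative sketch: the real field names are whatever the *FieldName constants in the code hold, and all values here are made up.

```json
{
  "id": 1,
  "seqNum": 100,
  "imsi": "460001234567890",
  "imei": "123456789012345",
  "deviceID": "device-01",
  "ownArea": "Beijing",
  "teleOper": "CMCC",
  "time": "2017-08-31T12:00:00"
}
```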

The following walks through basic usage, from creating a Client to adding, deleting, and querying data:

1. Create a Client

public ElasticSearchService(String ipAddress, int port) {
        client = new TransportClient()
                .addTransportAddress(new InetSocketTransportAddress(ipAddress, port));
}

The Client created here is a TransportClient.

Comparison of the two ES client types:

TransportClient: a lightweight client that connects to the ES cluster over sockets using a Netty thread pool. It does not join the cluster itself; it only forwards requests.

Node Client: the client node itself joins the cluster as an ElasticSearch node, just like the other nodes. Frequently opening and closing such Node Clients creates "noise" in the cluster.

2. Create / Delete an Index and Type Mapping


    // Create an index
    public void createIndex() {
        client.admin().indices().create(new CreateIndexRequest(IndexName))
                .actionGet();
    }

    // Delete the whole index
    public void deleteIndex() {
        IndicesExistsResponse indicesExistsResponse = client.admin().indices()
                .exists(new IndicesExistsRequest(new String[] {IndexName}))
                .actionGet();
        if (indicesExistsResponse.isExists()) {
            client.admin().indices().delete(new DeleteIndexRequest(IndexName))
                    .actionGet();
        }
    }

    // Delete a Type under the index
    public void deleteType() {
        client.prepareDelete().setIndex(IndexName).setType(TypeName).execute().actionGet();
    }

    // Define the index mapping for the type
    public void defineIndexTypeMapping() {
        try {
            XContentBuilder mapBuilder = XContentFactory.jsonBuilder();
            mapBuilder.startObject()
            .startObject(TypeName)
                .startObject("properties")
                    .startObject(IDFieldName).field("type", "long").field("store", "yes").endObject()
                    .startObject(SeqNumFieldName).field("type", "long").field("store", "yes").endObject()
                    .startObject(IMSIFieldName).field("type", "string").field("index", "not_analyzed").field("store", "yes").endObject()
                    .startObject(IMEIFieldName).field("type", "string").field("index", "not_analyzed").field("store", "yes").endObject()
                    .startObject(DeviceIDFieldName).field("type", "string").field("index", "not_analyzed").field("store", "yes").endObject()
                    .startObject(OwnAreaFieldName).field("type", "string").field("index", "not_analyzed").field("store", "yes").endObject()
                    .startObject(TeleOperFieldName).field("type", "string").field("index", "not_analyzed").field("store", "yes").endObject()
                    .startObject(TimeFieldName).field("type", "date").field("store", "yes").endObject()
                .endObject()
            .endObject()
            .endObject();

            PutMappingRequest putMappingRequest = Requests
                    .putMappingRequest(IndexName).type(TypeName)
                    .source(mapBuilder);
            client.admin().indices().putMapping(putMappingRequest).actionGet();
        } catch (IOException e) {
            log.error(e.toString());
        }
    }

Here a custom index mapping (Mapping) is defined for the type. By default, ES handles data type mapping automatically: integers map to long, floats to double, strings to string, times to date, and true/false to boolean.

Note: for a string field, ES by default applies "analyzed" processing: the text is tokenized, stop words are removed, and the result is indexed. If you need a string to be indexed as a whole, set the field to field("index", "not_analyzed").
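The difference can be sketched in a small mapping fragment (the field names here are illustrative, not taken from the code above). In this sketch, "description" is analyzed and therefore tokenized before indexing, while "deviceID" is indexed as one whole term:

```json
{
  "properties": {
    "description": { "type": "string" },
    "deviceID": { "type": "string", "index": "not_analyzed" }
  }
}
```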

Refer to: https://www.elastic.co/guide/en/elasticsearch/guide/current/mapping-intro.html

3. Index Data


    // Batch index data
    public void indexHotSpotDataList(List<Hotspotdata> dataList) {
        if (dataList != null) {
            int size = dataList.size();
            if (size > 0) {
                BulkRequestBuilder bulkRequest = client.prepareBulk();
                for (int i = 0; i < size; i++) {
                    Hotspotdata data = dataList.get(i);
                    String jsonSource = getIndexDataFromHotspotData(data);
                    if (jsonSource != null) {
                        bulkRequest.add(client
                                .prepareIndex(IndexName, TypeName,
                                        data.getId().toString())
                                .setRefresh(true).setSource(jsonSource));
                    }
                }

                BulkResponse bulkResponse = bulkRequest.execute().actionGet();
                if (bulkResponse.hasFailures()) {
                    Iterator<BulkItemResponse> iter = bulkResponse.iterator();
                    while (iter.hasNext()) {
                        BulkItemResponse itemResponse = iter.next();
                        if (itemResponse.isFailed()) {
                            log.error(itemResponse.getFailureMessage());
                        }
                    }
                }
            }
        }
    }

    // Index a single document
    public boolean indexHotspotData(Hotspotdata data) {
        String jsonSource = getIndexDataFromHotspotData(data);
        if (jsonSource != null) {
            IndexRequestBuilder requestBuilder = client.prepareIndex(IndexName,
                    TypeName).setRefresh(true);
            requestBuilder.setSource(jsonSource)
                    .execute().actionGet();
            return true;
        }
        return false;
    }

    // Build the JSON source string for a document
    public String getIndexDataFromHotspotData(Hotspotdata data) {
        String jsonString = null;
        if (data != null) {
            try {
                XContentBuilder jsonBuilder = XContentFactory.jsonBuilder();
                jsonBuilder.startObject().field(IDFieldName, data.getId())
                        .field(SeqNumFieldName, data.getSeqNum())
                        .field(IMSIFieldName, data.getImsi())
                        .field(IMEIFieldName, data.getImei())
                        .field(DeviceIDFieldName, data.getDeviceID())
                        .field(OwnAreaFieldName, data.getOwnArea())
                        .field(TeleOperFieldName, data.getTeleOper())
                        .field(TimeFieldName, data.getCollectTime())
                        .endObject();
                jsonString = jsonBuilder.string();
            } catch (IOException e) {
                log.error(e.toString());
            }
        }
        return jsonString;
    }

ES supports both batch and single-document indexing.

4. Query Data


    // Get a small amount of data (up to 100 hits)
    private List<Integer> getSearchData(QueryBuilder queryBuilder) {
        List<Integer> ids = new ArrayList<>();
        SearchResponse searchResponse = client.prepareSearch(IndexName)
                .setTypes(TypeName).setQuery(queryBuilder).setSize(100)
                .execute().actionGet();
        SearchHits searchHits = searchResponse.getHits();
        for (SearchHit searchHit : searchHits) {
            Integer id = (Integer) searchHit.getSource().get("id");
            ids.add(id);
        }
        return ids;
    }

    // Get a large amount of data with Scroll
    private List<Integer> getSearchDataByScrolls(QueryBuilder queryBuilder) {
        List<Integer> ids = new ArrayList<>();
        // Get the first batch of 100,000 hits
        SearchResponse scrollResp = client.prepareSearch(IndexName)
                .setSearchType(SearchType.SCAN).setScroll(new TimeValue(60000))
                .setQuery(queryBuilder).setSize(100000).execute().actionGet();
        while (true) {
            for (SearchHit searchHit : scrollResp.getHits().getHits()) {
                Integer id = (Integer) searchHit.getSource().get(IDFieldName);
                ids.add(id);
            }
            scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
                    .setScroll(new TimeValue(600000)).execute().actionGet();
            if (scrollResp.getHits().getHits().length == 0) {
                break;
            }
        }
        return ids;
    }

Here the query condition is a QueryBuilder. ES supports paginated queries; to fetch a large amount of data in one pass, you need to use Scroll Search.
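The same scroll pattern can also be sketched against the REST API. This is an assumed shape for the ES 1.x scan/scroll endpoints; the index name, query, and sizes are illustrative:

```
POST /indexname/_search?search_type=scan&scroll=1m
{ "query": { "match_all": {} }, "size": 1000 }

POST /_search/scroll?scroll=1m
<scroll_id returned by the previous response>
```

Each scroll request returns the next batch plus a new scroll_id; you repeat until a response comes back with no hits, mirroring the Java loop above.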

5. Aggregation (Facet) Queries


    // Get the distribution of data per device, for a device list, within a time range
    public Map<String, String> getDeviceDistributedInfo(String startTime,
            String endTime, List<String> deviceList) {

        Map<String, String> resultsMap = new HashMap<>();

        QueryBuilder deviceQueryBuilder = getDeviceQueryBuilder(deviceList);
        QueryBuilder rangeBuilder = getDateRangeQueryBuilder(startTime, endTime);
        QueryBuilder queryBuilder = QueryBuilders.boolQuery()
                .must(deviceQueryBuilder).must(rangeBuilder);

        TermsBuilder termsBuilder = AggregationBuilders.terms("DeviceIDAgg").size(Integer.MAX_VALUE)
                .field(DeviceIDFieldName);
        SearchResponse searchResponse = client.prepareSearch(IndexName)
                .setQuery(queryBuilder).addAggregation(termsBuilder)
                .execute().actionGet();
        Terms terms = searchResponse.getAggregations().get("DeviceIDAgg");
        if (terms != null) {
            for (Terms.Bucket entry : terms.getBuckets()) {
                resultsMap.put(entry.getKey(),
                        String.valueOf(entry.getDocCount()));
            }
        }
        return resultsMap;
    }

Aggregation queries provide statistical-analysis features, for example: the distribution of data over a month, or the maximum, minimum, sum, and average of a certain class of data.
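The Java aggregation above corresponds roughly to the following REST request body. This is a hand-written sketch (field names, values, and date format are illustrative, not taken from the code):

```json
{
  "query": {
    "bool": {
      "must": [
        { "terms": { "deviceID": ["device-01", "device-02"] } },
        { "range": { "time": { "gte": "2017-08-01", "lte": "2017-08-31" } } }
      ]
    }
  },
  "aggs": {
    "DeviceIDAgg": {
      "terms": { "field": "deviceID" }
    }
  }
}
```

The response then contains one bucket per deviceID with its document count, which is what the Java code copies into resultsMap.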

Refer to: https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-aggs.html

III. Cluster Configuration

Configuration file: elasticsearch.yml

Cluster name and node name:

# cluster.name: elasticsearch

# node.name: "Franz Kafka"

Whether the node may be elected master, and whether it stores data:

# node.master: true

# node.data: true

Number of shards and number of replicas:

# index.number_of_shards: 5
# index.number_of_replicas: 1

The minimum number of master-eligible nodes required for a master election; for the whole cluster this should be set to half the number of nodes plus one, i.e. N/2 + 1:

# discovery.zen.minimum_master_nodes: 1

Discovery ping timeout; set it a little higher on congested or unreliable networks:

# discovery.zen.ping.timeout: 3s

Note that the number of nodes N in the cluster should be odd!
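The N/2 + 1 rule above can be computed mechanically. The helper below is purely illustrative (it is not part of any ES API) and shows the quorum size for a few cluster sizes, using integer division as the rule implies:

```java
public class MasterQuorum {
    // Minimum number of master-eligible nodes needed to elect a master:
    // half the cluster's node count plus one (integer division), i.e. N / 2 + 1.
    public static int minimumMasterNodes(int nodeCount) {
        return nodeCount / 2 + 1;
    }

    public static void main(String[] args) {
        // A 3-node cluster needs 2 nodes for a quorum: 3 / 2 + 1 = 2
        System.out.println(minimumMasterNodes(3));
        // A 5-node cluster needs 3: 5 / 2 + 1 = 3
        System.out.println(minimumMasterNodes(5));
    }
}
```

This also shows why an odd N is preferred: a 4-node cluster needs a quorum of 3, so it tolerates no more failures than a 3-node cluster while costing one more machine.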

IV. Elasticsearch Plugins

1. elasticsearch-head, an elasticsearch cluster management tool: ./elasticsearch-1.7.1/bin/plugin -install mobz/elasticsearch-head

2. elasticsearch-sql, which lets you query elasticsearch with SQL syntax: ./bin/plugin -u https://github.com/NLPchina/elasticsearch-sql/releases/download/1.3.5/elasticsearch-sql-1.3.5.zip --install sql

GitHub address: https://github.com/NLPchina/elasticsearch-sql

3. elasticsearch-bigdesk, a cluster monitoring tool that shows various aspects of the ES cluster's status.

Installation: ./bin/plugin -install lukas-vlcek/bigdesk

Access: http://<host>:9200/_plugin/bigdesk/

4. elasticsearch-servicewrapper, a service wrapper plugin for ElasticSearch.

Download the plugin from https://github.com/elasticsearch/elasticsearch-servicewrapper, unzip it, and copy its service directory into elasticsearch's bin directory.

Then install, start, and stop ElasticSearch with the following commands:

sh elasticsearch install

sh elasticsearch start

sh elasticsearch stop