1. Introduction
Both Elasticsearch and Solr are built on the Lucene search engine, but Elasticsearch supports distributed operation natively, while Solr has only been distributed (as SolrCloud) since version 4.0, and Solr's distributed mode requires ZooKeeper.
A detailed comparison of Elasticsearch and Solr: http://solr-vs-elasticsearch.com/
2. Basic Usage
An Elasticsearch cluster can contain multiple indexes (indices), each index can contain multiple types, each type contains multiple documents, and each document contains multiple fields. This document-oriented storage model makes Elasticsearch, in effect, a NoSQL store.
Compared with a traditional relational database, the concepts map roughly as follows:
Relational DB -> Databases -> Tables -> Rows -> Columns
Elasticsearch -> Indices -> Types -> Documents -> Fields
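As a concrete illustration, one "row" becomes a JSON document of fields. The index, type, and field names below are made up for this sketch:

```json
{
  "id": 1,
  "seqNum": 17,
  "deviceID": "dev-001",
  "collectTime": "2015-09-01T12:00:00"
}
```

Such a document would live under an index (e.g. myindex) and a type (e.g. mytype), and its fields play the role of columns.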
Basic usage, from creating a Client to adding, deleting, and querying data:
1. Create a Client
public ElasticSearchService(String ipAddress, int port) {
    client = new TransportClient()
            .addTransportAddress(new InetSocketTransportAddress(ipAddress, port));
}
This uses a TransportClient.
Elasticsearch provides two kinds of clients:
TransportClient: a lightweight client that connects to the ES cluster over sockets, using a Netty thread pool. It does not join the cluster itself; it only forwards requests.
Node Client: the client itself joins the cluster as an ES node, and the other Elasticsearch nodes treat it as one. Frequently opening and closing Node Clients therefore creates "noise" in the cluster.
2. Create/Delete an Index and Define the Type Mapping
// Create an index
public void createIndex() {
    client.admin().indices().create(new CreateIndexRequest(IndexName))
            .actionGet();
}
// Delete the index if it exists
public void deleteIndex() {
    IndicesExistsResponse indicesExistsResponse = client.admin().indices()
            .exists(new IndicesExistsRequest(new String[] { IndexName }))
            .actionGet();
    if (indicesExistsResponse.isExists()) {
        client.admin().indices().delete(new DeleteIndexRequest(IndexName))
                .actionGet();
    }
}
// Delete a type under the index
public void deleteType() {
    client.prepareDelete().setIndex(IndexName).setType(TypeName).execute().actionGet();
}
// Define the index type mapping
public void defineIndexTypeMapping() {
    try {
        XContentBuilder mapBuilder = XContentFactory.jsonBuilder();
        mapBuilder.startObject()
                .startObject(TypeName)
                .startObject("properties")
                .startObject(IDFieldName).field("type", "long").field("store", "yes").endObject()
                .startObject(SeqNumFieldName).field("type", "long").field("store", "yes").endObject()
                .startObject(IMSIFieldName).field("type", "string").field("index", "not_analyzed").field("store", "yes").endObject()
                .startObject(IMEIFieldName).field("type", "string").field("index", "not_analyzed").field("store", "yes").endObject()
                .startObject(DeviceIDFieldName).field("type", "string").field("index", "not_analyzed").field("store", "yes").endObject()
                .startObject(OwnAreaFieldName).field("type", "string").field("index", "not_analyzed").field("store", "yes").endObject()
                .startObject(TeleOperFieldName).field("type", "string").field("index", "not_analyzed").field("store", "yes").endObject()
                .startObject(TimeFieldName).field("type", "date").field("store", "yes").endObject()
                .endObject()
                .endObject()
                .endObject();
        PutMappingRequest putMappingRequest = Requests
                .putMappingRequest(IndexName).type(TypeName)
                .source(mapBuilder);
        client.admin().indices().putMapping(putMappingRequest).actionGet();
    } catch (IOException e) {
        log.error(e.toString());
    }
}
This defines a custom index mapping for the type. By default, ES infers the mapping from the data automatically: integers map to long, floats to double, strings to string, times to date, and true/false to boolean.
Note: for string fields, ES performs "analyzed" processing by default when indexing: the text is first tokenized, then stop words are removed, and so on. If you need a string to be indexed as a whole, set the field to: field("index", "not_analyzed").
Refer to: https://www.elastic.co/guide/en/elasticsearch/guide/current/mapping-intro.html
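At the JSON level, the mapping built with XContentBuilder above corresponds to a body of roughly this shape (shown for a single not_analyzed string field; the type and field names are illustrative):

```json
{
  "mytype": {
    "properties": {
      "deviceID": {
        "type": "string",
        "index": "not_analyzed",
        "store": "yes"
      }
    }
  }
}
```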
3. Index Data
// Index data in bulk
public void indexHotSpotDataList(List<Hotspotdata> dataList) {
    if (dataList != null) {
        int size = dataList.size();
        if (size > 0) {
            BulkRequestBuilder bulkRequest = client.prepareBulk();
            for (int i = 0; i < size; i++) {
                Hotspotdata data = dataList.get(i);
                String jsonSource = getIndexDataFromHotspotData(data);
                if (jsonSource != null) {
                    bulkRequest.add(client
                            .prepareIndex(IndexName, TypeName,
                                    data.getId().toString())
                            .setRefresh(true).setSource(jsonSource));
                }
            }
            BulkResponse bulkResponse = bulkRequest.execute().actionGet();
            if (bulkResponse.hasFailures()) {
                Iterator<BulkItemResponse> iter = bulkResponse.iterator();
                while (iter.hasNext()) {
                    BulkItemResponse itemResponse = iter.next();
                    if (itemResponse.isFailed()) {
                        log.error(itemResponse.getFailureMessage());
                    }
                }
            }
        }
    }
}
// Index a single document
public boolean indexHotspotData(Hotspotdata data) {
    String jsonSource = getIndexDataFromHotspotData(data);
    if (jsonSource != null) {
        IndexRequestBuilder requestBuilder = client.prepareIndex(IndexName,
                TypeName).setRefresh(true);
        requestBuilder.setSource(jsonSource)
                .execute().actionGet();
        return true;
    }
    return false;
}
// Build the JSON source string for indexing
public String getIndexDataFromHotspotData(Hotspotdata data) {
    String jsonString = null;
    if (data != null) {
        try {
            XContentBuilder jsonBuilder = XContentFactory.jsonBuilder();
            jsonBuilder.startObject().field(IDFieldName, data.getId())
                    .field(SeqNumFieldName, data.getSeqNum())
                    .field(IMSIFieldName, data.getImsi())
                    .field(IMEIFieldName, data.getImei())
                    .field(DeviceIDFieldName, data.getDeviceID())
                    .field(OwnAreaFieldName, data.getOwnArea())
                    .field(TeleOperFieldName, data.getTeleOper())
                    .field(TimeFieldName, data.getCollectTime())
                    .endObject();
            jsonString = jsonBuilder.string();
        } catch (IOException e) {
            log.error(e.toString());
        }
    }
    return jsonString;
}
ES supports both bulk and single-document indexing.
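For reference, at the REST level the bulk indexing above maps to the _bulk endpoint, which takes newline-delimited JSON: one action line followed by one source line per document. A minimal sketch, with hypothetical index, type, and field names:

```json
{ "index": { "_index": "myindex", "_type": "mytype", "_id": "1" } }
{ "id": 1, "deviceID": "dev-001" }
{ "index": { "_index": "myindex", "_type": "mytype", "_id": "2" } }
{ "id": 2, "deviceID": "dev-002" }
```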
4. Query Data
// Get a small amount of data (up to 100 hits)
private List<Integer> getSearchData(QueryBuilder queryBuilder) {
    List<Integer> ids = new ArrayList<>();
    SearchResponse searchResponse = client.prepareSearch(IndexName)
            .setTypes(TypeName).setQuery(queryBuilder).setSize(100)
            .execute().actionGet();
    SearchHits searchHits = searchResponse.getHits();
    for (SearchHit searchHit : searchHits) {
        Integer id = (Integer) searchHit.getSource().get("id");
        ids.add(id);
    }
    return ids;
}
// Get a large amount of data with a scroll search
private List<Integer> getSearchDataByScrolls(QueryBuilder queryBuilder) {
    List<Integer> ids = new ArrayList<>();
    // Get the first 100,000 hits
    SearchResponse scrollResp = client.prepareSearch(IndexName)
            .setSearchType(SearchType.SCAN).setScroll(new TimeValue(60000))
            .setQuery(queryBuilder).setSize(100000).execute().actionGet();
    while (true) {
        for (SearchHit searchHit : scrollResp.getHits().getHits()) {
            Integer id = (Integer) searchHit.getSource().get(IDFieldName);
            ids.add(id);
        }
        scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
                .setScroll(new TimeValue(600000)).execute().actionGet();
        if (scrollResp.getHits().getHits().length == 0) {
            break;
        }
    }
    return ids;
}
The query condition is passed in as a QueryBuilder. ES supports paginated retrieval of data; to fetch a large result set in one pass, you need to use a scroll search (Scroll Search).
5. Aggregation Queries
// Get the distribution of data per device, for a given device list and time range
public Map<String, String> getDeviceDistributedInfo(String startTime,
        String endTime, List<String> deviceList) {
    Map<String, String> resultsMap = new HashMap<>();
    QueryBuilder deviceQueryBuilder = getDeviceQueryBuilder(deviceList);
    QueryBuilder rangeBuilder = getDateRangeQueryBuilder(startTime, endTime);
    QueryBuilder queryBuilder = QueryBuilders.boolQuery()
            .must(deviceQueryBuilder).must(rangeBuilder);
    TermsBuilder termsBuilder = AggregationBuilders.terms("DeviceIDAgg").size(Integer.MAX_VALUE)
            .field(DeviceIDFieldName);
    SearchResponse searchResponse = client.prepareSearch(IndexName)
            .setQuery(queryBuilder).addAggregation(termsBuilder)
            .execute().actionGet();
    Terms terms = searchResponse.getAggregations().get("DeviceIDAgg");
    if (terms != null) {
        for (Terms.Bucket entry : terms.getBuckets()) {
            resultsMap.put(entry.getKey(),
                    String.valueOf(entry.getDocCount()));
        }
    }
    return resultsMap;
}
Aggregation queries provide statistical-analysis features, such as the distribution of data over a month, or the maximum, minimum, sum, and average of a set of values.
Refer to: https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-aggs.html
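For reference, the terms aggregation built with AggregationBuilders above corresponds to a request body of roughly this shape (the field name deviceID is an assumption; in ES 1.x, "size": 0 on a terms aggregation means return all buckets):

```json
{
  "aggs": {
    "DeviceIDAgg": {
      "terms": { "field": "deviceID", "size": 0 }
    }
  }
}
```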
3. Cluster Configuration
The configuration file is elasticsearch.yml.
Cluster name and node name:
# cluster.name: elasticsearch
# node.name: "Franz Kafka"
Whether the node is master-eligible and whether it stores data:
# node.master: true
# node.data: true
Number of shards and replicas:
# index.number_of_shards: 5
# index.number_of_replicas: 1
Minimum number of master-eligible nodes required for a master election; for the whole cluster this should be set to half the number of nodes plus one, i.e. N/2 + 1:
# discovery.zen.minimum_master_nodes: 1
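As a quick sanity check on the N/2 + 1 rule, here is a trivial helper (the class and method names are my own, not part of any ES API):

```java
// Computes the minimum_master_nodes quorum for a cluster with
// n master-eligible nodes, following the N/2 + 1 rule above.
public class Quorum {
    public static int minimumMasterNodes(int n) {
        // Integer division: 3 -> 2, 4 -> 3, 5 -> 3
        return n / 2 + 1;
    }

    public static void main(String[] args) {
        System.out.println(minimumMasterNodes(3)); // prints 2
        System.out.println(minimumMasterNodes(5)); // prints 3
    }
}
```

With this setting, a partitioned cluster cannot elect two masters, because at most one partition can hold a majority of the nodes.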
Discovery ping timeout; set it a bit higher on congested or unreliable networks:
# discovery.zen.ping.timeout: 3s
Note that the number of nodes N in a distributed cluster should be odd!
4. Elasticsearch Plugins
1. elasticsearch-head, a cluster management tool for elasticsearch: ./elasticsearch-1.7.1/bin/plugin -install mobz/elasticsearch-head
2. elasticsearch-sql, for querying elasticsearch with SQL syntax: ./bin/plugin -u https://github.com/NLPchina/elasticsearch-sql/releases/download/1.3.5/elasticsearch-sql-1.3.5.zip --install sql
GitHub address: https://github.com/NLPchina/elasticsearch-sql
3. elasticsearch-bigdesk, a cluster monitoring tool for elasticsearch; you can use it to view various ES cluster statuses.
Installation: ./bin/plugin -install lukas-vlcek/bigdesk
Access: http://192.103.101.203:9200/_plugin/bigdesk/
4. elasticsearch-servicewrapper, a plugin that runs ElasticSearch as a service.
Download the plugin from https://github.com/elasticsearch/elasticsearch-servicewrapper, unpack it, and copy its service directory into elasticsearch's bin directory.
Then install, start, and stop ElasticSearch with:
sh elasticsearch install
sh elasticsearch start
sh elasticsearch stop