The HttpClient package is a client-side programming toolkit that implements the HTTP protocol. To use it well, you need to be familiar with HTTP. The simplest call looks like this:
import java.io.IOException;

import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpUriRequest;
import org.apache.http.impl.client.DefaultHttpClient;

public class Test {
    public static void main(String[] args) {
        // Core client object
        HttpClient httpClient = new DefaultHttpClient();
        // HTTP request
        HttpUriRequest request = new HttpGet("http://localhost/index.html");
        // Print the request line
        System.out.println(request.getRequestLine());
        try {
            // Send the request and receive a response
            HttpResponse response = httpClient.execute(request);
            // Print the response status line
            System.out.println(response.getStatusLine());
        } catch (ClientProtocolException e) {
            // Protocol error
            e.printStackTrace();
        } catch (IOException e) {
            // Network error
            e.printStackTrace();
        }
    }
}
If the HTTP server is running normally and serves the requested resource, the program prints two lines:
GET http://localhost/index.html HTTP/1.1
HTTP/1.1 200 OK
Calling the core object httpClient is very intuitive: its execute method takes a request object and returns a response object. When sending an HTTP request with httpClient, two kinds of exceptions may be thrown: ClientProtocolException and IOException. The first usually indicates a protocol error, for example when the URI passed to the HttpGet constructor has a wrong scheme (such as "htp" mistyped for "http"), or when the server's reply does not conform to the HTTP protocol. The second usually results from a network problem, for example when the HTTP server is not running.
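In HttpClient 4.x, ClientProtocolException is a subclass of IOException, which is why the listing catches it first. The sketch below illustrates that ordering with the JDK alone. ProtocolError, fakeExecute and classify are hypothetical stand-ins invented for this illustration, not HttpClient classes or methods.

```java
import java.io.IOException;

class CatchOrderSketch {
    // Hypothetical stand-in for ClientProtocolException; not an HttpClient class.
    static class ProtocolError extends IOException {
        ProtocolError(String message) {
            super(message);
        }
    }

    // Simulates a call that fails with either a protocol error or a plain I/O error.
    static void fakeExecute(boolean protocolProblem) throws IOException {
        if (protocolProblem) {
            throw new ProtocolError("unsupported scheme: htp");
        }
        throw new IOException("connection refused");
    }

    // The more specific exception has to be caught before its superclass.
    static String classify(boolean protocolProblem) {
        try {
            fakeExecute(protocolProblem);
            return "ok";
        } catch (ProtocolError e) {
            return "protocol error: " + e.getMessage();
        } catch (IOException e) {
            return "network error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(true));
        System.out.println(classify(false));
    }
}
```

Swapping the two catch clauses would not compile, because the second clause could never be reached.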
From a practical point of view, the HTTP protocol consists of two parts: the HTTP request and the HTTP response. So how does the HttpClient package implement an HTTP client, and what should we watch out for when using it?
HTTP request
HTTP 1.1 defines the following request methods: GET, HEAD, POST, PUT, DELETE, TRACE and OPTIONS. The package provides the classes HttpGet, HttpHead, HttpPost, HttpPut, HttpDelete, HttpTrace and HttpOptions to create each kind of request. All of these classes implement the HttpUriRequest interface, so any of them can be passed as the argument to execute.
GET and POST are the two most commonly used requests. A POST request is created in the same way as a GET request:
HttpUriRequest request = new HttpPost(
        "http://localhost/index.html");
The HTTP request format gives us two places, and therefore two ways, to supply request parameters: the request line and the request body.
request-line
In the request-line approach, parameters are supplied directly in the URI on the request line.
(1)
We can supply the parameters in the URI when creating the request object, for example:
HttpUriRequest request = new HttpGet(
        "http://localhost/index.html?param1=value1&param2=value2");
(2)
In addition, the HttpClient package provides the URIUtils utility class, which can build a URI with parameters, for example:
URI uri = URIUtils.createURI("http", "localhost", -1, "/index.html",
        "param1=value1&param2=value2", null);
HttpUriRequest request = new HttpGet(uri);
System.out.println(request.getURI());
The example prints:
http://localhost/index.html?param1=value1&param2=value2
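For comparison, the same kind of component-wise assembly can be sketched with the JDK's own java.net.URI, without HttpClient on the classpath. The helper name build is an assumption made for this illustration. Note that this JDK constructor escapes illegal characters (including '%') itself, so it expects an unencoded query string.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriComponentsSketch {
    // Builds a URI from its parts; a port of -1 means "no explicit port".
    static URI build(String scheme, String host, int port, String path, String query)
            throws URISyntaxException {
        // Arguments: scheme, userInfo, host, port, path, query, fragment
        return new URI(scheme, null, host, port, path, query, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        URI uri = build("http", "localhost", -1, "/index.html",
                "param1=value1&param2=value2");
        System.out.println(uri);            // the assembled URI
        System.out.println(uri.getQuery()); // only the query component
    }
}
```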
(3)
Note that if a parameter contains Chinese characters, it must be URL-encoded, for example:
String param = "param1=" + URLEncoder.encode("中国", "UTF-8") + "&param2=value2";
URI uri = URIUtils.createURI("http", "localhost", 8080,
        "/sshsky/index.html", param, null);
System.out.println(uri);
The example prints:
http://localhost:8080/sshsky/index.html?param1=%E4%B8%AD%E5%9B%BD&param2=value2
(4)
For URL-encoding parameters, the HttpClient package provides another utility: URLEncodedUtils. With it, we can build the URI in a more intuitive (though more verbose) way, for example:
List<NameValuePair> params = new ArrayList<NameValuePair>();
params.add(new BasicNameValuePair("param1", "中国"));
params.add(new BasicNameValuePair("param2", "value2"));
String param = URLEncodedUtils.format(params, "UTF-8");
URI uri = URIUtils.createURI("http", "localhost", 8080,
        "/sshsky/index.html", param, null);
System.out.println(uri);
The example prints:
http://localhost:8080/sshsky/index.html?param1=%E4%B8%AD%E5%9B%BD&param2=value2
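To make the encoding step concrete, here is a minimal JDK-only sketch of what formatting a list of name/value pairs into a query string involves. It is not the URLEncodedUtils implementation. The class and its format method are hypothetical helpers, and the Chinese value is written with Unicode escapes so the source stays ASCII.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryFormatSketch {
    // Joins pairs as name=value&name=value, percent-encoding names and values.
    static String format(Map<String, String> params, String charset) {
        StringBuilder query = new StringBuilder();
        try {
            for (Map.Entry<String, String> entry : params.entrySet()) {
                if (query.length() > 0) {
                    query.append('&');
                }
                query.append(URLEncoder.encode(entry.getKey(), charset))
                     .append('=')
                     .append(URLEncoder.encode(entry.getValue(), charset));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the insertion order of the parameters.
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("param1", "\u4E2D\u56FD"); // the two Chinese characters for "China"
        params.put("param2", "value2");
        System.out.println(format(params, "UTF-8"));
    }
}
```

Run as written, this prints param1=%E4%B8%AD%E5%9B%BD&param2=value2, the same query string that appears in the URI above.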
request-body
Unlike the request-line approach, the request-body approach supplies the parameters in the request body, so it applies only to requests that carry a body, typically POST. The HttpClient package offers two classes for this: UrlEncodedFormEntity and MultipartEntity. Both implement the HttpEntity interface.
(1)
UrlEncodedFormEntity is the most commonly used class. An object created from it simulates the parameter transmission of a traditional HTML form POST. Take the following form:
<form action="http://localhost/index.html" method="POST">
    <input type="text" name="param1" value="中国" />
    <input type="text" name="param2" value="value2" />
    <input type="submit" value="submit" />
</form>
We can use the following code:
List<NameValuePair> formParams = new ArrayList<NameValuePair>();
formParams.add(new BasicNameValuePair("param1", "中国"));
formParams.add(new BasicNameValuePair("param2", "value2"));
HttpEntity entity = new UrlEncodedFormEntity(formParams, "UTF-8");
HttpPost request = new HttpPost("http://localhost/index.html");
request.setEntity(entity);
If you want to inspect the HTTP data, the methods of the HttpEntity object give access to it, for example:
List<NameValuePair> formParams = new ArrayList<NameValuePair>();
formParams.add(new BasicNameValuePair("param1", "中国"));
formParams.add(new BasicNameValuePair("param2", "value2"));
UrlEncodedFormEntity entity = new UrlEncodedFormEntity(formParams, "UTF-8");
System.out.println(entity.getContentType());
System.out.println(entity.getContentLength());
System.out.println(EntityUtils.getContentCharSet(entity));
System.out.println(EntityUtils.toString(entity));
The example prints:
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
39
UTF-8
param1=%E4%B8%AD%E5%9B%BD&param2=value2
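Seen from the receiving side, the body printed above is just 39 ASCII characters that decode back into the two form fields. The following JDK-only sketch shows that reverse step. The class and its parse method are hypothetical helpers for illustration, not part of HttpClient.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormBodySketch {
    // Splits an application/x-www-form-urlencoded body into decoded name/value pairs.
    static Map<String, String> parse(String body, String charset) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        if (body.length() == 0) {
            return fields;
        }
        try {
            for (String pair : body.split("&")) {
                int eq = pair.indexOf('=');
                String name = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                fields.put(URLDecoder.decode(name, charset),
                           URLDecoder.decode(value, charset));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
        return fields;
    }

    public static void main(String[] args) {
        String body = "param1=%E4%B8%AD%E5%9B%BD&param2=value2";
        // The encoded body is pure ASCII, so its length equals the Content-Length.
        System.out.println(body.length());
        System.out.println(parse(body, "UTF-8").keySet());
    }
}
```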
(2)
Besides the traditional application/x-www-form-urlencoded form, another form we often use is the file-upload form, whose type is multipart/form-data. The HttpClient extension package (HttpMime) has a class for this: MultipartEntity, which also implements the HttpEntity interface. Take the following form:
<form action="http://localhost/index.html" method="POST"
      enctype="multipart/form-data">
    <input type="text" name="param1" value="中国" />
    <input type="text" name="param2" value="value2" />
    <input type="file" name="param3" />
    <input type="submit" value="submit" />
</form>
We can use the following code:
MultipartEntity entity = new MultipartEntity();
entity.addPart("param1", new StringBody("中国", Charset.forName("UTF-8")));
entity.addPart("param2", new StringBody("value2", Charset.forName("UTF-8")));
entity.addPart("param3", new FileBody(new File("C:\\1.txt")));
HttpPost request = new HttpPost("http://localhost/index.html");
request.setEntity(entity);
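It can help to see roughly what a multipart/form-data body looks like on the wire. The sketch below assembles the text fields by hand with the JDK only. It is an approximation of the general format, not the output of MultipartEntity, which may add or order part headers differently. The class name, the build method and the boundary string sketchBoundary123 are assumptions made for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MultipartBodySketch {
    private static final String CRLF = "\r\n";

    // Renders text fields as a multipart/form-data body using the given boundary.
    static String build(String boundary, Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.append("--").append(boundary).append(CRLF);
            body.append("Content-Disposition: form-data; name=\"")
                .append(field.getKey()).append("\"").append(CRLF);
            body.append("Content-Type: text/plain; charset=UTF-8").append(CRLF);
            body.append(CRLF); // a blank line separates part headers from content
            body.append(field.getValue()).append(CRLF);
        }
        // The closing delimiter carries two extra dashes at the end.
        body.append("--").append(boundary).append("--").append(CRLF);
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("param1", "\u4E2D\u56FD"); // the two Chinese characters for "China"
        fields.put("param2", "value2");
        System.out.println("Content-Type: multipart/form-data; boundary=sketchBoundary123");
        System.out.println();
        System.out.print(build("sketchBoundary123", fields));
    }
}
```

A file part follows the same pattern, with a filename attribute in Content-Disposition and the file's bytes as the content.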
HTTP response
Handling the HTTP response with the HttpClient package is much simpler than building the request, and it also uses the HttpEntity interface. We can obtain a data stream (InputStream) from the HttpEntity object; this stream carries the response data returned by the server. Note that the HttpClient package is not responsible for parsing the content of the stream. For example:
HttpUriRequest request = ...;
HttpResponse response = httpClient.execute(request);
// Get the HttpEntity object from the response
HttpEntity entity = response.getEntity();
// Inspect the entity's properties
System.out.println(entity.getContentType());
System.out.println(entity.getContentLength());
System.out.println(EntityUtils.getContentCharSet(entity));
// Get the data stream returned by the server
InputStream stream = entity.getContent();
// Process the stream in whatever way is needed
// (omitted)
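The omitted stream handling could look like the following sketch, which reads an InputStream to the end and decodes it with a given charset. It uses only the JDK, and a ByteArrayInputStream stands in for the stream that entity.getContent() would return. The class and its readAll method are hypothetical helpers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class StreamReadSketch {
    // Reads the stream to its end, closes it, and decodes the bytes with the charset.
    static String readAll(InputStream in, String charset) throws IOException {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return buffer.toString(charset);
        } finally {
            in.close(); // always release the stream, even if reading fails
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the response stream: UTF-8 bytes of a small HTML fragment.
        byte[] data = "<p>\u4E2D\u56FD</p>".getBytes("UTF-8");
        String text = readAll(new ByteArrayInputStream(data), "UTF-8");
        System.out.println(text.length()); // number of decoded characters
    }
}
```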
Remarks:
This article describes HttpClient 4.0.1. The package, including its dependencies, consists of the following JAR files:
commons-logging-1.1.1.jar
commons-codec-1.4.jar
httpcore-4.0.1.jar
httpclient-4.0.1.jar
apache-mime4j-0.6.jar
httpmime-4.0.1.jar
The complete set of JAR files can be downloaded here.
Apache has now released HttpCore 4.0-beta3 and HttpClient 4.0-beta1.
The source code can be downloaded from: http://hc.apache.org/downloads.cgi
In addition, the apache-mime4j-0.5.jar package is also required.
Below is a simple POST example. There is little material on this in Chinese, and my English is not very good.
package test;

import java.util.ArrayList;
import java.util.List;

import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.params.CookiePolicy;
import org.apache.http.client.params.ClientPNames;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.protocol.HTTP;
import org.apache.http.util.EntityUtils;

public class Test2 {
    public static void main(String[] args) throws Exception {
        DefaultHttpClient httpclient = new DefaultHttpClient(); // instantiate an HttpClient
        HttpResponse response = null;
        HttpEntity entity = null;
        httpclient.getParams().setParameter(
                ClientPNames.COOKIE_POLICY, CookiePolicy.BROWSER_COMPATIBILITY); // use the browser-compatible cookie policy
        HttpPost httpost = new HttpPost("http://127.0.0.1:8080/pub/jsp/getInfo"); // the argument in quotes is the servlet address
        List<NameValuePair> nvps = new ArrayList<NameValuePair>();
        nvps.add(new BasicNameValuePair("jqm", "fb1f7cbdaf2bf0a9cb5d43736492640e0c4c0cd0232da9de"));
        // BasicNameValuePair("name", "value"): name is the POST parameter name, value is the value passed
        nvps.add(new BasicNameValuePair("sqm", "1bb5b5b45915c8"));
        httpost.setEntity(new UrlEncodedFormEntity(nvps, HTTP.UTF_8)); // set the POST parameters
        response = httpclient.execute(httpost); // execute the request
        entity = response.getEntity(); // the response body returned by the server
        try {
            System.out.println("----------------------------------------");
            System.out.println(response.getStatusLine()); // status line returned by the server
            Header[] headers = response.getAllHeaders(); // HTTP response headers
            for (int i = 0; i < headers.length; i++) {
                System.out.println(headers[i]);
            }
            System.out.println("----------------------------------------");
            String responseString = null;
            if (response.getEntity() != null) {
                responseString = EntityUtils.toString(response.getEntity()); // the HTML returned by the server
                System.out.println(responseString); // print the HTML of the server response
            }
        } finally {
            if (entity != null) {
                entity.consumeContent(); // release the connection gracefully
            }
        }
        System.out.println("Login form get:" + response.getStatusLine());
    }
}
HttpClient 4.0 learning example: fetching a page
HttpClient 4.0 was released only recently, so there are few tutorials and examples for it online; most search results for httpclient are based on the original Commons HttpClient 3.1 (legacy) package. The download page on the official website is http://hc.apache.org/downloads.cgi. According to the notes on the official website, HttpClient 4.0 is a separate package that branched off from the original one, and the original package will no longer be upgraded, so from now on we should use the new HttpClient branch. Because the package structure and interfaces of 4.0 differ considerably from 3.1, most examples found online do not work with 4.0, although we can still study them to work out how 4.0 is used. I am also a beginner, and I am recording my learning process here for later reference.
In this example we fetch a page and obtain its encoding, content and other information.
By default, a server chooses the encoding of its response from the encodings it supports, based on the client's request headers. google.cn, for example, supports utf-8, gb2312 and other encodings. If no header information is specified in the request, it returns gb2312 by default, whereas if we visit google.cn directly in a browser and inspect the response headers with httplook or the Firefox Firebug plug-in, we find that it returns UTF-8.
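How a server weighs the Accept-Charset header is up to the server, but the header itself ranks charsets by their q-values, with 1.0 assumed when q is absent. The following JDK-only sketch picks the highest-ranked entry from such a header value. The class and its preferred method are hypothetical, and real servers may apply further rules, such as ignoring charsets they cannot produce.

```java
class AcceptCharsetSketch {
    // Returns the entry with the highest q-value; the first entry wins a tie.
    static String preferred(String headerValue) {
        String best = null;
        double bestQ = -1.0;
        for (String item : headerValue.split(",")) {
            String[] parts = item.trim().split(";");
            String name = parts[0].trim();
            double q = 1.0; // default weight when no q parameter is given
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                if (param.startsWith("q=")) {
                    q = Double.parseDouble(param.substring(2).trim());
                }
            }
            if (name.length() > 0 && q > bestQ) {
                best = name;
                bestQ = q;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(preferred("GB2312,utf-8;q=0.7,*;q=0.7"));
        System.out.println(preferred("utf-8;q=0.9, GB2312;q=0.5"));
    }
}
```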
Let's walk through an example. I have put the explanations in the code comments and included the complete code so that it is easy for beginners to follow.
This example uses the following HttpClient-related packages:
httpclient-4.0.jar
httpcore-4.0.1.jar
httpmime-4.0.jar
commons-logging-1.0.4.jar and other related packages
// HttpClientTest.java
package com.baihuo.crawler.test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.HttpHost;
import org.apache.http.HttpResponse;
import org.apache.http.HttpStatus;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.util.EntityUtils;

class HttpClientTest {
    public final static void main(String[] args) throws Exception {
        // Initialization; the constructor differs from 3.1
        HttpClient httpclient = new DefaultHttpClient();
        HttpHost targetHost = new HttpHost("www.google.cn");
        // HttpGet httpget = new HttpGet("http://www.apache.org/");
        HttpGet httpget = new HttpGet("/");
        // Show the default request header
        System.out.println("Accept-Charset:" + httpget.getFirstHeader("Accept-Charset"));
        // If the following line is omitted, then whatever Accept-Charset you set (gbk or utf-8), the server returns its default gb2312 (this applies to google.cn)
        httpget.setHeader("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.1.2)");
        // Several acceptable values can be listed, separated by commas
        httpget.setHeader("Accept-Language", "zh-cn,zh;q=0.5");
        httpget.setHeader("Accept-Charset", "GB2312,utf-8;q=0.7,*;q=0.7");
        // Verify that the header setting took effect
        System.out.println("Accept-Charset:" + httpget.getFirstHeader("Accept-Charset").getValue());
        // Execute the HTTP request
        System.out.println("executing request " + httpget.getURI());
        HttpResponse response = httpclient.execute(targetHost, httpget);
        // HttpResponse response = httpclient.execute(httpget);
        System.out.println("----------------------------------------");
        System.out.println("Location: " + response.getLastHeader("Location"));
        System.out.println(response.getStatusLine().getStatusCode());
        System.out.println(response.getLastHeader("Content-Type"));
        System.out.println(response.getLastHeader("Content-Length"));
        System.out.println("----------------------------------------");
        // Check the status code to decide whether to follow a redirect to a new link
        int statusCode = response.getStatusLine().getStatusCode();
        if ((statusCode == HttpStatus.SC_MOVED_PERMANENTLY) ||
                (statusCode == HttpStatus.SC_MOVED_TEMPORARILY) ||
                (statusCode == HttpStatus.SC_SEE_OTHER) ||
                (statusCode == HttpStatus.SC_TEMPORARY_REDIRECT)) {
            // This redirect handling has not been verified
            String newUri = response.getLastHeader("Location").getValue();
            httpclient = new DefaultHttpClient();
            httpget = new HttpGet(newUri);
            response = httpclient.execute(httpget);
        }
        // Get hold of the response entity
        HttpEntity entity = response.getEntity();
        // Show all response headers
        Header headers[] = response.getAllHeaders();
        int ii = 0;
        while (ii < headers.length) {
            System.out.println(headers[ii].getName() + ": " + headers[ii].getValue());
            ++ii;
        }
        // If the response does not enclose an entity, there is no need
        // to bother about connection release
        if (entity != null) {
            // Store the source stream in a byte array, since the data may be needed twice
            byte[] bytes = EntityUtils.toByteArray(entity);
            // If the Content-Type header carries encoding information, we can get it here
            String charSet = EntityUtils.getContentCharSet(entity);
            System.out.println("In header: " + charSet);
            // If the header has none, look in the page source; this is not fully reliable, since some careless authors do not declare the encoding in the page head
            if (charSet == null || charSet.length() == 0) {
                // Simplified pattern: the charset value of the first <meta> tag that declares one
                String regEx = "<meta[^>]*?charset\\s*=\\s*[\"']?([A-Za-z0-9_-]+)";
                Pattern p = Pattern.compile(regEx, Pattern.CASE_INSENSITIVE);
                // Decode with the default encoding; we are not matching Chinese text, so garbled characters do not matter here
                Matcher m = p.matcher(new String(bytes));
                if (m.find()) {
                    charSet = m.group(1);
                } else {
                    charSet = "";
                }
            }
            System.out.println("Last get: " + charSet);
            // Now decode the original byte array with the detected encoding (if one was found)
            if (charSet.length() > 0) {
                System.out.println("Encoding string is: " + new String(bytes, charSet));
            }
        }
        httpclient.getConnectionManager().shutdown();
    }
}
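The charset lookup at the end of the listing can be isolated into a small helper that is easy to try out without a network connection. The sketch below uses only the JDK and takes the Content-Type value and the page source as plain strings. The class, its method names and the two regular expressions are assumptions made for this illustration, not HttpClient APIs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharsetDetectSketch {
    private static final Pattern HEADER_CHARSET = Pattern.compile(
            "charset\\s*=\\s*[\"']?([A-Za-z0-9_-]+)", Pattern.CASE_INSENSITIVE);
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*?charset\\s*=\\s*[\"']?([A-Za-z0-9_-]+)", Pattern.CASE_INSENSITIVE);

    // Extracts the charset parameter from a Content-Type value, or null if absent.
    static String fromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        Matcher m = HEADER_CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : null;
    }

    // Extracts the charset declared in the first matching <meta> tag, or null if none.
    static String fromMeta(String html) {
        Matcher m = META_CHARSET.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // Header first, then page source, then the caller's fallback.
    static String detect(String contentType, String html, String fallback) {
        String charset = fromContentType(contentType);
        if (charset == null) {
            charset = fromMeta(html);
        }
        return charset != null ? charset : fallback;
    }

    public static void main(String[] args) {
        String html = "<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=utf-8\"></head></html>";
        System.out.println(detect("text/html; charset=GB2312", html, "ISO-8859-1"));
        System.out.println(detect("text/html", html, "ISO-8859-1"));
        System.out.println(detect(null, "<p>no declaration</p>", "ISO-8859-1"));
    }
}
```

Returning a fallback instead of an empty string avoids passing an invalid charset name to the String constructor when nothing is found.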