  Python is not C
     
  Add Date : 2018-11-21      
         
       
         
  I have been using Python for a variety of data science projects. Python is famous for its ease of use: anyone with coding experience can learn it in a few days and start using it (or even use it effectively).

That sounds great, but if you use Python alongside other languages, such as C, you may run into some problems.

Let me give an example from my own experience. I have a good command of languages such as C and C++. I am also fluent in old classics such as Lisp and Prolog, and I have used Java, JavaScript and PHP for some time. So shouldn't learning Python be simple for me? It only looked easy, and I dug a hole for myself: I used Python as if it were C.

The details follow.

On a recent project I needed to deal with geospatial data. The task: given a GPS track of about 25,000 location points, repeatedly find the track point closest to a given latitude and longitude. My first reaction was to look for an existing code snippet that computes the distance between two points from their latitudes and longitudes. John D. Cook has written such code and made it available in the public domain.

With that in hand, all I needed was a Python function that returns the index (into the array of 25,000 points) of the point closest to the input coordinates:

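def closest_distance(lat, lon, trkpts):
    d = 100000.0   # best distance so far; starts larger than any real distance
    best = -1      # index of the closest point so far
    r = trkpts.index
    for i in r:
        # .loc replaces the .ix indexer, which was removed in pandas 1.0
        lati = trkpts.loc[i, 'Lat']
        loni = trkpts.loc[i, 'Lon']
        md = distance_on_unit_sphere(lat, lon, lati, loni)
        if d > md:
            best = i
            d = md
    return best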

Here, distance_on_unit_sphere is John D. Cook's function, and trkpts is an array containing the coordinates of the GPS track (actually a pandas data frame; pandas is a third-party data analysis package for Python).

The function above is basically what I would have written in C. It iterates over the trkpts array and keeps, in the local variable best, the index of the closest point found so far.

So far, so good. Python syntax differs from C in many ways, but writing this code did not take me long.

The code was quick to write but slow to run. For example, I was given 428 points, named waypoints, and for each waypoint I wanted to find the closest point on the track. Finding the closest track point for all 428 waypoints took 3 minutes and 6 seconds on my laptop.
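For readers who do not have John D. Cook's snippet at hand, here is a minimal sketch of what such a function can look like. It is not his code. It assumes the spherical law of cosines, inputs in degrees, and a result expressed as an arc length on a sphere of radius 1; the constant EARTH_RADIUS_KM is an illustrative assumption.

```python
import math

def distance_on_unit_sphere(lat1, lon1, lat2, lon2):
    """Great-circle distance on a unit sphere; inputs are in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    # Spherical law of cosines gives the cosine of the central angle
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(dlon))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain
    return math.acos(max(-1.0, min(1.0, cos_angle)))

# Assumed mean Earth radius, used only to convert the arc length to kilometres
EARTH_RADIUS_KM = 6371.0
quarter_turn_km = distance_on_unit_sphere(0.0, 0.0, 0.0, 90.0) * EARTH_RADIUS_KM
print(quarter_turn_km)
```

Since only the ranking of distances matters when looking for the closest point, the multiplication by the radius can be skipped in the search itself.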

After that, I changed the query to compute the Manhattan distance, which is an approximation. Instead of computing the exact distance between two points, I compute the distance along the east-west axis plus the distance along the north-south axis. The Manhattan distance function is as follows:

import math

def manhattan_distance(lat1, lon1, lat2, lon2):
    # Scale the longitude gap by the cosine of the mean latitude
    lat = (lat1 + lat2) / 2.0
    return abs(lat1 - lat2) + abs(math.cos(math.radians(lat)) * (lon1 - lon2))

In fact, I used an even simpler function, ignoring the fact that a gap of 1 degree of latitude is a much greater distance than a gap of 1 degree of longitude. The simplified function is as follows:

def manhattan_distance1(lat1, lon1, lat2, lon2):
    return abs(lat1 - lat2) + abs(lon1 - lon2)
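To see what the ignored factor amounts to numerically, the following sketch compares the two functions on a single pair of points. The coordinates are made up for illustration: two points at latitude 60 that are 1 degree of longitude apart.

```python
import math

def manhattan_distance(lat1, lon1, lat2, lon2):
    # Longitude gap scaled by the cosine of the mean latitude
    lat = (lat1 + lat2) / 2.0
    return abs(lat1 - lat2) + abs(math.cos(math.radians(lat)) * (lon1 - lon2))

def manhattan_distance1(lat1, lon1, lat2, lon2):
    # Simplified version: degrees of latitude and longitude weigh the same
    return abs(lat1 - lat2) + abs(lon1 - lon2)

# Hypothetical points: same latitude (60), longitudes 10 and 11
scaled = manhattan_distance(60.0, 10.0, 60.0, 11.0)
plain = manhattan_distance1(60.0, 10.0, 60.0, 11.0)
print(scaled, plain)
```

At this latitude the scaled version counts the longitude gap as roughly half a degree, while the simplified version counts it as a full degree. The two can therefore rank candidate points differently in general, so the simplification is worth checking against the exact distance on the data at hand.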

The closest function then becomes:

def closest_manhattan_distance1(lat, lon, trkpts):
    d = 100000.0
    best = -1
    r = trkpts.index
    for i in r:
        lati = trkpts.loc[i, 'Lat']
        loni = trkpts.loc[i, 'Lon']
        md = manhattan_distance1(lat, lon, lati, loni)
        if d > md:
            best = i
            d = md
    return best

Inlining the body of manhattan_distance1 makes it a little faster still:

def closest_manhattan_distance2(lat, lon, trkpts):
    d = 100000.0
    best = -1
    r = trkpts.index
    for i in r:
        lati = trkpts.loc[i, 'Lat']
        loni = trkpts.loc[i, 'Lon']
        # Manhattan distance computed inline, without a function call
        md = abs(lat - lati) + abs(lon - loni)
        if d > md:
            best = i
            d = md
    return best

For finding the closest point, this function should be as good as the one using John's function. I was hoping my intuition was right: the simpler, the faster. The program now runs in 2 minutes 37 seconds, a speedup of 18%. Good, but not exciting enough.

I then decided to use Python properly. This means taking advantage of the array operations that pandas supports. These array operations come from the numpy package. Calling them makes the code much more concise:

import numpy

def closest(lat, lon, trkpts):
    # Manhattan distance from (lat, lon) to every track point at once
    cl = numpy.abs(trkpts.Lat - lat) + numpy.abs(trkpts.Lon - lon)
    return cl.idxmin()

This function returns the same result as the previous one. On my laptop it ran in 0.5 seconds. A full 300 times faster! 300 times, that is 30,000%. Incredible. The reason for the speedup is that numpy array operations are implemented in C. So we get the best of both worlds: the speed of C and the simplicity of Python.

The lesson is clear: do not write Python code the C way. Use numpy array operations instead of iterating over arrays. For me, this was a change in thinking.
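The speedups quoted in the text follow directly from the three timings. The sketch below only redoes that arithmetic; the helper to_seconds is a name introduced here for illustration.

```python
def to_seconds(minutes, seconds):
    # Float arithmetic so the divisions below are true divisions
    return 60.0 * minutes + seconds

great_circle_loop = to_seconds(3, 6)    # loop with the exact distance
manhattan_loop = to_seconds(2, 37)      # loop with the inlined Manhattan distance
vectorized = 0.5                        # numpy version, in seconds

loop_speedup = great_circle_loop / manhattan_loop
numpy_speedup = manhattan_loop / vectorized
print(loop_speedup, numpy_speedup)
```

The first ratio is a little over 1.18, which matches the 18% figure, and the second is 314, which the text rounds to 300.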
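The comparison can be reproduced on synthetic data. The sketch below is not the original experiment: the track is random, the names closest_loop and closest_vectorized are introduced here, and the timings will depend on the machine and on the pandas version.

```python
import time
import numpy as np
import pandas as pd

# Synthetic stand-in for the GPS track (25,000 random points)
rng = np.random.RandomState(0)
trkpts = pd.DataFrame({
    'Lat': rng.uniform(45.0, 46.0, 25000),
    'Lon': rng.uniform(5.0, 6.0, 25000),
})

def closest_loop(lat, lon, trkpts):
    # C-style traversal, one element access at a time
    d = float('inf')
    best = -1
    for i in trkpts.index:
        md = abs(lat - trkpts.loc[i, 'Lat']) + abs(lon - trkpts.loc[i, 'Lon'])
        if d > md:
            best = i
            d = md
    return best

def closest_vectorized(lat, lon, trkpts):
    # Whole-column arithmetic, executed inside numpy
    cl = np.abs(trkpts.Lat - lat) + np.abs(trkpts.Lon - lon)
    return cl.idxmin()

start = time.perf_counter()
loop_result = closest_loop(45.5, 5.5, trkpts)
loop_time = time.perf_counter() - start

start = time.perf_counter()
vec_result = closest_vectorized(45.5, 5.5, trkpts)
vec_time = time.perf_counter() - start

print(loop_result == vec_result, loop_time, vec_time)
```

Both functions should return the same index, since they compute the same distances and both keep the first minimum; only the running time should differ.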

Update on July 2, 2015. This article is being discussed on Hacker News. Some commenters missed the fact that I was using a pandas data frame. I use it mainly because it is very common in data analysis. If all I wanted was the fastest possible closest-point query, and I had plenty of time, I could implement a quadtree in C or C++.

Second update on July 2, 2015. Some comments also mentioned that numba could speed up the code. I tried it.

Here is what I did; your experience may differ. First, note that results may vary from one Python installation to another. My test environment is Anaconda installed on Windows, with some additional packages installed. There may be interference between these packages and numba.

First, I entered the following command to install numba:

$ conda install numba

The command-line output showed that numba was already included in my Anaconda installation. The installation instructions may also change over time.

Recommended numba usage:

from numba import jit

@jit
def closest_func(lat, lon, trkpts, func):
    d = 100000.0
    best = -1
    r = trkpts.index
    for i in r:
        lati = trkpts.loc[i, 'Lat']
        loni = trkpts.loc[i, 'Lon']
        md = abs(lat - lati) + abs(lon - loni)
        if d > md:
            #print d, dlat, dlon, lati, loni
            best = i
            d = md
    return best

I did not see any improvement in running time. I also tried a more aggressive compilation setting:

@jit(nopython=True)
def closest_func(lat, lon, trkpts, func):
    d = 100000.0
    best = -1
    r = trkpts.index
    for i in r:
        lati = trkpts.loc[i, 'Lat']
        loni = trkpts.loc[i, 'Lon']
        md = abs(lat - lati) + abs(lon - loni)
        if d > md:
            #print d, dlat, dlon, lati, loni
            best = i
            d = md
    return best

Running this code raised an error.

It seems that pandas is too smart for numba to handle.

Of course, I could take the time to change the data structure so that numba can compile the code. But why should I? The code written with numpy runs fast enough. I am using numpy and pandas anyway. Why not keep using them?

Some people suggested that I use PyPy. That certainly makes sense, but ... I use Jupyter notebooks on a hosted server (Jupyter is a browser-based interactive Python environment). I use the Python kernel it provides, namely the regular Python 2.7.x kernel. PyPy is not offered as an option.

Cython was also suggested. Well, if I am going back to compiled code, I might as well implement it in C or C++. I use Python because its notebooks offer interactive features that enable rapid prototyping. That is not what Cython is designed for.
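For readers curious what such a change of data structure could involve, here is a minimal sketch, assuming the two columns are handed to numba as plain numpy arrays instead of a data frame. It was not part of the measurements above; the function name closest_arrays, the toy data and the import fallback are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (uncompiled) where numba is missing
    def njit(func):
        return func

@njit
def closest_arrays(lat, lon, lats, lons):
    # Same loop as before, but over numpy arrays that numba can type
    d = np.inf
    best = -1
    for i in range(lats.shape[0]):
        md = abs(lat - lats[i]) + abs(lon - lons[i])
        if d > md:
            best = i
            d = md
    return best

# Toy data frame standing in for the GPS track
trkpts = pd.DataFrame({'Lat': [45.0, 45.5, 46.0], 'Lon': [5.0, 5.5, 6.0]})

# Note: the result is a position in the arrays, not a data frame index label
pos = closest_arrays(45.4, 5.6, trkpts['Lat'].values, trkpts['Lon'].values)
print(pos)
```

The design choice is to keep pandas outside the compiled function: the data frame is only used to extract the arrays, and the position returned can be mapped back to a label with trkpts.index[pos] if needed.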
     
         
       
         