  Python is not C
     
  Add Date : 2018-11-21      
         
         
         
I have been using Python a lot lately for a variety of data science projects. Python is famous for its ease of use: someone with coding experience can use it effectively within a few days (or so it seems).

That sounds great, but problems can arise if you use Python alongside other languages, such as C.

Let me give an example from my own experience. I am proficient in imperative languages such as C and C++, and equally comfortable with old classics such as Lisp and Prolog. I have also used Java, JavaScript, and PHP for some time. So Python should be simple for me, right? It only looked easy, and I dug a hole for myself: I tended to use Python as if it were C.

Here are the details.

A recent project required me to process geospatial data. I was given a GPS track of about 25,000 points and needed to repeatedly find the point on the track closest to a given latitude and longitude. My first reaction was to look for an existing code snippet that computes the distance between two points from their latitudes and longitudes. John D. Cook has released such code into the public domain.

With that, everything was ready! All I had to do was write a Python function that returns the index of the point closest to the input coordinates (an index into the array of 25,000 points):

def closest_distance(lat, lon, trkpts):
    d = 100000.0
    best = -1
    r = trkpts.index
    for i in r:
        # .loc replaces the original .ix indexer, which newer pandas versions removed
        lati = trkpts.loc[i, 'Lat']
        loni = trkpts.loc[i, 'Lon']
        md = distance_on_unit_sphere(lat, lon, lati, loni)
        if d > md:
            best = i
            d = md
    return best

Here, distance_on_unit_sphere is John D. Cook's function, and trkpts is an array containing the coordinates of the GPS track points (in fact a pandas data frame; translator's note: pandas is a third-party data analysis package for Python).

The function above is essentially what I would have written in C. It iterates over the trkpts array and keeps the index of the closest point found so far in the local variable best.

So far so good. Python syntax differs from C in many ways, but writing this code did not take me long.

The code was quick to write but very slow to run. For example, I was also given 428 named points, called waypoints (key points along a navigation route). For each waypoint I wanted to find the closest point on the track. Finding the closest track point for all 428 waypoints took 3 minutes and 6 seconds on my laptop.

Next, I changed the query to use the Manhattan distance, which is an approximation. Instead of computing the exact distance between two points, I add the distance along the east-west axis to the distance along the north-south axis. The Manhattan distance function is as follows:
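For readers who do not have John D. Cook's code at hand, a great-circle distance on a unit sphere can be sketched with the spherical law of cosines. This is only an illustration under that assumption, not necessarily his implementation; the name unit_sphere_distance is made up here.

```python
import math

def unit_sphere_distance(lat1, lon1, lat2, lon2):
    """Arc length between two points on a unit sphere; inputs in degrees."""
    # Convert latitudes to colatitudes (angle from the north pole), in radians
    phi1 = math.radians(90.0 - lat1)
    phi2 = math.radians(90.0 - lat2)
    theta1 = math.radians(lon1)
    theta2 = math.radians(lon2)
    # Spherical law of cosines
    cos_arc = (math.sin(phi1) * math.sin(phi2) * math.cos(theta1 - theta2)
               + math.cos(phi1) * math.cos(phi2))
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain
    cos_arc = max(-1.0, min(1.0, cos_arc))
    return math.acos(cos_arc)
```

Multiplying the result by the Earth's mean radius (about 6371 km) gives an approximate distance in kilometres. For a nearest-point search the scaling does not matter, since only the ordering of distances is used.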

import math

def manhattan_distance(lat1, lon1, lat2, lon2):
    # Scale the longitude difference by the cosine of the mean latitude
    lat = (lat1 + lat2) / 2.0
    return abs(lat1 - lat2) + abs(math.cos(math.radians(lat)) * (lon1 - lon2))

In fact, I used an even simpler function that ignores the fact that one degree of latitude spans a greater distance than one degree of longitude. The simplified function is as follows:

def manhattan_distance1(lat1, lon1, lat2, lon2):
    return abs(lat1 - lat2) + abs(lon1 - lon2)
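To see what the simplification drops, the following sketch compares the two formulas on a pair of hypothetical points at 60 degrees north, one degree of longitude apart. The function names and the coordinates are chosen purely for illustration.

```python
import math

def manhattan_with_scaling(lat1, lon1, lat2, lon2):
    # Longitude difference scaled by cos(mean latitude), as in manhattan_distance
    lat = (lat1 + lat2) / 2.0
    return abs(lat1 - lat2) + abs(math.cos(math.radians(lat)) * (lon1 - lon2))

def manhattan_plain(lat1, lon1, lat2, lon2):
    # No scaling, as in manhattan_distance1
    return abs(lat1 - lat2) + abs(lon1 - lon2)

# Two hypothetical points at 60 degrees north, one degree of longitude apart
scaled = manhattan_with_scaling(60.0, 0.0, 60.0, 1.0)
plain = manhattan_plain(60.0, 0.0, 60.0, 1.0)
print(scaled, plain)  # scaled is close to 0.5, plain is 1.0
```

At this latitude the unscaled version counts the east-west gap at about twice its scaled value, so in principle it can rank candidate points differently. Whether that changes the closest point depends on the data.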

The closest function becomes:

def closest_manhattan_distance1(lat, lon, trkpts):
    d = 100000.0
    best = -1
    r = trkpts.index
    for i in r:
        lati = trkpts.loc[i, 'Lat']
        loni = trkpts.loc[i, 'Lon']
        md = manhattan_distance1(lat, lon, lati, loni)
        if d > md:
            best = i
            d = md
    return best

Inlining the body of manhattan_distance1 speeds it up a little more:

def closest_manhattan_distance2(lat, lon, trkpts):
    d = 100000.0
    best = -1
    r = trkpts.index
    for i in r:
        lati = trkpts.loc[i, 'Lat']
        loni = trkpts.loc[i, 'Lon']
        md = abs(lat - lati) + abs(lon - loni)
        if d > md:
            best = i
            d = md
    return best

For finding the closest point, this function gives the same results as the version using John's function. I had hoped my intuition was right: the simpler, the faster. The program now takes 2 minutes and 37 seconds, a speedup of about 18%. Good, but nothing to get excited about.

I then decided to use Python properly. Here that means taking advantage of the array operations that pandas supports, which come from the numpy package. With these array operations the code becomes much more concise:

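import numpy

def closest(lat, lon, trkpts):
    # Manhattan distance from (lat, lon) to every track point, computed in one array operation
    cl = numpy.abs(trkpts.Lat - lat) + numpy.abs(trkpts.Lon - lon)
    return cl.idxmin()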

This function returns the same result as the previous ones. On my laptop it runs in 0.5 seconds. That is a full 300 times faster! 300 times, that is, 30,000%. Incredible. The reason for the speedup is that numpy array operations are implemented in C. So we get the best of both worlds: the speed of C and the conciseness of Python.

The lesson is clear: do not write Python code the C way. Use numpy array operations instead of iterating over arrays. For me, this was a change in thinking.

Update on July 2, 2015. This article is being discussed on Hacker News. Some commenters missed the fact that I was using a pandas data frame. I used it mainly because it is very common in data analysis. If all I needed were fast closest-point queries, and I had enough time, I would implement a quadtree in C or C++.

Second update on July 2, 2015. Some comments also mentioned that numba could speed up the code. I gave it a try.

Here is what I did; your experience may differ. Note first that results can vary between Python installations. My test environment is Anaconda on Windows, with a number of extra packages installed. These packages may interfere with numba.

First, I installed numba with the usual command:
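One rough way to check this on your own machine is to run the loop version and the array version on a synthetic track and confirm that they pick the same point. The random data, the query coordinates, and the names closest_loop and closest_vectorized below are assumptions for illustration; timings vary by machine and pandas version, so none are claimed here.

```python
import time

import numpy as np
import pandas as pd

# Synthetic track standing in for the real GPS data (values are made up)
rng = np.random.default_rng(0)
trkpts = pd.DataFrame({
    'Lat': rng.uniform(45.0, 46.0, 1000),
    'Lon': rng.uniform(5.0, 6.0, 1000),
})

def closest_loop(lat, lon, trkpts):
    # C-style: visit each row and keep the best index seen so far
    d = float('inf')
    best = -1
    for i in trkpts.index:
        md = abs(lat - trkpts.loc[i, 'Lat']) + abs(lon - trkpts.loc[i, 'Lon'])
        if d > md:
            best = i
            d = md
    return best

def closest_vectorized(lat, lon, trkpts):
    # Array-style: compute every distance at once, then take the index of the minimum
    cl = np.abs(trkpts.Lat - lat) + np.abs(trkpts.Lon - lon)
    return cl.idxmin()

start = time.perf_counter()
loop_result = closest_loop(45.5, 5.5, trkpts)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
vector_result = closest_vectorized(45.5, 5.5, trkpts)
vector_seconds = time.perf_counter() - start

print(loop_result == vector_result, loop_seconds, vector_seconds)
```

Both versions keep the first minimum on ties, so they agree on the returned index, not just on the distance.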

$ conda install numba

From the command-line output I then learned that numba was already included in the Anaconda distribution. The installation instructions may well change in the future.

I then used numba as recommended:

from numba import jit

@jit
def closest_func(lat, lon, trkpts, func):
    d = 100000.0
    best = -1
    r = trkpts.index
    for i in r:
        lati = trkpts.loc[i, 'Lat']
        loni = trkpts.loc[i, 'Lon']
        md = abs(lat - lati) + abs(lon - loni)
        if d > md:
            # print d, dlat, dlon, lati, loni
            best = i
            d = md
    return best

I saw no improvement in running time. I then tried a more aggressive compilation setting:

@jit(nopython=True)
def closest_func(lat, lon, trkpts, func):
    d = 100000.0
    best = -1
    r = trkpts.index
    for i in r:
        lati = trkpts.loc[i, 'Lat']
        loni = trkpts.loc[i, 'Lon']
        md = abs(lat - lati) + abs(lon - loni)
        if d > md:
            # print d, dlat, dlon, lati, loni
            best = i
            d = md
    return best

This time, running the code raised an error.

It seems pandas code is too smart for numba to handle.

Of course, I could spend time changing my data structures so that numba can compile the code. But why should I? The code written with numpy runs fast enough, and I am using numpy and pandas anyway. Why not keep using them?

Others suggested that I use PyPy. That certainly makes sense, but I am using Jupyter notebooks on a hosted server (translator's note: an interactive, browser-based Python development environment). I use the Python kernel it provides, namely the regular Python 2.7.x kernel. It does not offer PyPy as an option.

Cython was also suggested. Well, if I am going back to compiling code, I might as well write it directly in C or C++. I use Python because notebooks give me an interactive experience that allows rapid prototyping. That is not what Cython was designed for.
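For the curious, changing the data structure could look roughly like the sketch below: the loop receives plain numpy arrays instead of the data frame, which is the kind of input numba's nopython mode is built for. This is an assumption about one possible approach, not something the author tried. The name closest_position is made up, and the code falls back to plain Python when numba is not installed.

```python
import numpy as np

try:
    from numba import njit  # optional; needed only for the compiled version
except ImportError:
    def njit(func):
        # Fallback: leave the function uncompiled if numba is missing
        return func

@njit
def closest_position(lat, lon, lats, lons):
    # Same loop as before, but over plain float arrays rather than a data frame
    best = -1
    d = np.inf
    for i in range(lats.shape[0]):
        md = abs(lat - lats[i]) + abs(lon - lons[i])
        if md < d:
            best = i
            d = md
    return best

# Hypothetical coordinates; with a data frame these would be
# trkpts['Lat'].values and trkpts['Lon'].values
lats = np.array([10.0, 20.0, 30.0])
lons = np.array([1.0, 2.0, 3.0])
position = closest_position(21.0, 2.5, lats, lons)
```

Note that this returns a position in the arrays, not an index label. With a data frame, trkpts.index[position] would map it back to the label that idxmin returns.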
     
         
         
         
     
  CopyRight 2002-2020 newfreesoft.com, All Rights Reserved.