  Python is not C
     
  Add Date : 2018-11-21      
         
       
         
I have been using Python for a variety of data science projects. Python is famous for its ease of use: anyone with coding experience can learn to use it (or use it effectively) within a few days.

That sounds great, but if you use Python alongside other languages, such as C, there can be problems.

Let me give an example from my own experience. I have a good command of imperative languages such as C and C++. I am also proficient in older classic languages such as Lisp and Prolog. In addition, I have used Java, JavaScript and PHP for some time. So learning Python should be simple for me, shouldn't it? In fact, it only looked easy, and I dug a hole for myself: I was using Python as if it were C.

The details follow.

On a recent project I needed to process geospatial data. The task: given a GPS track of about 25,000 location points, repeatedly find the track point closest to a given latitude and longitude. My first reaction was to look for an existing code snippet that computes the distance between two points from their latitudes and longitudes. John D. Cook wrote such code and made it available in the public domain.

With that in hand, all I needed was a Python function that returns the index (into the array of 25,000 points) of the point closest to the input coordinates:

def closest_distance(lat, lon, trkpts):
    d = 100000.0  # smallest distance found so far
    best = -1     # index of the closest point found so far
    r = trkpts.index
    for i in r:
        lati = trkpts.loc[i, 'Lat']
        loni = trkpts.loc[i, 'Lon']
        md = distance_on_unit_sphere(lat, lon, lati, loni)
        if d > md:
            best = i
            d = md
    return best

Here, distance_on_unit_sphere is John D. Cook's function, and trkpts is an array containing the coordinates of the GPS track (in fact it is a pandas data frame; pandas is a third-party data analysis package for Python).
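The article does not reproduce distance_on_unit_sphere. As a rough idea of what such a function computes, here is a minimal sketch based on the spherical law of cosines. It is an assumption about the general technique, not John D. Cook's actual code, and EARTH_RADIUS_KM is a value introduced here for illustration.

```python
import math

def distance_on_unit_sphere(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points on a sphere of radius 1.

    Inputs are in degrees; the result is the central angle in radians.
    """
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    # Spherical law of cosines for the central angle.
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(dlon))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.acos(cos_angle)

# Assumed mean Earth radius, used only to turn the angle into kilometres.
EARTH_RADIUS_KM = 6371.0
quarter_turn_km = distance_on_unit_sphere(0.0, 0.0, 0.0, 90.0) * EARTH_RADIUS_KM
print(quarter_turn_km)
```

For finding the closest point, the radius does not matter: multiplying every distance by the same constant leaves the ordering unchanged.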

The function above is basically what I would write in C. It iterates over the trkpts array and keeps, in the local variable best, the index of the point found so far that is closest to the given coordinates.

So far, so good. Python syntax differs from C in many ways, but writing this code did not take me long.

The code was quick to write but very slow to run. For example, I had 428 points, called waypoints (the key points of a navigation route). For each waypoint I wanted the closest point on the track. Finding the closest track point for all 428 waypoints took 3 minutes and 6 seconds on my laptop.
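To put that run time in perspective, here is a back-of-the-envelope calculation. It assumes exactly 25,000 track points (the text says "about 25,000") and that every waypoint scans the whole track once.

```python
waypoints = 428
track_points = 25000            # "about 25,000" in the text
total_seconds = 3 * 60 + 6      # 3 minutes 6 seconds

# Every waypoint is compared against every track point.
evaluations = waypoints * track_points
microseconds_each = total_seconds / float(evaluations) * 1e6

print(evaluations)                  # 10700000
print(round(microseconds_each, 1))  # 17.4
```

Roughly 17 microseconds per loop iteration is a long time for a handful of arithmetic operations, which suggests that most of the cost lies in the per-element data frame lookups and interpreter overhead rather than in the distance formula itself.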

After that, I switched to the Manhattan distance, which is an approximation. Instead of computing the exact distance between two points, I compute the distance along the east-west axis plus the distance along the north-south axis. The Manhattan distance function is as follows:

import math

def manhattan_distance(lat1, lon1, lat2, lon2):
    lat = (lat1 + lat2) / 2.0  # mean latitude, used to scale the longitude gap
    return abs(lat1 - lat2) + abs(math.cos(math.radians(lat)) * (lon1 - lon2))

In fact, I used an even simpler function, ignoring the fact that a 1 degree difference in latitude covers a much larger distance than a 1 degree difference in longitude. The simplified function is as follows:

def manhattan_distance1(lat1, lon1, lat2, lon2):
    return abs(lat1 - lat2) + abs(lon1 - lon2)
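To see what the simplification gives up, the two functions can be compared on a pair of made-up points. The coordinates below are hypothetical, chosen at latitude 60 degrees because the cosine factor there is one half.

```python
import math

def manhattan_distance(lat1, lon1, lat2, lon2):
    lat = (lat1 + lat2) / 2.0
    return abs(lat1 - lat2) + abs(math.cos(math.radians(lat)) * (lon1 - lon2))

def manhattan_distance1(lat1, lon1, lat2, lon2):
    return abs(lat1 - lat2) + abs(lon1 - lon2)

# Two hypothetical points on the 60th parallel, one degree of longitude apart.
scaled = manhattan_distance(60.0, 10.0, 60.0, 11.0)
simple = manhattan_distance1(60.0, 10.0, 60.0, 11.0)
print(scaled, simple)
```

The scaled version reports about half the value of the simplified one, because a degree of longitude at that latitude spans about half the ground distance of a degree of latitude. The simplified function overweights east-west gaps, which is acceptable only as long as it still picks the same closest point.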

The closest function was amended as follows:

def closest_manhattan_distance1(lat, lon, trkpts):
    d = 100000.0  # smallest distance found so far
    best = -1     # index of the closest point found so far
    r = trkpts.index
    for i in r:
        lati = trkpts.loc[i, 'Lat']
        loni = trkpts.loc[i, 'Lon']
        md = manhattan_distance1(lat, lon, lati, loni)
        if d > md:
            best = i
            d = md
    return best

Inlining the body of the Manhattan distance function makes it a little faster still:

def closest_manhattan_distance2(lat, lon, trkpts):
    d = 100000.0  # smallest distance found so far
    best = -1     # index of the closest point found so far
    r = trkpts.index
    for i in r:
        lati = trkpts.loc[i, 'Lat']
        loni = trkpts.loc[i, 'Lon']
        md = abs(lat - lati) + abs(lon - loni)  # inlined Manhattan distance
        if d > md:
            best = i
            d = md
    return best

For finding the closest point, this function works as well as the one using John's function. I had hoped my intuition was right: the simpler, the faster. The program now runs in 2 minutes 37 seconds, a speedup of 18%. Good, but not exciting enough.
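The 18% figure can be checked directly from the two timings quoted in the text:

```python
slow = 3 * 60 + 6.0    # 3 min 06 s with the great-circle distance
fast = 2 * 60 + 37.0   # 2 min 37 s with the inlined Manhattan distance

speedup = slow / fast                          # how many times faster
percent_faster = (speedup - 1.0) * 100.0       # gain in speed
percent_time_saved = (1.0 - fast / slow) * 100.0

print(round(percent_faster))      # 18
print(round(percent_time_saved))  # 16
```

The 18% is a gain in speed (work done per second). Measured as run time saved, the same change is about 16%.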

I decided to use Python properly. That means taking advantage of the array operations that pandas supports. These operations come from the numpy package. Calling them makes the code much more concise:

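import numpy

def closest(lat, lon, trkpts):
    # Manhattan distance from (lat, lon) to every track point at once
    cl = numpy.abs(trkpts.Lat - lat) + numpy.abs(trkpts.Lon - lon)
    return cl.idxmin()  # index label of the smallest distance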

This function returns the same result as the previous one. On my laptop it ran in 0.5 seconds. A full 300 times faster! 300 times, that is 30,000%. Incredible. The reason for the speed is that numpy array operations are implemented in C. So we get the best of both worlds: the speed of C and the simplicity of Python.
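The equivalence of the two styles can be checked on synthetic data. The sketch below assumes the coordinates are held in plain NumPy arrays rather than the data frame, and the random coordinates are made up; it only confirms that both versions pick the same point and makes no claim about timings on any particular machine.

```python
import numpy as np

def closest_loop(lat, lon, lats, lons):
    # C-style scan: one Python-level iteration per track point.
    d = float("inf")
    best = -1
    for i in range(len(lats)):
        md = abs(lat - lats[i]) + abs(lon - lons[i])
        if d > md:
            best = i
            d = md
    return best

def closest_vectorized(lat, lon, lats, lons):
    # Array style: the per-element work runs inside NumPy's compiled code.
    cl = np.abs(lats - lat) + np.abs(lons - lon)
    return int(np.argmin(cl))

# Hypothetical track: 25,000 random points in a one-degree square.
rng = np.random.default_rng(0)
lats = rng.uniform(45.0, 46.0, size=25000)
lons = rng.uniform(5.0, 6.0, size=25000)

loop_result = closest_loop(45.5, 5.5, lats, lons)
vector_result = closest_vectorized(45.5, 5.5, lats, lons)
print(loop_result == vector_result)
```

Both functions return a position in the array, whereas idxmin on a pandas Series returns an index label; the two coincide only when the data frame uses the default integer index.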

The lesson is clear: do not write Python code the C way. Use numpy array operations instead of iterating over arrays. For me, this was a change of mindset.

Update on July 2, 2015. This article was discussed on Hacker News. Some commenters missed the fact that I was using a pandas data frame. I use it mainly because it is so common in data analysis. If all I wanted was a fast closest-point query, and I had plenty of time, I could implement a quadtree in C or C++.

Second update on July 2, 2015. Some comments also mentioned that numba could speed up the code. I tried it.

Here is what I did; your experience may differ. First, note that results can vary from one Python installation to another. My test environment is Anaconda installed on Windows, with some additional packages. There may be interference between those packages and numba.

First, I entered the following command to install numba:

$ conda install numba
This is the response I got on the command line (the output is not preserved here).

I then found out that numba was already included in the Anaconda distribution. Perhaps the installation instructions should be changed.

The recommended way to use numba:

from numba import jit

@jit
def closest_func(lat, lon, trkpts, func):
    d = 100000.0
    best = -1
    r = trkpts.index
    for i in r:
        lati = trkpts.loc[i, 'Lat']
        loni = trkpts.loc[i, 'Lon']
        md = abs(lat - lati) + abs(lon - loni)
        if d > md:
            #print d, dlat, dlon, lati, loni
            best = i
            d = md
    return best

I did not see any improvement in run time. I also tried a more aggressive compilation setting:

@jit(nopython=True)
def closest_func(lat, lon, trkpts, func):
    d = 100000.0
    best = -1
    r = trkpts.index
    for i in r:
        lati = trkpts.loc[i, 'Lat']
        loni = trkpts.loc[i, 'Lon']
        md = abs(lat - lati) + abs(lon - loni)
        if d > md:
            #print d, dlat, dlon, lati, loni
            best = i
            d = md
    return best

Running this code produced an error (the message is not preserved here).

It seems that pandas is too clever for numba to handle this code.

Of course, I could spend time changing the data structures so that numba compiles the code correctly. But why should I? The code written with numpy runs fast enough. I am using numpy and pandas anyway. Why not continue to use them?
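For readers curious what "changing the data structures" might look like, here is a hedged sketch, not something the article tried: the loop receives plain NumPy arrays instead of the data frame, which is the kind of input numba's nopython mode is designed for. The function name closest_arrays and the sample coordinates are hypothetical, and the fallback decorator exists only so the sketch still runs, uncompiled, where numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (without compilation) if numba is absent.
    def njit(func):
        return func

@njit
def closest_arrays(lat, lon, lats, lons):
    # Same scan as the loop versions, but over NumPy arrays only.
    d = 100000.0
    best = -1
    for i in range(lats.shape[0]):
        md = abs(lat - lats[i]) + abs(lon - lons[i])
        if d > md:
            best = i
            d = md
    return best

# Hypothetical three-point track.
lats = np.array([45.0, 45.5, 46.0])
lons = np.array([5.0, 5.5, 6.0])
result = closest_arrays(45.4, 5.6, lats, lons)
print(result)
```

With a data frame, the arrays could be obtained with trkpts.Lat.values and trkpts.Lon.values, and the returned position mapped back to a label through trkpts.index. Whether this beats the numpy one-liner would have to be measured.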

Some people suggested that I use PyPy. That certainly makes sense, but... I use Jupyter notebooks on a hosted server (an online, browser-based interactive Python environment). I use the Python kernel it provides, that is, the regular Python 2.7.x kernel. It does not offer PyPy as an option.

Cython was also suggested. Well, if I go back to compiling code, I might as well write it in C or C++. I use Python because it offers notebook-based interactivity (notebooks being a web-based development environment), which allows rapid prototyping. That is not what Cython is designed for.
     
         
       
         
     
  CopyRight 2002-2016 newfreesoft.com, All Rights Reserved.