[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: matching lists
Mark Fardal wrote:
>
> Hi,
>
> I have been looking at properties of particles in a simulation, and
> sometimes need to match up the particles in two different subsets. I
> typically have (quantity A, index #) for one set of particles, and
> (quantity B, index #) for another set, and want to compare quantities
> A and B for the particles that are in both sets.
>
> As of late last night I could not think of a good way to do this;
> WHERE inside a for-loop would be very slow. Maybe I'm missing
> something easy, but in any case here's a solution inspired by the
> recently submitted SETINTERSECTION function. Hope somebody finds
> it useful.
>
The standard where_array, as posted a few years back, and modified slightly for
the case of the null intersection, is attached. It will work with floating
point and other data types also. It works by inflating the vectors input to 2-d
and testing for equality in one go. It will also handle the case of repeated
entries.
For sparse integer data sets, such as the SSN example you give in the routine
documentation, it will be vastly superior. Let's take your SSN for an example.
If collected from random geographic locations and age brackets, the range of
integers might span a billion. Suppose you had 1000 entries in your survey.
The histograms will each return a vector of length 1-billion (almost all
zeroes), whereas two 1-million entry arrays would be made using where_array...
the difference between uncomfortable and easy.
In the other extreme, in which the data are well grouped (not sparse) and you
have a lot of them (e.g. a set of particles labelled from 1-1e6), you will be
better off with histogram... and now that I think about it, why can't we
formulate it like this:
n_hist=maxab-minab
n_arr= n_elements(a) * n_elements(b)
if n_hist lt n_arr then
; use set_intersection method
else
; use array inflation method
endif
to get the best of both worlds. Hmm... maybe if I'm bored someday. By the way,
you could fix your method to deal with repeated entries by dealing more
carefully with the reverse indices, to collect every index in
reva[reva[r[i]]:reva[r[i]+1]-1] for all i. I'll leave it as a challenge to do
that without a loop over i ;)
JD
--
J.D. Smith |*| WORK: (607) 255-5842
Cornell University Dept. of Astronomy |*| (607) 255-6263
304 Space Sciences Bldg. |*| FAX: (607) 255-5875
Ithaca, NY 14853 |*|
;+
; NAME:
; WHERE_ARRAY
;
; PURPOSE:
; Return the indices of those elements in vector B which
; exist in vector A. Basically a WHERE(B IN A)
; where B and A are 1 dimensional arrays.
;
; CATEGORY:
; Array
;
; CALLING SEQUENCE:
; result = WHERE_ARRAY(A,B)
;
; INPUTS:
; A vector that might contains elements of vector B
; B vector that we would like to know which of its
; elements exist in A
;
; OPTIONAL INPUTS:
;
; KEYWORD PARAMETERS:
; iA_in_B return instead the indices of A that are in
; (exist) in B
;
; OUTPUTS:
; Index into B of elements found in vector A. If no
; matches are found -1 is returned. If the function is called
; with incorrect arguments, a warning is displayed, and -2 is
; returned (see SIDE EFFECTS for more info)
;
; OPTIONAL OUTPUTS:
;
; COMMON BLOCKS:
; None
;
; SIDE EFFECTS:
; If the function is called incorrectly, a message is displayed
; to the screen, and the !ERR_STRING is set to the warning
; message. No error code is set, because the program returns
; -2 already
;
; RESTRICTIONS:
; This should be used with only Vectors. Matrices other then
; vectors will result in -2 being returned. Also, A and B must
; be defined, and must not be strings!
;
; PROCEDURE:
;
; EXAMPLE:
; IDL> A=[2,1,3,5,3,8,2,5]
; IDL> B=[3,4,2,8,7,8]
; IDL> result = where_array(a,b)
; IDL> print,result
; 0 0 2 2 3 5
; SEE ALSO:
; where
;
; MODIFICATION HISTORY:
; Written by: Dan Carr at RSI (command line version) 2/6/94
; Stephen Strebel 3/6/94
; made into a function, but really DAN did all
; the thinking on this one!
; Stephen Strebel 6/6/94
; Changed method, because died with Strings (etc)
; Used ideas from Dave Landers. Fast TOO!
; Strebel 30/7/94
; fixed checking structure check
; Smith, JD 9/1/98
; Minor Tweak to case of no overlapping members
;-
FUNCTION where_array,A,B,IA_IN_B=iA_in_B
; Check for: correct number of parameters
; that A and B have each only 1 dimension
; that A and B are defined
if (n_params() ne 2 or (size(A))(0) ne 1 or (size(B))(0) ne 1 $
or n_elements(A) eq 0 or n_elements(B) eq 0) then begin
message,'Inproper parameters',/Continue
message,'Usage: result = where_array(A,B,[IA_IN_B=ia_in_b]',/Continue
return,-2
endif
;parameters exist, let's make sure they are not structures
if ((size(A))((size(A))(0)+1) eq 8 or $
(size(B))((size(B))(0)+1) eq 8) then begin
message,'Inproper parametrs',/Continue
message,'Parameters cannot be of type Structure',/Continue
return,-2
endif
; build two matrices to compare
Na = n_elements(a)
Nb = n_elements(b)
l = lindgen(Na,Nb)
AA = A(l mod Na)
BB = B(l / Na)
;compare the two matrices we just created
I = where(AA eq BB,cnt)
if cnt eq 0 then return,-1
; normally (without keyword), return index of B that exist in A
if keyword_set(iA_in_B) then return, I mod Na
return,I/Na
END