Introduction

Non-line-of-sight (NLoS) imaging techniques have important applications in autonomous vehicle navigation and remote sensing1. NLoS techniques aim to localize, track, and image targets hidden from view by recording ’multiply-bounced’ reflected waves, i.e. waves that reflect off a directly visible surface, such as a wall, towards the hidden target, and then travel back from the target to a detector array via another reflection. The last decade has seen great advances in the field, enabling high-resolution NLoS imaging and real-time tracking for a variety of applications, using both light and sound1,2,3,4,5,6,7,8.

In the optical domain, time-of-flight (ToF) techniques achieve centimeter-scale lateral resolution by computational back-projection reconstruction3,4,5,6,7. However, because the roughness of most common surfaces is large compared to the optical wavelength, optical reflections from them are diffuse, and the quartic falloff of the multi-bounce diffuse reflections fundamentally limits the imaging range. In addition, many real-life applications, such as automotive sensing and indoor tracking of subjects, do not require the centimeter-scale resolution achievable with optical NLoS techniques, which makes acoustic NLoS techniques attractive.

When acoustic waves are considered8, optically rough surfaces, e.g. white-painted walls, become effectively flat mirrors owing to the considerably longer acoustic wavelength (\(\lambda \approx 1\) m–10 cm for acoustic frequencies of 300 Hz–3 kHz). The specular reflections of audible-frequency waves from most ordinary walls can then straightforwardly reveal the mirror image of hidden targets via conventional beam-forming back-projection techniques8, similar to those used in ultrasound echography. Furthermore, in the acoustic domain the acoustic fields are measured directly with conventional off-the-shelf microphones, without the specialized ultrafast detectors or interferometric techniques required in the optical domain.
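As a quick check of the quoted wavelength range, the short sketch below evaluates \(\lambda = c/f\). It assumes a speed of sound of 345 m/s (the value used in the numerical simulations described in “Methods”) and also evaluates the 5.3 kHz center frequency used in the experiments; the helper name `wavelength` is illustrative only.

```python
# Acoustic wavelength lambda = c / f for the frequencies discussed in the text.
# Assumption: speed of sound in air c = 345 m/s (value used in the Methods simulations).
SPEED_OF_SOUND = 345.0  # m/s


def wavelength(freq_hz: float, c: float = SPEED_OF_SOUND) -> float:
    """Return the acoustic wavelength in metres for a frequency given in Hz."""
    return c / freq_hz


for f in (300.0, 3000.0, 5300.0):
    print(f"{f:7.0f} Hz -> {100 * wavelength(f):6.1f} cm")
```

This prints roughly 115 cm, 11.5 cm and 6.5 cm, all orders of magnitude longer than optical wavelengths, which is why the same walls act as specular mirrors for sound.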

Acoustic NLoS localization of active sources, such as speakers, has long been demonstrated using either reflected waves9,10, or waves diffracted by a cornered edge of an occluder11. Recently, Lindell et al. demonstrated NLoS localization and imaging of passive reflectors in an anechoic chamber by applying a multi-bounce ToF approach, using an array of microphones and speakers emitting strong chirped pulses8. Specifically, sequential pulsed emissions from each of the speakers, with the reflected waves recorded by the microphone array, allowed the retrieval of a set of speaker-microphone Green functions, which were then used to reconstruct the hidden scene by beam-forming back-projection.

Here, we study the possibility of retrieving the same set of temporal Green functions passively, i.e. without emitting controlled acoustic waveforms. To achieve this, we leverage the ideas of passive imaging12,13,14,15,16,17,18,19,20 to estimate the Green functions from cross-correlations of ambient broadband noise, using only an array of microphones. We demonstrate localization of a human subject around the corner in a reverberating concrete-walled room containing several uncontrolled broadband noise sources. In our experiments, temporal cross-correlations of random diffuse signals between pairs of microphones in the array reveal pulse-echo-like reflected signals, which are then used as estimates of the Green functions to faithfully localize the hidden targets.

Our work is based on passive correlation imaging, also known as coda-interferometry in seismology13, which is also used in underwater acoustics for ocean tomography21,22,23. The working principle of coda-interferometry (or ’acoustic daylight imaging’, as it is termed in underwater acoustics23) is that cross-correlating recordings of ambient noise reproduces the Green function, which contains the same ToF information as measured in active pulse-echo experiments. The idea was first put to use in helioseismology to extract the travel time of acoustic waves from temporal cross-correlations of the intensity fluctuations on the solar surface14. Lobkis and Weaver showed that the autocorrelation function of ultrasound noise measurements reveals the same waveform as that measured in a single-transducer pulse-echo experiment15, and that the cross-correlation between two recordings of the diffuse noise field at two arbitrary points in space can reveal the Green’s function between these points16. The approach has also been applied in geophysics17, in the microwave regime18, and in optical studies of complex media19. Note that in underwater acoustics the term acoustic daylight imaging describes both a correlation-based coda-interferometry approach that retrieves the Green function between pairs of detectors21,22,23, and an approach that mimics optical incoherent imaging without Green function retrieval24. Importantly, the Green function retrieval-based approach that we utilize in this work has the advantage that the extracted ToF information can be used for localization. Since passive correlation provides the same ToF information as active pulse-echo experiments, it could in principle be used to localize hidden targets in an NLoS scenario in the same fashion as conventional ToF measurements2,8.
Thus, uncontrolled broadband noise sources can be utilized for passive NLoS imaging of reflective targets, in a similar fashion to their use in direct passive imaging20. This is what we set out to demonstrate in this work.

Results

The principle of our approach and the setup for realizing it are depicted in Fig. 1a, accompanied by a numerically simulated sample result (Fig. 1b–h, see “Methods”). We consider a simplified scenario in which a hidden target is outside the line of sight of both a microphone array and a broadband uncontrolled noise source (Fig. 1a). The broadband acoustic noise emitted by the source reaches the target, and is reflected off it, either via reflection from the relay wall (iii, magenta dashed line in Fig. 1a) or via diffraction from the edge of the occluding wall (ii, cyan in Fig. 1a). A detector array composed of N microphones records these reflected fields, in addition to reflections from the walls in the scene (e.g. (i), depicted in green) and the waves arriving directly from the noise source.

The waveforms \(v_j(t)\), \(j=1\ldots N\), recorded at the different detectors are shown in Fig. 1b. Although they appear random, the cross-correlation \(C_{ij}(\tau )\) between each pair \((i,j)\) of recorded waveforms reveals pulse-echo-like ToF information (Fig. 1c):

$$\begin{aligned} C_{ij}(\tau ) = \frac{1}{T_{avg}} \int _{0}^{T_{avg}} v_i(t)v_j(t+\tau )dt \end{aligned}$$
(1)

where \(T_{avg}\) is the recording (averaging) time and \(\tau\) is the lag time between the two waveforms. This simple post-processing step provides an estimate of the Green function between the two detectors; the longer \(T_{avg}\), the better the estimate25. Since the cross-correlated data is approximately equivalent to a measurement by a pulsed source and detector pair16, it can be back-projected to form an image by conventional delay-and-sum beamforming26,27 (Fig. 1d), assuming that the reflecting ’relay wall’ is a flat mirror, which is a good approximation for most common indoor walls. Multiple reflections that do not involve the target result in strong reconstructed features that are unrelated to the target (Fig. 1d) and originate from the static walls in the scene. These contributions can be subtracted using an additional, identical measurement performed without the target in the scene (a background measurement, Fig. 1e,f), in which only the contributions of the walls are present. Taking the difference between the cross-correlations of the measurements with and without the target leaves only the target-related signals (Fig. 1g). Beamforming these signals localizes the target's mirror image (Fig. 1h). A reconstruction artefact originating from early-arriving signals appears in the beam-formed image (cyan arrow in Fig. 1h). It originates from signals that diffract off the cornered edge of the barrier, rather than reflecting from the relay wall, in either the detection or the insonification path (Fig. 1g (ii, cyan arrow)). This diffraction artefact is analyzed in more detail below (Fig. 3).
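The estimator of Eq. (1) and the background subtraction can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' processing code: the discrete sum over samples stands in for the integral, the linear (non-circular) correlation is computed with zero-padded FFTs, and the two-microphone signal model in the hypothetical helper `record_pair` (a delayed "wall" copy of the noise plus a weaker, later copy via the "target") is a toy chosen only to show that the difference between the with-target and background correlations peaks at the target-path delay.

```python
import numpy as np


def cross_correlation(v_i, v_j, max_lag):
    """Discrete Eq. (1): C_ij(tau) = (1/n) * sum_t v_i(t) v_j(t + tau).

    Returns (lags, C) for integer lags in [-max_lag, max_lag] (in samples).
    Zero-padding to >= 2n - 1 points makes the FFT correlation linear, not circular.
    """
    v_i = np.asarray(v_i, dtype=float)
    v_j = np.asarray(v_j, dtype=float)
    n = len(v_i)
    nfft = 1 << int(np.ceil(np.log2(2 * n - 1)))
    # conj(V_i) * V_j is the spectrum of sum_t v_i(t) v_j(t + tau)
    full = np.fft.irfft(np.conj(np.fft.rfft(v_i, nfft)) * np.fft.rfft(v_j, nfft), nfft)
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, full[lags % nfft] / n  # negative lags sit at the end of the buffer


def record_pair(rng, n, wall_delay, target_delay=None, target_gain=0.3, pad=1000):
    """Hypothetical toy recording: mic i hears the noise directly, mic j hears a
    copy delayed by `wall_delay` samples plus, optionally, a weaker copy delayed
    by `target_delay` samples (a path via the hidden target)."""
    s = rng.standard_normal(n + pad)

    def shifted(d):
        # s(t - d) for t = 0..n-1 (requires d <= pad)
        return s[pad - d: pad - d + n]

    v_i = shifted(0)
    v_j = shifted(wall_delay)
    if target_delay is not None:
        v_j = v_j + target_gain * shifted(target_delay)
    return v_i, v_j


rng = np.random.default_rng(0)
fs = 40_000      # sampling rate used in the experiments
n = 2 * fs       # T_avg = 2 s
lags, C_target = cross_correlation(*record_pair(rng, n, 40, target_delay=120), 400)
_, C_background = cross_correlation(*record_pair(rng, n, 40), 400)  # new noise realization
diff = C_target - C_background
print(lags[np.argmax(diff)])  # 120: only the target-path delay survives the subtraction
```

Because the background is recorded with an independent noise realization, the wall peak at lag 40 cancels only on average; its residual shrinks as \(T_{avg}\) grows, in line with the statement that longer averaging improves the estimate.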

Figure 2 presents experimental results of passive acoustic localization around the corner. A photo of the experimental setup is given in Fig. 2a: a human subject is hidden around the corner from a linear array of \(N=16\) microphones that record the acoustic fields from two uncontrolled broadband sources (Fig. 2c). The broadband spectrum of the raw signal measured by a single microphone is shown in Fig. 2b (source, blue curve). We compute the pairwise cross-correlations between the measured signals after band-pass filtering the raw recordings with a Gaussian filter of central frequency \(f_0 = 5.3\) kHz and full width at half maximum (FWHM) bandwidth \(\Delta f_{FWHM} = 1.8\) kHz. Repeating the cross-correlation calculation for signals acquired with and without the subject present and taking their difference reveals pulse-echo-like ToF information with a peak at the expected delay time (Fig. 2c). Applying delay-and-sum beamforming to the \(N^2\) cross-correlation traces and flipping the reconstructed (mirror) image vertically with respect to the relay wall faithfully localizes the subject at several positions, each obtained by analyzing a different 80 s-long temporal segment of a single recording (Fig. 2d; true positions marked by cyan crosses). Using shorter segments of \(T_{avg} = 2\) s still reveals the correct positions of the hidden target, albeit with more artefacts (Fig. 2e). A numerical simulation of the simplified experimental scene, without noise and without the additional reflections from outside the displayed field of view, shows good qualitative agreement with the experimental reconstructions (Fig. 2f). To study how the locations of the uncontrolled noise sources affect the reconstruction fidelity, we performed several numerical simulations with uncorrelated sources at various locations; the results are presented in Supplementary Fig. S1.
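A minimal delay-and-sum sketch over the \(N^2\) cross-correlation traces is given below. It treats each (background-subtracted) trace \(C_{ij}(\tau)\) as the Green function between microphones \(i\) and \(j\), so that a reflector at pixel \(r\) in mirror space (the relay wall treated as a flat mirror) contributes at \(\tau = (|x_i - r| + |r - x_j|)/c_s\). The 16-microphone, 4 cm-pitch array and the 40 kHz sampling rate follow “Methods”; the Gaussian test traces, the reflector position, the wall coordinate, and the names `delay_and_sum` and `unmirror` are hypothetical, and this is not presented as the authors' exact implementation.

```python
import numpy as np


def delay_and_sum(C, lags_s, mic_xy, grid_x, grid_y, c=345.0):
    """Delay-and-sum image from pairwise cross-correlations (sketch).

    C[i, j, :] holds the trace for mics i, j sampled at ascending lag times
    `lags_s` (s). Each pixel sums every trace at tau = (|x_i - r| + |r - x_j|) / c.
    """
    X, Y = np.meshgrid(grid_x, grid_y)                  # pixel grid, shape (ny, nx)
    dist = np.hypot(X[None] - mic_xy[:, 0][:, None, None],
                    Y[None] - mic_xy[:, 1][:, None, None])  # (N, ny, nx)
    image = np.zeros(X.shape)
    n_mics = len(mic_xy)
    for i in range(n_mics):
        for j in range(n_mics):
            tau = (dist[i] + dist[j]) / c
            # linear interpolation of the trace at the pixel's round-trip delay
            image += np.interp(tau, lags_s, C[i, j], left=0.0, right=0.0)
    return image


def unmirror(y_mirror, y_wall):
    """Reflect a mirror-space y coordinate back across a relay wall at y = y_wall."""
    return 2.0 * y_wall - y_mirror


# Toy check with ideal traces: a Gaussian pulse at each pair's expected delay.
fs = 40_000
lags_s = np.arange(-400, 401) / fs
mic_xy = np.column_stack([0.04 * np.arange(16), np.zeros(16)])  # 16 mics, 4 cm pitch
r0 = np.array([0.40, 1.00])          # hypothetical mirror-image position (m)
d0 = np.hypot(*(mic_xy - r0).T)      # mic-to-reflector distances
tau0 = (d0[:, None] + d0[None, :]) / 345.0
sigma = 1e-4                         # assumed pulse width (s), roughly 1/bandwidth
C = np.exp(-0.5 * ((lags_s[None, None, :] - tau0[:, :, None]) / sigma) ** 2)

grid_x = np.arange(0.0, 1.0, 0.01)
grid_y = np.arange(0.5, 1.5, 0.01)
img = delay_and_sum(C, lags_s, mic_xy, grid_x, grid_y)
iy, ix = np.unravel_index(np.argmax(img), img.shape)
print(grid_x[ix], grid_y[iy])        # close to the assumed (0.40, 1.00)
```

With real data, `C` would hold the background-subtracted experimental traces, and `unmirror` would then map the peak's y coordinate back across the relay wall to the subject's true side.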

To analyze the origin of the diffraction artefact in Fig. 1g,h in more depth, Fig. 3 displays four snapshots of a simulated impulse field propagating from one noise source. The results were obtained with a two-dimensional FDTD simulation (k-Wave28, see “Methods”). In Fig. 3a, free-space propagation produces a perfect spherical wavefront. When the pulse front hits the walls (Fig. 3b), it is reflected from the relay wall (green arrow, i) and from the occluding barrier. Shortly after (Fig. 3c), two phenomena can be observed: the first is the propagation of the wave reflected from the relay wall (green arrow, i), and the second is the weak, but non-negligible, ’knife-edge’ diffraction from the edge of the occluding barrier (cyan arrow, ii). Finally, at later times (Fig. 3d), while the wave reflected from the relay wall continues to propagate towards the target (magenta arrow, iii), the weak knife-edge diffracted wave has already reached the target (cyan arrow). Both of these signals are eventually recorded by the detectors. Although the diffracted peak arrives earlier (cyan arrow in Fig. 1c,g) than the signal reflected from the relay wall (magenta arrow in Fig. 1c,g), only the latter yields the correct target position when conventional beamforming is used for reconstruction. Nonetheless, knowledge of the visible scene geometry can be used to account for such knife-edge diffraction signals and improve the reconstruction: undesired artefacts can be removed and the SNR of the reconstructed image improved by diffraction- and reflection-aware localization29.

Figure 1

Passive NLoS localization process using uncontrolled noise sources (numerical example). (a) The simulated scene (top view): a target is hidden behind an occluder. A 16-detector array records the continuous broadband noise emitted by a nearby uncontrolled source, which reverberates in the scene. The recorded noise contains directly arriving signals, single reflections (green, i), diffracted reflections (cyan, ii), and multiple reflections (magenta, iii) that allow NLoS localization. (b) Noise fields \(v_1(t), v_2(t)\) recorded by detectors 1 and 2, respectively. (c) The cross-correlation of the recorded fields, \(C_{12}(\tau)\), reveals pulse-echo-like ToF information containing: (i) direct reflections from the wall; (ii) fields diffracted by the occluder edge to the target; (iii) fields reflected by the wall to the target and back. These are used for direct localization of the target mirror image. (d) Delay-and-sum beamforming reconstruction from the 16 \(\times\) 16 cross-correlations (as in c) of all detector pairs. The positions of the wall (green arrow), the target mirror image (magenta arrow), and the edge-diffraction artifact (cyan arrow) are visible. (e,f) Same as (c,d), for a scene without the hidden target. (g) Difference between the cross-correlations of (c) and (e). (h) The difference between (d) and (f) shows only the hidden-target contributions. The figure was created using MATLAB R2022a and INKSCAPE 1.2.

Figure 2

Experimental passive acoustic NLoS localization and tracking of a hidden subject around the corner. (a) Setup (top view): a subject hides behind an occluder. Two uncorrelated speakers emit broadband random noise. A linear array of \(N=16\) microphones records the acoustic pressure fields. (b) Power spectral density (PSD) of the raw signal measured by microphone number 1 (source, blue curve), the band-pass-filtered signal used for reconstructions (black curve), and the ambient noise when the sources are off (red curve). (c) Difference between the cross-correlations of a single pair of microphones with and without the target present. The arrow marks the desired double reflection (wall-target-wall) that provides the target position. (d,e) Experimental results: beamforming reconstructions from experimental cross-correlations, locating a person at three different positions around the corner. Integration times: \(T_{avg} = 80\) s (d) and \(T_{avg} = 2\) s (e). Cyan crosses mark the true positions. The reconstructions are mirrored with respect to the wall. (f) Numerical results for simulated scenes without reverberation or measurement noise, \(T_{avg} = 0.08\) s. The figure was created using MATLAB R2022a and INKSCAPE 1.2.

Discussion

To summarize, we have demonstrated an approach for localizing and tracking a person hidden around a corner using conventional off-the-shelf microphones and uncontrolled broadband noise sources. The presented NLoS acoustic imaging approach offers improved covertness over previous acoustic approaches8,30 owing to two important differences. The first is the use of broadband random emissions rather than pulsed emissions, similar to chaotic-waveform SONAR31. The second, and most important, is that, unlike chaotic-waveform SONAR, our correlation-based approach requires no knowledge of the spatial positions or exact emitted waveforms of the sources. In essence, our approach applies correlation-based ’acoustic daylight imaging’21,22,23 to NLoS imaging. In this respect, it is important to note that the term acoustic daylight imaging also refers to a passive imaging technique that does not rely on retrieving the Green function from cross-correlations, but instead exploits spatio-temporal correlations through interference, in an acoustic analog of incoherent optical imaging24.

In our Green-function correlation-based approach, the spatial localization accuracy is dictated by the ToF temporal resolution, which is given by the temporal width of the cross-correlation peak. For a broadband source, this width is set by the source coherence time \(t_c \approx 1/\Delta f\), where \(\Delta f\) is the source spectral bandwidth. Each single ToF measurement from the temporal cross-correlation between two detectors localizes the target on an ellipsoid surface (or on a sphere in the case of the autocorrelation of a single detector), with an axial resolution of \(dr \approx c_s/(2\Delta f)\), where \(c_s\) is the speed of sound. Assuming perfect retrieval of the Green functions, the final reconstruction resolution is the same as in active SONAR experiments8. In practice, the finite recording time results in noisy cross-correlations and thus in reconstruction clutter artefacts (Fig. 2).
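To put numbers on \(t_c \approx 1/\Delta f\) and \(dr \approx c_s/(2\Delta f)\), the sketch below plugs in values quoted elsewhere in the text: \(c_s = 345\) m/s from “Methods” and, as an assumption, \(\Delta f\) equal to the 1.8 kHz FWHM of the Gaussian band-pass filter. Since the effective bandwidth of a filtered signal depends on how bandwidth is defined, these are order-of-magnitude estimates only; the helper names are illustrative.

```python
# Order-of-magnitude numbers for t_c ~ 1/df and dr ~ c_s / (2 df).
# Assumptions: c_s = 345 m/s (Methods); df taken as the 1.8 kHz FWHM of the
# Gaussian band-pass filter used in the experiments.


def coherence_time(bandwidth_hz: float) -> float:
    """Coherence time t_c ~ 1/df, in seconds."""
    return 1.0 / bandwidth_hz


def axial_resolution(bandwidth_hz: float, c_s: float = 345.0) -> float:
    """Axial (range) resolution dr ~ c_s / (2 df), in metres."""
    return c_s / (2.0 * bandwidth_hz)


delta_f = 1.8e3
print(f"t_c ~ {coherence_time(delta_f) * 1e3:.2f} ms")    # t_c ~ 0.56 ms
print(f"dr  ~ {axial_resolution(delta_f) * 100:.1f} cm")  # dr  ~ 9.6 cm
```

Because \(dr\) scales as \(1/\Delta f\), halving the usable bandwidth doubles the axial blur, which is why the narrow spectrum of common ambient noise limits resolution.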

Our method is based on Green function retrieval from temporal cross-correlations of broadband noise. In most works the noise field is assumed to be diffuse and isotropic15, which may indeed be the case in strongly reverberant rooms. For an anisotropic noise field, e.g. when the waves traveling in the medium arrive mainly from one half-plane, the retrieved Green function would be one-sided, i.e. either \(G(x_i,x_j,t)\) or \(G(x_i,x_j,-t)\)32. In our experiments the field is not entirely diffuse, and we noticed differences in the reconstructions depending on the exact placement of the non-isotropic noise sources (see also Supplementary Fig. S1).

The two main challenges in making the presented approach useful in practical scenarios are the relatively narrow bandwidth of common ambient noise (Fig. 2b, red curve), which lowers the reconstruction resolution, and the current requirement for a relatively long averaging time. The averaging time may be reduced by using more detectors and adopting advanced reconstruction approaches. In particular, reconstruction algorithms that account for the contributions of diffracted waves using the (known or measured) room geometry are expected to significantly improve the reconstruction fidelity. Related data-driven approaches based on neural networks have recently been put forward for optical NLoS reconstruction33,34, for NLoS classification of individuals35, and for suppressing interfering echoes in NLoS echolocation30. Alternatively, in the microwave regime, reverberation has been found to create an interferometric sensitivity that enables sub-wavelength resolution36.

Figure 3

Numerical study of the wave propagation in the considered scene reveals the various contributions to the measured signals. (a–d) Acoustic pressure distribution of the wave propagating from a short pulsed source (blue x) at four different propagation times. (a) Free-space spherical wave propagation before reaching any reflectors or occluders. (b) First reflections from the wall (green arrow) and the occluder. (c) At a later time, the reflection from the wall (green arrow, i) propagates towards the target, while diffraction of the direct wave at the occluder edge generates a weak diffracted wave propagating towards the target (cyan arrow, ii). (d) The edge-diffracted wave hits the target (cyan arrow, ii). The wavefront reflected from the wall arrives both directly at the detector array (green arrow, i) and, at a later time, at the target (magenta arrow, iii). The figure was created using MATLAB R2022a and INKSCAPE 1.2.

Methods

Experimental setup

The experimental setup is shown in Fig. 2a. The occluder consisted of a pair of acoustic drywall plates with two layers of Suprema—Tecsound pallet sandwiched between them. This 3 cm thick occluder was placed perpendicular to the wall at a distance of 45 cm. Noise was generated by playing two different Gaussian white-noise signals through two audio speakers (MIYAKO Ltd, SL-800). The microphone array consisted of 16 condenser microphones (BOYA, BY-M1) spaced 4 cm apart, sampled simultaneously at 40 kHz with 16-bit depth by a multichannel DAQ device (National Instruments, PXIe-6363). The array was placed parallel to the wall at a distance of 53 cm from it, with the rightmost microphone 5 cm from the occluder. A human subject served as the target in all experiments. The figures were created using MATLAB V. R2022a (https://www.mathworks.com/) and INKSCAPE V. 1.2 (https://inkscape.org/).

Numerical simulations

Simulations were performed using ’k-Wave’, a 2D Finite-Difference Time-Domain (FDTD) simulation toolbox28. The simulations computed the propagation of a delta-like impulse pressure wave from each of the noise sources through the simulated scene to each of the microphones (Fig. 1a), yielding the Green function from each source to each microphone. The simulated scene was represented by a grid of \(400\times 400\) pixels of \(1\;\mathrm{cm}\times 1\;\mathrm{cm}\) each, covering a \(4\;\mathrm{m}\times 4\;\mathrm{m}\) plane. Free-space propagation through air was modeled with a speed of sound of \(345\;\mathrm{m/s}\) and a density of \(1.225\;\mathrm{kg/m^3}\). The wall and the occluder were represented by 1.47 m- and 3 cm-thick regions, respectively, with a density of \(24.5\;\mathrm{kg/m^3}\) and a speed of sound of \(1500\;\mathrm{m/s}\), which yields a high reflection coefficient and low transmission. The random noise sources were simulated by convolving the Green functions of each source with a single random signal \(7501\times 10^3\) samples long. For each microphone, the two random signals obtained (one from each of the two noise sources) were summed, cropped to a finite measurement time, and taken as the signal measured by that microphone. These ’measured’ signals were then processed in the same manner as the experimental signals (Fig. 2b).
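The noise-synthesis step can be sketched as follows. This is a hypothetical re-implementation under stated assumptions, not the authors' script: `green[s][m]` stands for the simulated impulse response from source s to microphone m (replaced here by toy sparse impulses from the illustrative helper `impulse`), each source is driven by its own independent Gaussian white-noise sequence, and the first `longest` output samples are discarded so that every retained sample contains the full response (this cropping rule is our assumption). For drives as long as \(7501\times 10^3\) samples, an FFT-based convolution such as `scipy.signal.fftconvolve` would be considerably faster than `np.convolve`.

```python
import numpy as np


def synthesize_recordings(green, n_samples, rng):
    """Simulated microphone signals from per-source impulse responses (sketch).

    green[s][m] is a 1-D impulse response from source s to mic m. Each source
    gets one independent white-noise drive; each mic signal is the sum over
    sources of (drive convolved with Green function), cropped to n_samples.
    """
    n_sources, n_mics = len(green), len(green[0])
    longest = max(len(g) for row in green for g in row)
    out = np.zeros((n_mics, n_samples))
    for s in range(n_sources):
        drive = rng.standard_normal(n_samples + longest)  # one random signal per source
        for m in range(n_mics):
            full = np.convolve(drive, green[s][m])        # linear convolution
            out[m] += full[longest:longest + n_samples]   # skip start-up transient, crop
    return out


def impulse(delays, gains, length=200):
    """Hypothetical toy Green function: scaled unit impulses at integer delays."""
    g = np.zeros(length)
    g[list(delays)] = gains
    return g


rng = np.random.default_rng(0)
green = [
    [impulse([10], [1.0]), impulse([30, 150], [1.0, 0.5])],  # source 0 -> mics 0, 1
    [impulse([60], [1.0]), impulse([25], [1.0])],            # source 1 -> mics 0, 1
]
signals = synthesize_recordings(green, n_samples=40_000, rng=rng)
print(signals.shape)  # (2, 40000)
```

The resulting rows can be fed to the same cross-correlation and beamforming steps as the experimental recordings, which mirrors the statement that the simulated signals were processed like the measured ones.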