Posted by Chad Johnson on September 6, 2007, 3:01 pm
Suppose a robot was constructed which could traverse the interior of a
building, and in doing so build an associative memory map of the
visual layout of a building -- using some kind of neural net. So with
the map built, it could tell, given a picture of anywhere in the
house, where it (the robot) is currently located (relatively) in the
house. For example, it receives as input a picture of a table (say the
kitchen table), and from that it would know that to the left of that is
a door, and on the other side of that door is the living room (assuming
it somehow knew what portion of its map was designated to be the
living room).
Is this even remotely possible?
If so, then would it be feasible to, in some input format, tell the
robot, for example, to "go to room x" (assuming somehow the robot had
learned to associate "room x" with a particular portion of its
internal map)?
|
|
Posted by Curt Welch on September 6, 2007, 3:40 pm
chad.d.johnson@gmail.com wrote:
> Suppose a robot was constructed which could traverse the interior of a
> building, and in doing so build an associative memory map of the
> visual layout of a building -- using some kind of neural net. So with
> the map built, it could tell, given a picture of anywhere in the
> house, where it (the robot) is currently located (relatively) in the
> house. For example, it receives as input a picture of a table (say the
> kitchen table), and from that it would know that to the left of that is
> a door, and on the other side of that door is the living room (assuming
> it somehow knew what portion of its map was designated to be the
> living room).
> Is this even remotely possible?
Yes. Humans can do it, so it's possible without a doubt. :) It's not easy,
however.
> If so, then would it be feasible to, in some input format, tell the
> robot, for example, to "go to room x" (assuming somehow the robot had
> learned to associate "room x" with a particular portion of its
> internal map)?
Sure.
I'm not aware of anyone who has done exactly what you are thinking of, but
there's been a lot of work on related ideas.
I know I've seen robotics mapping projects where the goal of the robot was
to create a 2D map of its environment as it moved around a building, and to
use that map to locate itself in the environment. But as I recall, the
sensor supplying the data was not visual but instead something more easily
usable, like laser distance measurements in 360 degrees around the robot.
This would give it distance measurements to the walls. It would then build
up a map of the walls and doors. The project I'm thinking of made heavy
use of statistical techniques. Actually, now that I'm thinking about it,
it might have been given a simple 2D map of the building, and its goal
was to move around and try to figure out where it was on the map. I
think it worked by estimating the probability of its location at all
points on the map (down to some resolution), and using the sensor data to
update the probability of it being at each location on the map until it
received enough data to estimate its location to a high probability. I
recall seeing a video of the computer screen which represented its best
guess as to its current location on the map as it moved around. It
started off not knowing where it was, then quickly narrowed it down to a
few possible areas, and then refined that to its actual location.
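The probability-grid idea described above can be sketched in a few lines.
This is only an illustration, not the project being recalled: it assumes
a one-dimensional circular corridor instead of a 2D floor plan, a sensor
that reports just "door" or "wall", exact one-cell moves, and made-up
values for p_hit and p_miss. All names (world, sense, move) are
hypothetical.

```python
def normalize(belief):
    """Rescale the belief so the probabilities sum to 1."""
    total = sum(belief)
    return [b / total for b in belief]


def sense(belief, world, reading, p_hit=0.9, p_miss=0.1):
    """Bayes update: weight each cell by how well it explains the reading."""
    weighted = [b * (p_hit if cell == reading else p_miss)
                for b, cell in zip(belief, world)]
    return normalize(weighted)


def move(belief, step):
    """Shift the belief by `step` cells around the circular corridor.

    Motion is assumed to be exact; a real robot would also blur the belief
    here to account for wheel slip.
    """
    n = len(belief)
    return [belief[(i - step) % n] for i in range(n)]


# Hypothetical map: what the range sensor would report at each cell.
world = ["wall", "door", "wall", "wall", "door", "wall", "wall", "wall"]

# Start with no idea where the robot is: every cell equally likely.
belief = [1.0 / len(world)] * len(world)

# The robot reads its sensor, moves one cell, reads again, and so on.
readings = ["door", "wall", "wall", "door"]
for i, reading in enumerate(readings):
    if i > 0:
        belief = move(belief, 1)
    belief = sense(belief, world, reading)

best = max(range(len(world)), key=lambda i: belief[i])
print("most likely cell:", best)
print("belief:", [round(b, 3) for b in belief])
```

After the first "door" reading the belief is concentrated on the two door
cells; the later readings rule one of them out, which matches the "few
possible areas, then one location" behaviour in the video.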
Trying to do the same sort of thing from visual data alone would be much
harder. I'm not aware of any project which has done that.
Once it has a map, and the ability to locate itself on the map, then simply
pointing to different locations on the map is a simple way to tell it where
you want it to go. Typing English-like words such as "go to the kitchen",
however, would be a bit more complex depending on how flexible you wanted
the command language to be. If, for example, you wanted to talk to it and
tell it things like "this is the kitchen", and then later be able to tell
it "go get me a beer from the fridge in the kitchen", then you are at a
very different level of problem than using a mouse to point to a location
on its map to tell it where to go.
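For the simple end of that range, where a room name is just a label
attached to a point on the map, a minimal sketch might look like the
following. It assumes the robot already has an occupancy grid and knows
its own cell; the grid, the ROOMS table, and the function names are all
hypothetical, and breadth-first search is used only because it is the
simplest planner that returns a shortest path on a grid.

```python
from collections import deque

# Hypothetical occupancy grid: '#' is wall, '.' is free floor.
GRID = [
    "#########",
    "#...#...#",
    "#...#...#",
    "#.......#",
    "#...#...#",
    "#########",
]

# Hypothetical labels a user attached by pointing at (row, col) cells.
ROOMS = {"kitchen": (1, 1), "living room": (1, 7)}


def plan_path(grid, start, goal):
    """Breadth-first search; returns a shortest list of cells, or None."""
    came_from = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            # The grid is bordered by walls, so indices stay in range.
            if grid[nxt[0]][nxt[1]] != "#" and nxt not in came_from:
                came_from[nxt] = cell
                queue.append(nxt)
    return None


def go_to(room_name, current_cell):
    """Look up the labelled cell for a room and plan a route to it."""
    return plan_path(GRID, current_cell, ROOMS[room_name])


path = go_to("living room", ROOMS["kitchen"])
print(path)
```

Understanding "go get me a beer from the fridge" is a different problem
entirely; this only covers the lookup-and-route case.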
--
Curt Welch http://CurtWelch.Com/
curt@kcwc.com http://NewsReader.Com/
|
|
Posted by Chad Johnson on September 6, 2007, 4:22 pm
On Sep 6, 2:40 pm, c...@kcwc.com (Curt Welch) wrote:
> I know I've seen robotics mapping projects where the goal of the robot was
> to create a 2D map of its environment as it moved around a building, and to
> use that map to locate itself in the environment. But as I recall, the
> sensor supplying the data was not visual but instead something more easily
> usable, like laser distance measurements in 360 degrees around the robot.
> This would give it distance measurements to the walls. It would then build
> up a map of the walls and doors. The project I'm thinking of made heavy
> use of statistical techniques. Actually, now that I'm thinking about it,
> it might have been given a simple 2D map of the building, and its goal
> was to move around and try to figure out where it was on the map. I
> think it worked by estimating the probability of its location at all
> points on the map (down to some resolution), and using the sensor data to
> update the probability of it being at each location on the map until it
> received enough data to estimate its location to a high probability. I
> recall seeing a video of the computer screen which represented its best
> guess as to its current location on the map as it moved around. It
> started off not knowing where it was, then quickly narrowed it down to a
> few possible areas, and then refined that to its actual location.
Cool. Was it able to navigate pretty well?
> Trying to do the same sort of thing from visual data alone would be much
> harder. I'm not aware of any project which has done that.
Yeah, I think using image data alone would not work well. I think the
robot would need some form of radar so that it could ping its
surrounding environment to work out how far away objects are. It would
then need to (somehow) associate, in the neural net, that distance data
with actual objects and specific positions in the image map it
constructs.
> Once it has a map, and the ability to locate itself on the map, then simply
> pointing to different locations on the map is a simple way to tell it where
> you want it to go. Typing English-like words such as "go to the kitchen",
> however, would be a bit more complex depending on how flexible you wanted
> the command language to be. If, for example, you wanted to talk to it and
> tell it things like "this is the kitchen", and then later be able to tell
> it "go get me a beer from the fridge in the kitchen", then you are at a
> very different level of problem than using a mouse to point to a location
> on its map to tell it where to go.
I guess if I were doing this with a human I would point and say, "That
is the refrigerator," "That is the kitchen table," "This is the living
room couch." So maybe I could have some UI where I could type the
object name into a textbox, click a button when the robot starts
scanning that object/area, and click another button when it's done to
stop the phrase/object association process.
Is my head too far in the clouds? Is something like this needed in the
robotics field (would it be useful)? Is there a better way than what
I've described? I'd like to do it as closely as possible to how humans
perform navigation and understand their positions (generally speaking).
|
|
Posted by Curt Welch on September 6, 2007, 5:20 pm
> Is my head too far in the clouds? Is something like this needed in the
> robotics field (would it be useful)?
It would be useful if it worked well. I've never seen such a thing that
worked well enough to be useful, however.
> Is there a better way than what
> I've described? I'd like to do it as closely as possible to how humans
> perform navigation and understand their positions (generally speaking).
No one really knows how humans do these things. At least not well enough
to be able to build a machine to duplicate our skills in these tasks.
If you try to train the robot by example, you have to give it a lot of
examples for it to work. For example, if you point at the fridge and tell
it "that's the refrigerator", what's going to stop it from thinking that
all large white areas are to be known as "the refrigerator"? Or what if
there's a magnet on the fridge and the robot assumes the magnet is the
fridge? Then when you say "go to the fridge", the robot runs over to the
whiteboard, which also has magnets on it. Before we can understand such a
message, we need to parse the environment into objects and have a large
base of commonsense experience about what the person is most likely to be
trying to communicate to us.
The basic concept of association is very simple and powerful and a
fundamental part of what makes humans intelligent. But the hard part of
the problem is understanding how to decode raw sensor data into "things"
that the associations can be made with. I don't know of any AI project
that has really solved that part of the problem.
When we, as humans, look at a kitchen, we don't just see raw 2D pixel data.
We see a 3D room full of 3D objects. Before you can use basic high-level
associations like telling the robot the big white box is a refrigerator,
the robot first has to decode that raw 2D data into a description of a 3D
room full of objects, so that the fridge you understand is one the robot
already understands before you try to give it a name.
But how do we, as humans, learn to see the kitchen in this way? Some
people believe a lot of that ability in us is the result of millions of
years of evolution building custom hardware in us that decodes the visual
data for us in that way (and they believe that we have different hardware
for decoding visual data than for decoding sound data). So they believe
each part of the brain simply has complex hardware for processing each type
of data and performing each function. So to duplicate that in an AI
project, we need to develop a lot of different complex modules and make
them all work together.
I happen to believe that most of that work is done by one generic type of
hardware which is able to adapt to and decode whatever type of data you
send to it, like an ANN can learn to decode any data it's been trained to
decode.
There are a lot of people looking at these sorts of problems from all
different directions, but there simply are no solutions that match what
humans can do.
--
Curt Welch http://CurtWelch.Com/
curt@kcwc.com http://NewsReader.Com/
|
|
Posted by Chad Johnson on September 6, 2007, 10:33 pm
> If you try to train the robot by example, you have to give it a lot of
> examples for it to work. For example, if you point at the fridge and tell
> it "that's the refrigerator", what's going to stop it from thinking that
> all large white areas are to be known as "the refrigerator"? Or what if
> there's a magnet on the fridge and the robot assumes the magnet is the
> fridge? Then when you say "go to the fridge", the robot runs over to the
> whiteboard, which also has magnets on it. Before we can understand such a
> message, we need to parse the environment into objects and have a large
> base of commonsense experience about what the person is most likely to be
> trying to communicate to us.
> The basic concept of association is very simple and powerful and a
> fundamental part of what makes humans intelligent. But the hard part of
> the problem is understanding how to decode raw sensor data into "things"
> that the associations can be made with. I don't know of any AI project
> that has really solved that part of the problem.
> When we, as humans, look at a kitchen, we don't just see raw 2D pixel data.
> We see a 3D room full of 3D objects. Before you can use basic high-level
> associations like telling the robot the big white box is a refrigerator,
> the robot first has to decode that raw 2D data into a description of a 3D
> room full of objects, so that the fridge you understand is one the robot
> already understands before you try to give it a name.
Isn't being able to distinguish objects only going to be an issue if
an object's location changes or if the object is moving? What is the
disadvantage if an object is identified based on the image and radar
readings of it and its surrounding environment? It's less human-like,
yes, but would the robot not still be able to locate the object?
On a different note, about distinguishing objects from one another
and from the environment: in my room I have a bookshelf, and to its
right I have two computers stacked on top of one another. How do I know
that the computers are not part of the bookshelf -- that I have 3
separate objects? Some things I notice are (and I am sort of just
thinking to myself here):
* The bookshelf area is colored differently than the computer area
* The bookshelf does not have buttons or lights or internal shapes
like the computers do; the computers have distinctly different
features than the bookshelf area does.
Now, how do I know there are *two* computers in the computer area of the
image data rather than just one? The computers look different, but
suppose they looked exactly the same and were aligned perfectly so
that they looked like one object. I would have a more difficult time
realizing that there are two computers in that area and not just one.
It would take me a little longer to make this determination, and I
think there is the chance that I may not realize that there actually
are two objects. So to make this determination, I think the biggest
factor would be that I would realize that I am seeing the same or
similar thing twice, and with the number two in my mind, I would
likely poll any existing knowledge about computer cases, and, assuming
I had any, I would likely determine that the height of the area is too
tall to be just one computer. So basically I'm doing a size comparison
against my existing computer casing-related knowledge and determining
whether any cases I've seen have ever been that tall. If the results
are around 50/50, I may inspect the cases closer (e.g. try separating
the two repeated areas).
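The size comparison described above could be sketched roughly like this.
The case heights are invented numbers standing in for "existing
knowledge", and estimate_count and its tolerance parameter are
hypothetical; the sketch does not model the 50/50 case where a closer
inspection would be needed.

```python
from statistics import mean

# Hypothetical heights (cm) of computer cases seen before.
known_case_heights = [36.0, 40.0, 42.0, 45.0, 38.0]


def estimate_count(region_height, known_heights, tolerance=1.15):
    """Guess how many stacked cases fill a region of the given height.

    If the region is no taller than the tallest known case (plus some
    slack), assume one object; otherwise divide by the typical height.
    """
    tallest = max(known_heights)
    if region_height <= tallest * tolerance:
        return 1
    return max(1, round(region_height / mean(known_heights)))


print(estimate_count(41.0, known_case_heights))  # a single ordinary case
print(estimate_count(82.0, known_case_heights))  # too tall for one case
```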
It seems that distinguishing whether an area in an image is one or
multiple objects involves closeness matching. I very much bet this
could be done with some algorithm. Maybe one that could separate the
image data into multiple areas based on various hard-coded (or even
learned) characteristics, such as size, color, shape, shadows,
internal characteristics (shapes, colors) etc. Then other non-image
inputs could be used as complements, such as physical dimensions from
radar.
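As a rough sketch of the "separate the image into areas" step, here is a
flood fill that groups neighbouring pixels of the same colour. It uses
colour only, on a tiny made-up image where each character stands for a
pixel colour; label_regions and the image itself are hypothetical, and
real images would need a tolerance on colour rather than exact equality.

```python
def label_regions(image):
    """Group 4-connected pixels of equal value.

    Returns a grid of region labels (1, 2, ...) and the region count.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                continue  # already assigned to a region
            count += 1
            labels[r][c] = count
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not labels[ny][nx]
                            and image[ny][nx] == image[y][x]):
                        labels[ny][nx] = count
                        stack.append((ny, nx))
    return labels, count


# Toy image: B = bookshelf colour, w = wall colour, C = computer colour.
# The right-hand column of C pixels is two identical stacked computers.
image = [
    "BBBwCC",
    "BBBwCC",
    "BBBwCC",
    "BBBwCC",
]

labels, count = label_regions(image)
print(count)
for row in labels:
    print(row)
```

Note that the two identical computers come out as a single region, which
is the ambiguity raised above: colour alone cannot split them, so size
or range data would have to be brought in as a second cue.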
Any thoughts? :)